Deploy Hugging Face models on Google Cloud: from the hub to Inference Endpoints
Summary
TL;DR: Julian from Hugging Face introduces a series of videos on deploying Hugging Face models on Google Cloud. The first video demonstrates using Hugging Face's Inference Endpoints to deploy models such as the new Gemma model with a single click. It guides viewers through accessing the model, deploying it on Google Cloud, and testing it using a playground and the API. Julian also highlights the ease of deleting the endpoint to stop charges, promising more deployment methods in upcoming videos.
Takeaways
- 🚀 Julian introduces a series of videos on deploying Hugging Face models on Google Cloud.
- 🤝 Hugging Face has announced a partnership with Google Cloud.
- 📹 The video will demonstrate deploying models using Hugging Face's own service, Inference Endpoints.
- 🔗 The video shows one-click deployment of models from the Hub to Google Cloud.
- 🌐 The video mentions the availability of a single US region for deployment but hints at the addition of more regions.
- 🛡️ The deployment includes options for security levels: public (not recommended) and protected with token authentication.
- 🔄 The deployment uses the TGI (Text Generation Inference) serving container, which has reverted to the Apache 2 license.
- 💻 The script guides viewers on how to select deployment settings like autoscaling and model revision.
- 🛑 The importance of deleting endpoints after testing to avoid charges is highlighted.
- 📈 The video showcases the ease of deployment with a simple click and code copy-paste for testing.
- 🔍 The script includes a practical example of deploying and testing the 'Gemma' model from Google.
- 🗓️ More videos on different ways to deploy Hugging Face models on Google Cloud are promised for the future.
Q & A
Who is the speaker in the video?
-The speaker is Julian from Hugging Face.
What is the main topic of the video?
-The main topic is deploying Hugging Face models on Google Cloud using Inference endpoints.
What is the partnership announced in the video?
-The partnership announced is between Hugging Face and Google Cloud.
What is the name of Hugging Face's own deployment service mentioned in the video?
-The deployment service is called Inference Endpoints.
How many videos does Julian plan to make about deploying models on Google Cloud?
-Julian plans to make several videos, with at least three mentioned.
What is the first model Julian decides to deploy on Google Cloud?
-Julian decides to deploy the new version of the Gemma model from Google.
What is the license under which TGI serving container is now available?
-TGI serving container is now available under the Apache 2 license.
What are the security levels available for deploying models on Google Cloud as mentioned in the video?
-The security levels mentioned are public and protected. There is no private option available at the moment.
How can viewers test the deployed model using the video's instructions?
-Viewers can test the deployed model using the playground or by using the API with a token for protected security.
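The API call described in this answer can be sketched in Python. This is a minimal illustration, not the exact code shown in the video: the endpoint URL and token are placeholders, and the payload follows TGI's text-generation request format.

```python
# Hypothetical sketch of calling a deployed Inference Endpoint over HTTP.
# ENDPOINT_URL and HF_TOKEN are placeholders; substitute your own values.
import requests

ENDPOINT_URL = "https://YOUR-ENDPOINT.gcp.endpoints.huggingface.cloud"
HF_TOKEN = "hf_xxx"  # a User Access Token with access to the endpoint


def build_request(prompt: str, temperature: float = 0.7) -> dict:
    """Assemble the JSON body expected by a TGI-served text-generation endpoint."""
    return {
        "inputs": prompt,
        "parameters": {"temperature": temperature, "max_new_tokens": 128},
    }


def query(prompt: str) -> str:
    """POST the prompt with Bearer-token auth and return the generated text."""
    headers = {"Authorization": f"Bearer {HF_TOKEN}"}
    resp = requests.post(ENDPOINT_URL, headers=headers, json=build_request(prompt))
    resp.raise_for_status()
    return resp.json()[0]["generated_text"]


# Requires a live endpoint and valid token:
# print(query("Tell me something interesting about Seattle."))
```

Because the endpoint is deployed with the "protected" security level, omitting the `Authorization` header would return a 401 error.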
What should viewers do after they finish testing the deployed model?
-After testing, viewers should delete the endpoint by going to settings, typing or pasting the endpoint name, and clicking delete to avoid further charges.
What does Julian suggest at the end of the video for viewers to do?
-Julian suggests that viewers keep an eye out for the next two videos where he will show more ways to deploy Hugging Face models on Google Cloud.
Outlines
🚀 Deploying Hugging Face Models on Google Cloud
Julian from Hugging Face introduces a new partnership with Google Cloud and demonstrates how to deploy Hugging Face models there. He plans a series of videos covering different deployment methods; this first one uses Hugging Face's own deployment service, Inference Endpoints, to deploy models from the Hub to Google Cloud. Julian walks through deploying Gemma, a new model from Google: accessing it on the Hub, requesting access if necessary (it is a gated model), and selecting Google Cloud as the deployment target. He explains the configuration options, including the serving container, the choice between public and protected security levels, autoscaling, and model revision. He also mentions that Hugging Face's Text Generation Inference (TGI) serving container has recently moved back to the Apache 2 license, which is good news for the community. The video pauses while the deployment runs, and Julian promises to test the endpoint once it's ready.
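The one-click flow described above can also be done programmatically with the `huggingface_hub` library's `create_inference_endpoint` helper. The sketch below is an illustration only: the model id, region, and instance values are assumptions, not the exact ones selected in the video.

```python
# Sketch: deploying to Google Cloud from code instead of the one-click UI.
# All repository/region/instance values below are illustrative assumptions.

ENDPOINT_CONFIG = {
    "repository": "google/gemma-7b-it",  # assumed Gemma model id
    "framework": "pytorch",
    "task": "text-generation",
    "vendor": "gcp",           # deploy on Google Cloud
    "region": "us-east4",      # assumed id for the single US region mentioned
    "type": "protected",       # token-authenticated, as recommended in the video
    "accelerator": "gpu",
    "instance_size": "x1",     # assumed size/type pair for this model
    "instance_type": "nvidia-l4",
}


def deploy(name: str, config: dict):
    """Create the endpoint and block until it is running (needs an HF token)."""
    from huggingface_hub import create_inference_endpoint

    endpoint = create_inference_endpoint(name, **config)
    endpoint.wait()  # poll until the endpoint reports "running"
    return endpoint


# deploy("gemma-on-gcp", ENDPOINT_CONFIG)  # uncomment with a valid token configured
```

The `wait()` call mirrors the pause in the video while the GCP instance launches and the endpoint comes up.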
🔧 Testing and Managing Deployed Models on Google Cloud
In the second part of the video, Julian tests the deployed Gemma model using the playground feature, which requires a token because the endpoint uses the 'protected' security level. He then calls the API from a notebook to generate text from a prompt and prints the output, which turns out to be more interesting than the earlier Starbucks answer. Julian emphasizes how simple and fast the deployment process is, allowing quick testing and experimentation. After testing, he shows how to delete the endpoint to avoid further charges: go to Settings, type or paste the endpoint name, and confirm the deletion. He closes by reminding viewers that more videos are coming with additional ways to deploy Hugging Face models on Google Cloud.
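The Settings → Delete cleanup step can likewise be scripted. A minimal sketch using `huggingface_hub` (the endpoint name is a placeholder, and an HF token is assumed to be configured):

```python
# Sketch: deleting an Inference Endpoint from code so billing stops,
# mirroring the Settings -> Delete flow shown in the video.

def delete_endpoint(name: str) -> None:
    """Look up the endpoint by name and delete it."""
    from huggingface_hub import get_inference_endpoint

    get_inference_endpoint(name).delete()


# delete_endpoint("gemma-on-gcp")  # run this when you are done testing
```

As in the UI flow, deletion is immediate and stops all charges for the endpoint.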
Keywords
💡Hugging Face
💡Google Cloud
💡Inference Endpoints
💡Model Deployment
💡Gated Model
💡TGI Serving Container
💡Apache 2 License
💡Security Level
💡Autoscaling
💡Quantization
💡Playground
💡API
Highlights
Introduction to a partnership with Google Cloud for deploying Hugging Face models.
Announcement of several upcoming videos demonstrating deployment methods on Google Cloud.
Introduction of Hugging Face's own deployment service called Inference Endpoints.
Demonstration of one-click deployment of models from the Hub to Google Cloud.
Request for viewers to subscribe and enable notifications for future updates.
Explanation of accessing a gated model by entering an email for access.
Encouragement to read about the model and test it locally before deployment.
Step-by-step guide on deploying a model on Inference Endpoints with Google Cloud.
Mention of the deployment using Hugging Face's TGI serving container.
Announcement that TGI is now back to the Apache 2 license.
Discussion on choosing the security level for the deployment: public or protected.
Overview of configuration options for deployment, including autoscaling and model revision.
Simple process of selecting Google Cloud and security level to initiate deployment.
Pause in the video to allow the GCP instance to launch and prepare the endpoint.
Testing the deployed endpoint using a playground with a token for protected security.
Demonstration of changing generation parameters and using the API for testing.
Example of invoking the endpoint and printing the output in a notebook.
Instructions on how to delete the endpoint to stop charges after testing.
Teaser for two more videos showing additional ways to deploy models on Google Cloud.
Transcripts
Hi everybody, this is Julian from Hugging Face. As you can see, I'm on the road right now, but that's not an excuse not to do any videos. As you probably know, we've recently announced a partnership with Google Cloud, and in this video and the following ones I will show you how you can quickly and easily deploy Hugging Face models on Google Cloud. There are different ways to do this, which is why I'm going to do several videos. In the first one, I'm going to show you how to use our own deployment service, called Inference Endpoints, and we'll see how we can deploy one-click models from the Hub to Google Cloud. As simple as that. Let's get started.
If you enjoy this video, please give it a thumbs up and consider subscribing to my YouTube channel, and if you do, please don't forget to enable notifications so that you won't miss anything in the future. Also, why not share this video on your social networks or with your colleagues? If you enjoyed it, it's very likely someone else will. Thank you very much for your support.
Starting from the Hub, let's find a good model to deploy on Google Cloud. How about we try Gemma, this new version of the Gemma model from Google? Let's just click on this. If this is the first time you open this model page, you'll have to ask for access, because this is a gated model. Just enter your email and confirm, and you should have access in seconds, so don't let that stop you. As always, I would encourage you to read about the model, and why not maybe test it locally, etc. Lots of good information there.
But for now we want to deploy it on Inference Endpoints, so let's just click on Deploy, then Inference Endpoints. You can see we have a new option for Google Cloud, right next to AWS and Azure, so why don't we select Google. At the moment we have a single US region, but I'm pretty sure we will add more. We automatically select what we think is the best configuration for this model, so here we're going to deploy on this particular instance. As you can see, we are deploying with our TGI serving container, and by the way, I think just yesterday we announced that TGI is now back to the Apache 2 license, which I think is good news for everyone.
We can decide what the security level should be. Remember, public means public: wide open to the public internet, no authentication, so I wouldn't recommend it. Protected means accessible from the internet with token authentication. We don't have a private option for now, which we have on other clouds, so let's go with protected. We could always take a look at the configuration: do we want autoscaling, do we want a particular revision of the model? I guess we'll go with TGI. We could enable quantization if we wanted, etc., but I will stick with all those defaults. So, very simple: just select Google and the security level, and that's pretty much it. Okay, let's click on Create Endpoint.
Now it will take a few minutes. Of course, this will automatically launch the GCP instance in our own account, prepare the endpoint, and so on. So I'll pause the video and wait for the endpoint to come up, and of course we'll test it afterwards.
After a few minutes the endpoint is up (I can see it says "running" here), so why don't we test it? We could test it with the playground. We just need to select a token, obviously, because we're using the protected security level. So let's just try this challenging question. Trust me. All right, let's see what that says. All right, what did I tell you? Starbucks, horrible coffee. Hopefully there's something more interesting in Seattle than Starbucks.
Anyway, that's the playground. Let's try the API. Again, I need my token, and I could change some of the generation parameters if I wanted to, for example to increase temperature. Let's just include the token (don't worry, I will invalidate it afterwards), and I just need to copy this. Okay, let's switch to a notebook, paste the code, and maybe change the question. Let's try that again. Okay, let's just run this: invoking the endpoint, passing the token. And I guess we need to print the output. All right: "The Seattle Fire of 1889 was one of the more destructive in American history." Well, that's clearly more interesting than Starbucks.
So as you can see, super nice and simple: just one click to deploy, then copy-paste the code, and you can test in minutes. When you're done, don't forget to delete the endpoint. Let me show you. When you're done testing, just go to Settings, scroll all the way down, type or paste the endpoint name, and click Delete. The endpoint goes away and you stop being charged. Perfect.
So that's the first way to deploy Hugging Face models on Google Cloud: using Inference Endpoints. I hope this was interesting. I've got two more ways to show you, so keep an eye out for the next two videos. Keep rocking!