Deploy Hugging Face models on Google Cloud: from the hub to Inference Endpoints

Julien Simon

9 Apr 202407:13

Summary

TLDRJulian from Hugging Face introduces a series of videos on deploying Hugging Face models on Google Cloud. The first video demonstrates using Hugging Face's Inference Endpoints to deploy models like the new Gemma model with a single click. It guides viewers through accessing the model, deploying it on Google Cloud, and testing it using a playground and API. Julian also highlights the ease of deleting the endpoint to stop charges, promising more deployment methods in upcoming videos.

Takeaways

🚀 Julian introduces a series of videos on deploying Hugging Face models on Google Cloud.
🤝 Hugging Face has announced a partnership with Google Cloud.
📹 The video will demonstrate deploying models using Hugging Face's own service, Inference Endpoints.
🔗 The deployment process will be shown for one-click models from The Hub to Google Cloud.
🌐 The video mentions the availability of a single US region for deployment but hints at the addition of more regions.
🛡️ The deployment includes options for security levels: public (not recommended) and protected with token authentication.
🔄 The video discusses the use of TGI serving container, which has reverted to the Apache 2 license.
💻 The script guides viewers on how to select deployment settings like autoscaling and model revision.
🛑 The importance of deleting endpoints after testing to avoid charges is highlighted.
📈 The video showcases the ease of deployment with a simple click and code copy-paste for testing.
🔍 The script includes a practical example of deploying and testing the 'Gemma' model from Google.
🗓️ More videos on different ways to deploy Hugging Face models on Google Cloud are promised for the future.

Q & A

Who is the speaker in the video?
-The speaker is Julian from Hugging Face.
What is the main topic of the video?
-The main topic is deploying Hugging Face models on Google Cloud using Inference endpoints.
What is the partnership announced in the video?
-The partnership announced is between Hugging Face and Google Cloud.
What is the name of Hugging Face's own deployment service mentioned in the video?
-The deployment service is called Inference Endpoints.
How many videos does Julian plan to make about deploying models on Google Cloud?
-Julian plans to make several videos, with at least three mentioned.
What is the first model Julian decides to deploy on Google Cloud?
-Julian decides to deploy the new version of the GEMMA model from Google.
What is the license under which TGI serving container is now available?
-TGI serving container is now available under the Apache 2 license.
What are the security levels available for deploying models on Google Cloud as mentioned in the video?
-The security levels mentioned are public and protected. There is no private option available at the moment.
How can viewers test the deployed model using the video's instructions?
-Viewers can test the deployed model using the playground or by using the API with a token for protected security.
What should viewers do after they finish testing the deployed model?
-After testing, viewers should delete the endpoint by going to settings, typing or pasting the endpoint name, and clicking delete to avoid further charges.
What does Julian suggest at the end of the video for viewers to do?
-Julian suggests that viewers keep an eye out for the next two videos where he will show more ways to deploy Hugging Face models on Google Cloud.