Hugging Face + Langchain in 5 mins | Access 200k+ FREE AI models for your AI apps
Summary
TL;DR: The video demonstrates how to leverage Hugging Face's platform and AI models to build your own apps. It walks through an end-to-end example of creating an image-to-audio converter app using Hugging Face's hosted models as well as downloading models locally. The app allows users to upload an image, extract text describing the image using computer vision, generate a short story based on that description using a language model, and convert the story to audio using a text-to-speech model. Overall, the video aims to showcase Hugging Face's 200,000+ models and motivate developers to tap into them to create their own AI apps.
Takeaways
- 😀 Hugging Face is a top AI company with over 16,000 GitHub stars and 200,000 models for text, image, speech tasks
- 🤗 Their platform allows you to easily find, test and deploy AI models without needing to download or host them yourself
- 📸 You can use their hosted APIs or Transformers library to implement image-to-text, text-to-speech etc locally
- 💡 Their Spaces section lets you showcase AI apps and explore those built by others for inspiration
- 👩💻 I built an app to turn images into audio stories using multiple Hugging Face models chained together
- 🖼️ First, I analyzed the image and got a text description with an image-to-text model
- 📜 Next, I generated a story from that description using GPT-3.5 Turbo via LangChain
- 🗣️ Finally, I turned the story into speech with a text-to-speech model
- ✏️ The Streamlit UI allows a user to easily upload an image and get the generated audio back
- 🤳 This showcases an end-to-end pipeline combining multiple AI models to create a unique experience
Q & A
What is Hugging Face and why is it valuable to learn?
-Hugging Face is an AI company valued at over $2 billion. It hosts over 200,000 different AI models that are used by top tech companies like Google, Amazon, Microsoft and Meta. Learning to use Hugging Face allows you to leverage these powerful AI models in your own applications.
What are the three main components of the Hugging Face platform?
-The three main components are: 1) Models - where you can find all different types of AI models to use 2) Datasets - contains lots of datasets to train your own models 3) Spaces - allows you to easily deploy AI apps and explore apps built by others
How can you test Hugging Face models before using them?
-You can test Hugging Face models directly on their hosted platform without needing to set up anything locally. This allows you to identify the right models for your use case more easily.
What are two ways to use Hugging Face models in your applications?
-Two ways are: 1) Use their hosted Inference API which is easy but has rate limits 2) Download models locally using the Transformers library which gives you more control and customization.
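As a rough sketch of those two paths — the model id, filename, and environment-variable name below are illustrative assumptions, not taken from the video:

```python
import os

BLIP_MODEL = "Salesforce/blip-image-captioning-base"  # assumed model id

def caption_locally(image_path: str) -> str:
    """Way 1: download the model with Transformers' pipeline and run it locally."""
    from transformers import pipeline  # heavyweight import kept local
    captioner = pipeline("image-to-text", model=BLIP_MODEL)
    return captioner(image_path)[0]["generated_text"]

def inference_api_url(model_id: str) -> str:
    """Build the hosted Inference API endpoint for a given model id."""
    return f"https://api-inference.huggingface.co/models/{model_id}"

def caption_remotely(image_path: str) -> dict:
    """Way 2: POST to the free hosted Inference API (rate-limited)."""
    import requests
    headers = {"Authorization": f"Bearer {os.environ['HUGGINGFACEHUB_API_TOKEN']}"}
    with open(image_path, "rb") as f:
        resp = requests.post(inference_api_url(BLIP_MODEL), headers=headers, data=f.read())
    return resp.json()
```

The local path costs a one-time model download but has no rate limits; the remote path needs only a token and an HTTP call.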
What three Hugging Face models were used in the image to audio app example?
-The three models used were: 1) BLIP for image-to-text 2) GPT-3.5 Turbo via LangChain to generate the story (an OpenAI model — open-source Hugging Face LLMs work too) 3) A text-to-speech model, called through the hosted Inference API, to create the audio narration
How can you quickly test different Hugging Face models?
-You can quickly test different Hugging Face models using the 'Deploy' button which provides a hosted Inference API for free without needing to set up anything.
What Python libraries make it easy to use Hugging Face?
-The main Python libraries that make it easy to use Hugging Face are: 1) Transformers, whose pipeline function downloads models and runs them locally 2) Requests for calling the hosted Inference API easily
How could services like Relevance AI integrate with Hugging Face?
-Services like Relevance AI could build deep integrations with Hugging Face to allow developers to leverage different AI models directly with easy no-code interfaces.
What makes Hugging Face useful for training your own models?
-Hugging Face provides many datasets you can use to train your own custom models in areas like text-to-speech, image recognition, text generation etc. This saves time over sourcing datasets.
Where can you learn more about the capabilities of Hugging Face?
-You can learn more about Hugging Face model capabilities at huggingface.co/tasks. This provides detailed docs on supported tasks for different models.
Outlines
📽️ Introducing Hugging Face for building AI apps
The paragraph introduces Hugging Face, a top AI company valued at over $2 billion with more than 16,000 GitHub followers. It explains why learning Hugging Face is important for building AI apps, as it provides easy access to over 200,000 AI models that are used by top tech companies. It then outlines the key capabilities of Hugging Face - models, datasets and spaces - that help discover, test and deploy AI models easily.
🤗 Building an image to audio story app with Hugging Face
This paragraph walks through building an AI app to turn images into audio stories using multiple Hugging Face models. It outlines the 3 key components - image to text, text generation with GPT-3.5 Turbo, and text-to-speech. It then shows sample code to load each model and process an image, generate a story with GPT-3.5 Turbo, convert text to speech, and create a Streamlit interface, demonstrating how all the models can be integrated to build a complete AI workflow.
Keywords
💡Hugging Face
💡Transformer Pipeline
💡GPT-3.5 Turbo
💡Streamlit
💡Image-to-text
💡Text-to-speech
💡Inference API
💡Model Hub
💡Tasks
💡Relevance AI
Highlights
Hugging Face provides over 200,000 different AI models to use including image, text, speech models
You can test Hugging Face models hosted on their servers without needing to download or host models yourself
You can also run Hugging Face models locally using their Transformers library
The app built has 3 components - image to text, text generation, and text to speech
The Pipeline API loads Hugging Face models into memory on your local machine
The tasks that Transformers supports are available on Hugging Face's tasks page with tutorials
Hugging Face provides hosted inference APIs to test models quickly without setup
The app uploads an image, extracts text, generates a story, and converts it to speech
Relevance AI provides an image to text model and no-code UI to build AI apps fast
It would be great if Relevance AI built a deeper Hugging Face integration in the future
Going through the Hugging Face tasks page helps learn its supported models
The demo app shows how to connect Hugging Face APIs to build an end-to-end AI app
The hosted inference APIs provide a fast way to test Hugging Face models
The Pipeline API helps run models locally by handling downloads and dependencies
Hope this helps build interesting AI apps using Hugging Face!
Transcripts
If you are building AI apps, you have to learn how to use Hugging Face. It is one of the top AI companies, valued at more than 2 billion dollars. It has more than 16,000 followers on GitHub, and its products are used by Google, Amazon, Microsoft and Meta, with more than 200,000 different types of AI models, including image-to-text, text-to-speech, text-to-image and many more. That's why, if you are building AI apps, you absolutely need to learn how to use it, and I'm going to show you how you can use the Hugging Face platform and build with it using other popular libraries like LangChain. Let's get to it.

Hugging Face is a place for you to discover and share AI models. There are three parts of the Hugging Face platform: Models, Datasets and Spaces.
First, Models. This is the place where you can find all different sorts of models to use. For example, if we are interested in image-to-text, I can select that category on the left and then, on the right side, choose any of the popular image-to-text models. Once I get into a model's page, the left side has some description of the model, and the right side allows you to preview and test the AI model directly on their hosted version. This is why Hugging Face is so useful: without it, you would need to find a model, download it to your local machine or host it somewhere, and then try to run it just to know if it is the right model for you. With Hugging Face, they host it on their own machines and you can test it immediately. For this image-to-text model, I can drag and drop an image directly and see what kind of result it gives. If you want to use a model, it allows you to easily deploy it on different servers. You can also use the hosted API on Hugging Face Hub for free; it is a bit slow and they have rate limits, but it's definitely enough for you to run some tests. On the other hand, if you prefer to run models locally on your own machine, you can use their Transformers library, and I will talk about how to do that very soon.

Besides Models, they also have Datasets. This is where you can find a lot of datasets that you can use to train your own models. For example, if I want to build my own voice model, I can filter down to text-to-speech and find the specific language that I want to use, then click on any of them and preview what data they contain. Unless you are training your own model, you probably won't use the datasets too much.

The last part is Spaces.
Spaces was initially designed for people to showcase and share the AI apps that they build: it lets you deploy the apps you have been building very easily on their machines, and they provide a free tier too. On the other side, you can explore the AI apps other people are building, and there is a lot of very cool stuff; you can just click on them and start playing with those apps. You can also learn how they built those things: clicking on this button shows you all the models used to build the app, and you can click on the files to see the source code.

So how are we going to use those models on Hugging Face while implementing them with LangChain? I'll take you through a step-by-step example of building such an AI app, where I can upload an image and it automatically turns it into an audio story. ("The man and woman sat on the couch, lost in silence. He broke it: 'I love you.' She smiled and said, 'I know.'") Through this example you will learn how to use a few different Hugging Face AI models. Let's get to it.

First, let's think step by step about how we're going to implement this.
This app will have three components: first, an image-to-text model to let the machine understand the scenario in the photo; then a large language model to generate a short story; and finally a text-to-speech model to generate the audio story.

To find the right image-to-text model, we can go to Hugging Face and filter down to the image-to-text models; the one I will be using is called BLIP. You will need to create a Hugging Face account, then go to Settings, Access Tokens and create an access token for LangChain. Let's go back to Visual Studio Code and create a .env file where we store all the credentials, and add the Hugging Face Hub API token. Once I save that, let's import a few libraries: we'll use load_dotenv and run it to be able to access the Hugging Face API token that we stored in the .env file, and then we will import pipeline from Transformers. Pipeline allows us to download a Hugging Face model to our local machine.

Now we're ready to implement the first part, the image-to-text model. We create a pipeline to load the AI model, first passing in the task, which is "image-to-text". Some of you might be curious where this task name comes from: the Hugging Face Transformers library actually has a predefined list of tasks, and you can go to huggingface.co/tasks to see which tasks it supports and click on any of them for a more detailed tutorial on that specific task. Then we need to pass the model name, which you can get by clicking "Use in Transformers" and copy-pasting it. We run the image-to-text pipeline, passing the URL of the image file, then print the result; I copied a photo into the root folder for testing purposes. Now let's see what result we get. We run python app.py and get "a group of people standing on a boat", which is a very accurate description. I only want to return the actual text here, so I will take the generated_text field of the first item.
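The step just described can be sketched as a small function — the model id and the test filename here are assumptions, as the video doesn't spell out the exact BLIP checkpoint:

```python
def img2text(image_path: str) -> str:
    """Caption an image with a locally downloaded BLIP model."""
    from transformers import pipeline  # downloads the model on first use
    captioner = pipeline("image-to-text", model="Salesforce/blip-image-captioning-base")
    result = captioner(image_path)  # a list like [{"generated_text": "..."}]
    return result[0]["generated_text"]  # keep only the caption string

if __name__ == "__main__":
    print(img2text("photo.jpg"))
```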
Next, we want to use a large language model to generate a short story based on the scenario that we got from the image. You can use some open-source models on Hugging Face as well, but I prefer to use GPT, so I'll use LangChain here. That means adding the OpenAI API key to the .env file as well. Then let's import a few libraries from LangChain. This is the function we're going to run: we first create a prompt template that asks GPT to generate a story, and then we create an LLMChain with GPT-3.5 Turbo. Let's try it.

Now all we need to do is turn that text into speech using a text-to-speech model.
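A sketch of this story step, assuming the classic LangChain interfaces from around the time of the video (PromptTemplate, LLMChain, ChatOpenAI — newer LangChain releases have moved or renamed these), with prompt wording that is mine, not the video's:

```python
def generate_story(scenario: str) -> str:
    """Ask GPT-3.5 Turbo (via LangChain) for a short story about the scenario."""
    from langchain.prompts import PromptTemplate
    from langchain.chains import LLMChain
    from langchain.chat_models import ChatOpenAI

    template = (
        "You are a storyteller. Generate a short story, no more than 50 words, "
        "based on this scenario: {scenario}"
    )
    prompt = PromptTemplate(template=template, input_variables=["scenario"])
    chain = LLMChain(llm=ChatOpenAI(model_name="gpt-3.5-turbo", temperature=1), prompt=prompt)
    return chain.predict(scenario=scenario)
```

Running it requires OPENAI_API_KEY in the environment, which is why the video adds the key to the .env file first.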
Again, we'll do the same thing: go to the Models page, filter to text-to-speech models and find the most popular one. But this time I want to share another way you can use a Hugging Face model. You can click on the Deploy button, and there should be an option called Inference API; this is a super easy and fast way for you to test out the Hugging Face API for free. So that's what we're going to do, using the requests library. Going back to the code, I import requests and then create a function text2speech. At the top I load the Hugging Face API token so that I can pass it with the API request. I add the API URL here, put the header that carries my Hugging Face API token, and then create a payload of inputs. Now I just call the API. For the model I'm using, the result it returns is a FLAC file, which is one type of audio file, so I'm going to store it locally. Let's try it. Oh sorry, I forgot to import the os library, which allows us to get the API token. Let me try again. Okay, here we go, we got this audio file generated: "The group of people were standing on the boat, their eyes fixed on the horizon, the sun setting, painting the sky with brilliant shades of pink and orange."
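The text-to-speech call above, sketched with requests. The model id and environment-variable name are assumptions — the video only says it picked a popular text-to-speech model from the hub:

```python
import os

def text2speech(message: str, out_path: str = "audio.flac") -> str:
    """Send text to a hosted TTS model and save the returned FLAC bytes."""
    import requests
    api_url = "https://api-inference.huggingface.co/models/espnet/kan-bayashi_ljspeech_vits"
    headers = {"Authorization": f"Bearer {os.environ['HUGGINGFACEHUB_API_TOKEN']}"}
    response = requests.post(api_url, headers=headers, json={"inputs": message})
    with open(out_path, "wb") as f:  # the API returns raw audio bytes
        f.write(response.content)
    return out_path
```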
Okay, now you can see the whole thing is working. All we need to do now is connect everything together and give it a UI layer with Streamlit. I import streamlit as st, the library that lets us create a user interface for Python code, and create a main function which is called when the app loads. First I set the page title, then give it a header, "Turn image into audio story". I set uploaded_file equal to st.file_uploader, which allows people to upload an image file. Then, if a file is uploaded, I first save the image and display it with st.image. Next I call the function we created to generate text from the uploaded image, let GPT generate a story based on that scenario, and finally generate an audio file from the story. We display the scenario and the story, and at the end we display the audio file we got. That's pretty much it. Let's run this app with streamlit run app.py. Here we can upload the image, and we can see it running; if we open the terminal we should be able to see what it's doing. Okay, here you go — you can see we already generated the scenario and a little story, and if you click the play button it plays the audio file: "As they sat together on the couch, the man stared intently at the woman he had known for 40 years, and every time he looked at her he felt like he was seeing her for the first time. Suddenly he blurted, 'I think I love you.' The woman turned to him, her eyes wide with surprise, and then, with a smile that lit up the room, she said, 'Finally.' From that moment on, they knew they were meant to be together." Okay, I think this is a pretty dope use case.
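The Streamlit wiring just demonstrated can be sketched like this. Here img2text, generate_story and text2speech stand for the three steps built earlier; those names are my own, and you would launch the script with streamlit run app.py:

```python
def main():
    import streamlit as st

    st.set_page_config(page_title="Turn image into audio story")
    st.header("Turn image into audio story")
    uploaded_file = st.file_uploader("Choose an image...")
    if uploaded_file is not None:
        # Save the upload to disk so the models can read it by path.
        with open(uploaded_file.name, "wb") as f:
            f.write(uploaded_file.getvalue())
        st.image(uploaded_file, caption="Uploaded image", use_column_width=True)
        scenario = img2text(uploaded_file.name)  # image -> caption
        story = generate_story(scenario)         # caption -> short story
        audio_path = text2speech(story)          # story -> FLAC file
        with st.expander("scenario"):
            st.write(scenario)
        with st.expander("story"):
            st.write(story)
        st.audio(audio_path)

if __name__ == "__main__":
    main()
```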
So this is how you can use Hugging Face models. To quickly recap: the easiest way is to use the Inference API, which uses their hosted version directly; alternatively, you can use pipeline to download those models to your local machine. If you want to learn more, I highly recommend you go to huggingface.co/tasks to learn about all the different types of tasks it supports, as well as the different types of models it has.

One last thing I want to touch on: I noticed that one of the low-code AI app builder platforms, Relevance AI, actually provides an image-to-text model out of the box, and I could create this image-to-speech app super quickly with their no-code UI and get a deployed app in just about five minutes. I do hope they build a deep integration with Hugging Face where I can grab different types of AI models directly, but it's already a pretty good start, so I highly recommend it.

Alright, hopefully you know how to use Hugging Face now and can start building some super interesting AI apps. If you found this content useful, please subscribe; I'll continue sharing all these AI experiments I'm doing. Thank you and see you next time.