Hugging Face + Langchain in 5 mins | Access 200k+ FREE AI models for your AI apps

AI Jason
10 Jun 2023 · 09:47

Summary

TLDR: The video demonstrates how to leverage Hugging Face's platform and AI models to build your own apps. It walks through an end-to-end example of creating an image-to-audio converter app, using both Hugging Face's hosted models and models downloaded locally. The app lets users upload an image, extract a text description of the image with a computer-vision model, generate a short story from that description with a language model, and convert the story to audio with a text-to-speech model. Overall, the video aims to showcase Hugging Face's 200,000+ models and motivate developers to tap into them to create their own AI apps.

Takeaways

  • 😀 Hugging Face is a top AI company with over 16,000 followers on GitHub and 200,000+ models for text, image, and speech tasks
  • 🤗 Their platform allows you to easily find, test and deploy AI models without needing to download or host them yourself
  • 📸 You can use their hosted APIs, or run models locally with the Transformers library, for tasks like image-to-text and text-to-speech
  • 💡 Their Spaces let you showcase AI apps and explore those built by others for inspiration
  • 👩‍💻 I built an app to turn images into audio stories using multiple Hugging Face models chained together
  • 🖼️ First, I analyzed the image and got a text description with an image-to-text model
  • 📜 Next, I generated a story from that description using OpenAI's GPT-3.5-turbo via LangChain
  • 🗣️ Finally, I turned the story into speech with a text-to-speech model
  • ✏️ The Streamlit UI allows a user to easily upload an image and get the generated audio back
  • 🤳 This showcases an end-to-end pipeline combining multiple AI models to create a unique experience

Q & A

  • What is Hugging Face and why is it valuable to learn?

    -Hugging Face is an AI company valued at over $2 billion. It hosts over 200,000 different AI models, and its tools are used by top tech companies like Google, Amazon, Microsoft, and Meta. Learning to use Hugging Face lets you leverage these powerful AI models in your own applications.

  • What are the three main components of the Hugging Face platform?

    -The three main components are: 1) Models, where you can find all different types of AI models to use 2) Datasets, a large collection of datasets for training your own models 3) Spaces, which lets you easily deploy AI apps and explore apps built by others

  • How can you test Hugging Face models before using them?

    -You can test Hugging Face models directly on their hosted platform without needing to set up anything locally. This allows you to identify the right models for your use case more easily.

  • What are two ways to use Hugging Face models in your applications?

    -Two ways are: 1) Use their hosted Inference API which is easy but has rate limits 2) Download models locally using the Transformers library which gives you more control and customization.

  • What three models were used in the image-to-audio app example?

    -The three models were: 1) BLIP for image-to-text 2) a large language model to write the story (the video uses OpenAI's GPT-3.5-turbo through LangChain, though an open-source Hugging Face model could substitute) 3) a Hugging Face text-to-speech model to create the audio narration

  • How can you quickly test different Hugging Face models?

    -You can quickly test different Hugging Face models using the 'Deploy' button which provides a hosted Inference API for free without needing to set up anything.

  • What Python libraries make it easy to use Hugging Face?

    -The main Python libraries are: 1) Transformers, whose pipeline API downloads and runs models locally 2) Requests, for calling the hosted Inference API 3) LangChain, for chaining the models together with LLM prompts

  • How could services like Relevance AI integrate with Hugging Face?

    -Services like Relevance AI could build deep integrations with Hugging Face, letting developers leverage different AI models directly through easy low-code interfaces.

  • What makes Hugging Face useful for training your own models?

    -Hugging Face provides many datasets you can use to train your own custom models in areas like text-to-speech, image recognition, text generation etc. This saves time over sourcing datasets.

  • Where can you learn more about the capabilities of Hugging Face?

    -You can learn more about Hugging Face model capabilities at huggingface.co/tasks. This provides detailed docs on supported tasks for different models.

Outlines

00:00

📽️ Introducing Hugging Face for building AI apps

The paragraph introduces Hugging Face, a top AI company valued at over $2 billion with more than 16,000 followers on GitHub. It explains why learning Hugging Face matters for building AI apps: it provides easy access to over 200,000 AI models used by top tech companies. It then outlines the key parts of the platform - Models, Datasets, and Spaces - that help you discover, test, and deploy AI models easily.

05:01

🤗 Building an image to audio story app with Hugging Face

This paragraph walks through building an AI app that turns images into audio stories using multiple models. It outlines the three key components - image-to-text with BLIP, story generation with GPT-3.5 via LangChain, and text-to-speech - then shows sample code to load each model, process an image, generate a story, convert the text to speech, and create a Streamlit interface, demonstrating how the models can be integrated into a complete AI workflow.
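To make this outline concrete, here is a minimal end-to-end sketch of the three-stage pipeline. It is an approximation based on the video's narration, not its verbatim code: the BLIP checkpoint is the one named in the video, but the TTS model ID, prompt wording, and file names are assumptions.

    import os
    import requests
    from dotenv import load_dotenv
    from transformers import pipeline
    from langchain.chat_models import ChatOpenAI
    from langchain.prompts import PromptTemplate
    from langchain.chains import LLMChain

    load_dotenv()  # expects HUGGINGFACEHUB_API_TOKEN and OPENAI_API_KEY in a .env file

    def img2text(image_path):
        # Stage 1: caption the photo with BLIP
        captioner = pipeline("image-to-text", model="Salesforce/blip-image-captioning-base")
        return captioner(image_path)[0]["generated_text"]

    def generate_story(scenario):
        # Stage 2: expand the caption into a short story with GPT-3.5 via LangChain
        prompt = PromptTemplate(
            input_variables=["scenario"],
            template="You are a storyteller. Write a story of at most 30 words about: {scenario}",
        )
        chain = LLMChain(llm=ChatOpenAI(model_name="gpt-3.5-turbo"), prompt=prompt)
        return chain.run(scenario=scenario)

    def text2speech(text):
        # Stage 3: hosted Inference API; this TTS checkpoint is an assumption,
        # since the video does not name its exact choice
        api_url = "https://api-inference.huggingface.co/models/espnet/kan-bayashi_ljspeech_vits"
        headers = {"Authorization": f"Bearer {os.getenv('HUGGINGFACEHUB_API_TOKEN')}"}
        audio = requests.post(api_url, headers=headers, json={"inputs": text}).content
        with open("audio.flac", "wb") as f:
            f.write(audio)

    text2speech(generate_story(img2text("photo.jpg")))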

Keywords

💡Hugging Face

Hugging Face is an AI company that hosts a wide range of NLP and computer vision models, which can be accessed via API or downloaded locally. It is the key platform for building AI apps in the video. Hugging Face models used in the video include the BLIP image-to-text model and a text-to-speech model called through the hosted Inference API.

💡Transformer Pipeline

The Transformers pipeline is a helper from the Hugging Face Transformers library that downloads models locally from the Hugging Face Model Hub and runs them behind a simple API, handling steps like preprocessing and inference. In the video it is used to download the BLIP image-to-text model.
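A minimal illustration of the pipeline API, assuming the BLIP captioning checkpoint shown in the video:

    from transformers import pipeline

    # Downloads the model weights on first use and caches them locally
    captioner = pipeline("image-to-text", model="Salesforce/blip-image-captioning-base")

    result = captioner("photo.jpg")     # accepts a local path, a URL, or a PIL image
    print(result)                       # e.g. [{'generated_text': 'a group of people standing on a boat'}]
    print(result[0]["generated_text"])  # just the caption string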

💡GPT-3.5

GPT-3.5 refers to OpenAI's Generative Pre-trained Transformer language model (the video uses the gpt-3.5-turbo chat model). It is used in the video to generate a short story based on the image description extracted by BLIP. The video illustrates how Hugging Face models can be combined with other libraries like LangChain.
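A sketch of that story-generation step with LangChain (using the langchain 0.0.x API current in mid-2023); the prompt wording is illustrative, and an OPENAI_API_KEY environment variable is assumed:

    from langchain.chat_models import ChatOpenAI
    from langchain.prompts import PromptTemplate
    from langchain.chains import LLMChain

    prompt = PromptTemplate(
        input_variables=["scenario"],
        template=(
            "You are a storyteller. Generate a short story, no more than "
            "30 words, based on this scenario: {scenario}"
        ),
    )
    llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=1)
    story_chain = LLMChain(llm=llm, prompt=prompt)

    print(story_chain.run(scenario="a group of people standing on a boat"))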

💡Streamlit

Streamlit is an open-source app framework used to create the UI for the image to audio app in the video. It allows for quickly building and sharing data apps and ML models. The video shows how Streamlit can be used with Hugging Face to wrap models in an interactive web interface.
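A skeleton of that Streamlit wrapper, assuming helper functions like those in the sketch after the Outlines section (img2text, generate_story, text2speech) are defined in the same file:

    import streamlit as st

    def main():
        st.set_page_config(page_title="Image to audio story")
        st.header("Turn an image into an audio story")
        uploaded_file = st.file_uploader("Choose an image...", type="jpg")

        if uploaded_file is not None:
            # Save the upload locally so the image-to-text model can read it
            with open(uploaded_file.name, "wb") as f:
                f.write(uploaded_file.getvalue())
            st.image(uploaded_file, caption="Uploaded image", use_column_width=True)

            scenario = img2text(uploaded_file.name)
            story = generate_story(scenario)
            text2speech(story)

            with st.expander("Scenario"):
                st.write(scenario)
            with st.expander("Story"):
                st.write(story)
            st.audio("audio.flac")

    if __name__ == "__main__":
        main()

Launch it with streamlit run app.py, as in the video.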

💡Image-to-text

Image-to-text refers to ML models that generate a text description from visual inputs. The video uses the BLIP model from the Hugging Face Hub to implement this, extracting a text description of what is shown in the uploaded image. This output text then fuels the story generation.

💡Text-to-speech

Text-to-speech uses ML to synthesize human-like speech from input text. The video uses a text-to-speech model from Hugging Face to turn the generated story into an audio file that can be played. This is a key component in building the end-to-end image to audio pipeline.

💡Inference API

Hugging Face provides hosted Inference APIs that allow models to be called directly without needing to have them downloaded locally. The video illustrates using the text-to-speech inference API to generate speech audio through API calls rather than running a local model.
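The hosted-API pattern boils down to one authenticated POST per call. A hedged sketch; the model ID below is one plausible TTS checkpoint, not necessarily the one the video picked:

    import os
    import requests

    API_URL = "https://api-inference.huggingface.co/models/espnet/kan-bayashi_ljspeech_vits"
    headers = {"Authorization": f"Bearer {os.environ['HUGGINGFACEHUB_API_TOKEN']}"}

    response = requests.post(API_URL, headers=headers, json={"inputs": "Hello from the hosted API"})
    response.raise_for_status()  # free-tier calls can be rate-limited or return 503 while the model loads

    # TTS endpoints return raw audio bytes (FLAC for this checkpoint)
    with open("audio.flac", "wb") as f:
        f.write(response.content)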

💡Model Hub

The Hugging Face Model Hub is a catalog of over 200,000 ready-to-use NLP and computer vision models that can be accessed programmatically or via inference APIs. The video demonstrates leveraging the Hub to find appropriate models, like BLIP image-to-text, for the app.

💡Tasks

Tasks refer to the different functions Hugging Face models are designed for, such as text classification and object detection. Understanding a model's task helps you pick the right one. The video references the tasks page on the Hugging Face site to look up supported tasks.

💡Relevance AI

Relevance AI is called out at the end as a low-code ML platform with image-to-text capabilities that could be an alternative way to build this app. The video suggests that a deeper integration between Relevance AI and Hugging Face could make different AI models accessible with low-code convenience.

Highlights

Hugging Face provides over 200,000 different AI models to use, including image, text, and speech models

You can test Hugging Face models hosted on their servers without needing to download or host models yourself

You can also run Hugging Face models locally using their Transformers library

The app has three components: image-to-text, story generation, and text-to-speech

The Pipeline API loads Hugging Face models into memory on your local machine

The tasks that Transformers supports are available on Hugging Face's tasks page with tutorials

Hugging Face provides hosted inference APIs to test models quickly without setup

The app uploads an image, extracts text, generates a story, and converts it to speech

Relevance AI provides an image-to-text model and a low-code UI to build AI apps fast

It would be great if Relevance AI built a deeper Hugging Face integration in the future

Going through the Hugging Face tasks page helps you learn which tasks and models it supports

The demo app shows how to connect Hugging Face APIs to build an end-to-end AI app

The hosted inference APIs provide a fast way to test Hugging Face models

The Pipeline API helps run models locally by handling downloads and dependencies

Hope this helps build interesting AI apps using Hugging Face!

Transcripts

00:00
If you are building AI apps, you have to learn how to use Hugging Face. It is one of the top AI companies, valued at more than 2 billion dollars. It has more than 16,000 followers on GitHub, and its product is used by Google, Amazon, Microsoft, and Meta, with more than 200,000 different types of AI models, including image-to-text, text-to-speech, text-to-image, and many more. That's why, if you are building AI apps, you absolutely need to learn how to use it, and I'm going to show you how you can use the Hugging Face platform and build with other public libraries like LangChain. Let's get to it.

00:34
In short, Hugging Face is a place for you to discover and share AI models. There are three parts of the Hugging Face platform: Models, Datasets, and Spaces. First is Models: this is where you can find all different sorts of models to use. For example, if we are interested in image-to-text, I can select the category on the left and then, on the right side, choose any of the popular image-to-text models. Once I get into a model's page, on the left side there is some description of the model, and on the right side it allows you to preview and test the AI model directly on their hosted version. This is why Hugging Face is so useful: without it, you would need to find the model, download it to your local machine or host it somewhere, and then try to run it to know whether it is the right model for you. But with Hugging Face, they host it on their own machines and you can test it immediately. For this image-to-text model, I can drag and drop an image directly and see what kind of results it gets. If I want to use it, it allows me to easily deploy the model on different servers. You can also use the hosted API on Hugging Face Hub for free; it is a bit slow and there are rate limits, but it's definitely enough to run some tests. On the other hand, if you prefer to run the models locally on your own machine, you can use their Transformers library, and I will talk about how to do that very soon.

01:50
Besides Models, they also have Datasets, and this is where you can find a lot of datasets you can use to train your own model. For example, if I want to build my own voice model, I can filter down to text-to-speech and find the specific language I want to use; you can click on any dataset and preview what it contains. Unless you are training your own model, you probably won't use the datasets too much.

02:13
The last part is Spaces. Spaces was originally designed for people to showcase and share the AI apps they build, so it allows you to deploy the apps you have been building very easily on their machines, and they provide a free tier too. On the other side, you can explore the AI apps other people are building, and there is a lot of very cool stuff; you can just click on them and start playing with those apps. You can also learn how they were built: clicking on this button shows all the models used to build an app, and you can click on the files to see the source code.

02:42
So how do we use those models from Hugging Face while implementing with LangChain? I'll take you through a step-by-step example of implementing such an AI app, where I can upload an image and it automatically turns it into an audio story ("The man and woman sat on the couch, lost in silence. He broke it: 'I love you.' She smiled and said, 'I know.'"). Through this example, you will learn how to use a few different Hugging Face AI models. Let's get to it.

03:07
First, let's think step by step about how to implement this. The app will have three components: first we need an image-to-text model to let the machine understand the scenario in the photo; then we use a large language model to generate a short story; and in the end we use a text-to-speech model to generate the audio story.

03:28
To find the right image-to-text model, we can go to Hugging Face and filter down to the image-to-text models. The one I will be using is called BLIP. You will need to create a Hugging Face account, then go to Settings > Access Tokens and create an access token for LangChain. Back in Visual Studio, let's create a .env file where we store all the credentials, including the Hugging Face Hub API token. Once that is saved, let's import a few libraries: we use load_dotenv and run it so we can access the Hugging Face API token stored in the .env file, and then we import pipeline from Transformers. pipeline will allow us to download a Hugging Face model onto our local machine.

04:09
Now we're ready to implement the first part, the image-to-text model. We create a pipeline to load the AI model. First we pass in the task, which is "image-to-text". Some of you might be curious where this task name comes from: the Hugging Face Transformers library has a predefined list of tasks, and you can go to huggingface.co/tasks to see which tasks it supports and click on any of them for a more detailed tutorial on using that specific task. Then we need to pass the model name, which you can get by clicking "Use in Transformers" and copy-pasting it. We run image-to-text, pass in the URL of the image file, and print the result. I copied a photo into the root folder for testing purposes. Now let's see what result we get: running python app.py returns "a group of people standing on a boat", which is a very accurate description. I only want to return the actual text here, so I take the first item of the returned list and its generated_text.

05:13
Next, we want to use a large language model to generate a short story based on the scenario we got from the image. You could use an open-source model on Hugging Face as well, but I prefer GPT, so I use LangChain here. That means adding the OpenAI API key too, and then importing a few libraries from LangChain. This is the function we're going to run: we first create a prompt template that asks GPT to generate a story, and then we create an LLMChain with GPT-3.5-turbo. Let's try it.

05:42
Now all we need to do is turn that text into speech using a text-to-speech model. Again, we do the same thing: go to the Models page, find the text-to-speech models, and pick the most popular one. But this time I want to share another way to use a Hugging Face model: you can click on the Deploy button, and there should be an option called Inference API. This is a super easy and fast way to test the Hugging Face API for free, so that's what we're going to do, using the requests library.

06:10
Going back here, we import requests and create a function text2speech. At the top I load the Hugging Face API token so I can pass it along with the API request. I add the API URL, set the header that carries my Hugging Face API token, and create a message of inputs; then I just call the API. For the model I'm using, the result it returns is a FLAC file, which is one type of audio file, so I store it locally. Let's try it. Oh, sorry, I forgot to import the os library, which lets us read the API token; let me try again. Okay, here we go, we got the audio file generated: "The group of people were standing on the boat, their eyes fixed on the horizon, the sun setting, painting the sky with brilliant shades of pink and orange."

07:01
Now you can see the whole thing works. All we need to do now is connect everything together and give it a UI layer with Streamlit. We import streamlit as st, the library that lets us create a user interface for Python code, and create a main function that is called when the app loads. First I set the page title and a header, "Turn image into audio story". I add uploaded_file = st.file_uploader, which allows people to upload an image file. If a file is uploaded, I first save the image and display it with st.image, then call the functions we created: get the model to generate text from the uploaded image, let GPT generate a story based on the scenario, and in the end generate the audio file from the story. We display the scenario and the story, and at the end we display the audio file. That's pretty much it; let's run the app with streamlit run app.py.

07:57
Here we can upload an image, and we can see it running; if we open the terminal, we can see what it is doing. Okay, here you go: it already generated the scenario and a little story, and if you click the play button it plays the audio file: "As they sat together on the couch, the man stared intently at the woman he had known for 40 years, and every time he looked at her he felt like he was seeing her for the first time. Suddenly he blurted, 'I think I love you.' The woman turned to him, her eyes wide with surprise, and then, with a smile that lit up the room, she said, 'Finally.' From that moment on, they knew they were meant to be together."

08:44
I think this is a pretty dope use case, and this is how you can use Hugging Face models. To quickly recap: the easiest way is the Inference API, which calls their hosted version directly; on the other side, you can use pipeline to download the models to your local machine. If you want to learn more, I highly recommend going to huggingface.co/tasks to learn all the different types of tasks it supports, as well as the different types of models it has.

09:11
One last thing I want to touch on: I realized that one of the low-code AI app-building platforms, Relevance AI, actually provides an image-to-text model out of the box, and I was able to create this image-to-speech app super quickly with their low-code UI, getting an app out of the box in just five minutes. I do hope they build a deeper integration with Hugging Face where I can grab different types of AI models directly, but it's already a pretty good start, so I highly recommend it.

09:35
Alright, hopefully you know how to use Hugging Face now and can start building some super interesting AI apps. If you found this content useful, please subscribe; I'll continue sharing all the AI experiments I'm doing. Thank you and see you next time.