The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits

Krish Naik
29 Feb 2024 · 17:05

Summary

TLDR: The video introduces BitNet b1.58, a 1-bit large language model whose weights take only the values -1, 0, or 1 instead of the 32-bit or 16-bit floating-point values typically used. This simplifies computation and reduces memory and power needs while maintaining performance. The video explains how the quantization formula converts full-precision weights to ternary values, and highlights BitNet's advantages, such as explicit feature filtering and matching baseline model performance. Comparative analysis shows that BitNet requires less memory and has lower latency than regular LLMs like LLaMA, especially at larger model sizes, making 1-bit LLMs promising for cost-effective and broad deployment.

Takeaways

  • Introducing BitNet, a new 1-bit LLM that matches the performance of full-precision models while being far more efficient
  • BitNet uses ternary weights of just -1, 0, or 1 in place of full-precision weights
  • This simplifies computation to integer additions, reducing memory and energy needs
  • Can enable LLMs to run on low-resource devices while maintaining perplexity
  • Drastically reduces latency, memory usage, and energy consumption for inference
  • Uses a quantization function called absolute mean (absmean) quantization to convert the weights
  • Replaces nn.Linear with BitLinear for training 1.58-bit weights and 8-bit activations
  • Matches the performance of baseline LLMs like LLaMA in terms of perplexity
  • Explicitly supports feature filtering via zero weights, improving 1-bit LLM performance
  • This architecture calls for new hardware optimizations to fully utilize 1-bit LLMs

Q & A

  • What is a 1-bit LLM?

    -A 1-bit LLM is a large language model in which every parameter, or weight, is ternary, meaning it has only three possible values: -1, 0, or 1. This allows the model to match the performance of full-precision models while being more cost-effective in terms of latency, memory, throughput, and energy consumption.

  • How does the 1-bit LLM save computation resources?

    -The 1-bit LLM saves computation because the weights are restricted to -1, 0, or 1. Matrix multiplication therefore no longer needs floating-point multiplications; only integer additions are required, which saves significant GPU resources (see the sketch below).
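
As an illustration, here is a minimal Python sketch (my own, not code from the paper or the video) of why a dot product against ternary weights needs no multiplications:

```python
# Illustrative only: a dot product with ternary weights {-1, 0, 1}
# reduces to additions and subtractions -- no multiplications needed.

def ternary_dot(weights, inputs):
    """weights: sequence of -1/0/1; inputs: sequence of floats."""
    acc = 0.0
    for w, x in zip(weights, inputs):
        if w == 1:        # +1 weight: add the activation
            acc += x
        elif w == -1:     # -1 weight: subtract the activation
            acc -= x
        # w == 0: the feature is filtered out entirely
    return acc

print(ternary_dot([1, 0, -1, 1], [0.5, 2.0, 1.5, -0.25]))  # -1.25
```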

  • What is the quantization function used to convert weights to ternary values?

    -The quantization function is called absolute mean (absmean) quantization. It scales the floating-point weight matrix by its mean absolute value, then rounds and clips each weight to one of the three ternary values: -1, 0, or 1 (a simplified sketch follows below).
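
As a rough sketch of how an absolute-mean scheme like this can work (a simplified NumPy version written from the description above, not the paper's reference code; the small epsilon is an assumed guard against division by zero):

```python
import numpy as np

def absmean_quantize(W, eps=1e-5):
    """Scale W by its mean absolute value, then round and clip each
    entry to the ternary set {-1, 0, 1}."""
    gamma = np.abs(W).mean()          # absolute mean of the weight matrix
    return np.clip(np.round(W / (gamma + eps)), -1, 1)

W = np.array([[0.42, -0.07, -1.30],
              [0.95,  0.01,  0.66]])
print(absmean_quantize(W))   # [[ 1. -0. -1.]
                             #  [ 1.  0.  1.]]
```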

  • What are the two main advantages of the 1-bit LLM?

    -The two main advantages are: 1) stronger modeling capability due to explicit support for feature filtering, made possible by the zero weights, and 2) matching full-precision model performance in terms of end-to-end task accuracy, starting from a 3B parameter size.

  • How does the 1-bit LLM's memory usage compare to the vanilla LLaMA?

    -Experiments show that the 1-bit LLM uses significantly less memory than the vanilla LLaMA model. For example, a 7B-parameter LLaMA requires 20.8GB of memory versus only 8.96GB for the equivalent 1-bit model (a rough estimate of where such savings come from follows below).
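
For intuition, here is a back-of-the-envelope weight-memory estimate (my own calculation, weights only; real inference memory also includes activations and the KV cache, so measured figures are higher):

```python
import math

def weight_memory_gb(n_params, bits_per_weight):
    """Memory needed just to store the weights, in gigabytes."""
    return n_params * bits_per_weight / 8 / 1e9

n = 7e9  # a 7B-parameter model
print(f"16-bit weights  : {weight_memory_gb(n, 16):.2f} GB")            # ~14.00 GB
print(f"1.58-bit weights: {weight_memory_gb(n, math.log2(3)):.2f} GB")  # ~1.39 GB
```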

  • What hardware optimizations are suggested for the 1-bit LLM?

    -The paper calls for new hardware optimized to exploit the computation savings of the 1-bit architecture, such as more efficient integer-arithmetic units specialized for this model structure.

  • How is the 1-bit LLM beneficial for deployment?

    -The 1-bit LLM allows large language models to be deployed even with limited resources. Its lower memory footprint and computational requirements make it viable on resource-constrained devices.

  • What is perplexity in the context of this research?

    -Perplexity measures how well a language model predicts sample text; lower is better. The experiments showed that the 1-bit LLM matched the vanilla models in perplexity, indicating its language-modeling ability is equivalent (a minimal calculation is sketched below).
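
For reference, perplexity is just the exponential of the average per-token negative log-likelihood, so a minimal calculation (illustrative values, not from the paper) looks like this:

```python
import math

# Per-token negative log-likelihoods (in nats) a model assigns to some text.
nlls = [2.1, 3.4, 1.8, 2.9, 2.5]

perplexity = math.exp(sum(nlls) / len(nlls))   # exp(mean NLL)
print(f"perplexity = {perplexity:.2f}")        # ~12.68; lower is better
```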

  • What is the BitLinear layer in the 1-bit architecture?

    -BitLinear replaces the standard nn.Linear layer in the Transformer architecture. It is specialized to work with the 1.58-bit ternary weights and 8-bit activations used when training the model (a schematic sketch follows below).
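
Since neither the video nor this summary includes reference code, the following is only a schematic PyTorch sketch of what such a drop-in layer could look like: ternary (absmean) weight quantization in the forward pass with a straight-through estimator so the full-precision shadow weights stay trainable; the 8-bit activation quantization is omitted for brevity:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BitLinear(nn.Linear):
    """Schematic stand-in for nn.Linear with ternary {-1, 0, 1} weights."""

    def forward(self, x):
        w = self.weight
        gamma = w.abs().mean()                           # absmean scale
        w_q = (w / (gamma + 1e-5)).round().clamp(-1, 1)  # ternary weights
        # Straight-through estimator: quantized values in the forward pass,
        # gradients flow to the underlying full-precision weights.
        w_q = w + (w_q - w).detach()
        return F.linear(x, w_q, self.bias)

layer = BitLinear(8, 4)
print(layer(torch.randn(2, 8)).shape)   # torch.Size([2, 4])
```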

  • How might the one bit architecture impact the accessibility of LLMs?

    -The drastic efficiency improvements may allow very large LLMs to run on common consumer devices, greatly improving public access and enabling more widespread applications.

Outlines

00:00

Introducing One-Bit LLMs

The narrator introduces the concept of One Bit LLMs, which use only -1, 0 or 1 as model weights instead of 32 or 16 bits. This allows simplified math operations, reducing compute requirements while maintaining performance. The specific model discussed is called BitNet.

05:01

Comparing BitNet to Regular LLMs

BitNet matches regular full precision LLMs in perplexity and task performance with significantly lower memory, latency and energy needs. This is because BitNet only requires integer addition instead of more complex float math operations.

10:03

How BitNet Works

BitNet uses a quantization function called absolute mean (absmean) quantization to convert regular model weights to -1, 0, or 1. This makes multiplication operations unnecessary, leaving only additions. BitNet also replaces nn.Linear with BitLinear for its 1.58-bit weights and 8-bit activations.

15:05

BitNet Performance Statistics

Quantitative results show that BitNet significantly reduces memory usage and latency compared to baseline LLMs like LLaMA while maintaining competitive perplexity.

Keywords

LLM models

LLM stands for large language model. These are AI models trained on large amounts of text data to generate or summarize text. The video discusses recent advances in LLMs, such as 1-bit LLMs, which are more efficient.

Quantization

Quantization is the process of converting the floating-point values of model weights to low-precision integer values such as int8 or int4. This reduces model size and improves efficiency. The video talks about quantizing weights to ternary (1.58-bit) values.

BitNet

BitNet is the name of the 1-bit LLM architecture introduced in the paper. It uses ternary values of -1, 0, and 1 for its weights instead of high-precision floating-point values.

Ternary values

Ternary means having three possible values. BitNet uses weight values of -1, 0, or 1 instead of floating-point values. This is more efficient since it replaces multiplication with addition.

Pareto improvement

Replacing floating-point weights with ternary 1.58-bit values improves efficiency and reduces cost without losing performance. Improving one factor without hurting others is called a Pareto improvement.

Perplexity

Perplexity is a common metric for evaluating language models; lower perplexity indicates better predictive performance. The video states that BitNet matches the baseline perplexity.

Feature filtering

Zero weights can filter out unnecessary features, leading to better performance. BitNet supports this by having 0 as one of its weight values.

Absolute mean quantization

Absolute mean (absmean) quantization is the function used to convert floating-point weights to the ternary values -1, 0, and 1. The video shows the formula.

BitLinear

Instead of the standard linear layer (nn.Linear), BitNet uses a BitLinear layer with 1.58-bit weights and 8-bit activations. This maintains performance while being efficient.

Memory usage

A key benefit shown is that BitNet reduces memory usage compared to the baseline Transformer model while maintaining competitive performance.

Highlights

Introducing a one-bit LLM variant called BitNet where every parameter is ternary (-1, 0, or 1)

BitNet matches the performance of full precision Transformers in perplexity and end-to-end task performance

BitNet is significantly more cost-effective in latency, memory, throughput, and energy consumption

Using ternary values allows skipping multiplication operations, requiring only addition for forward/backward propagation

Skipping multiplication operations reduces GPU requirements for fine-tuning and training

BitNet provides a Pareto solution to reduce the inference cost, latency, throughput, and energy of LLMs

Calls for new hardware optimizations specifically for 1-bit LLMs

BitNet includes 0 values which allow explicit feature filtering to improve 1-bit LLM performance

Energy savings from BitNet can be translated into faster computation

BitNet trains from scratch with 1.58-bit weights and 8-bit activations

BitNet matches the full-precision baseline in end-to-end task performance starting from a 3B model size

BitNet reduces memory consumption and inferencing latency

Huge difference in model size and latency between BitNet and standard LLMs

Weights are converted to ternary values using an absolute mean quantization formula

Replaces nn.Linear with BitLinear for training 1.58-bit models

Transcripts

00:00

Hello all, my name is Krish Naik, and welcome to my YouTube channel. One of the most interesting things in the field of data science and generative AI is the kind of research that is happening right now; every day you see something new, which is very beneficial for the entire community working with LLMs. Today I saw this amazing research paper titled "The Era of 1-bit LLMs," so I am going to talk about this paper, what a 1-bit LLM actually is, and how it is far more advantageous compared to 32-bit or 16-bit LLMs. One more thing I want you to learn from this video is how to read a research paper: which points you should highlight while reading, and the fact that you cannot understand a paper just by reading it cold; you need some basic knowledge first, and without it, it is very difficult to follow. If you follow my tutorials, you know that whenever I make a video I go through the research papers, simplify the concepts, and then explain them to you.

01:18

Now, if you remember, in my previous video we already discussed quantization. With quantization, suppose I have an open-source model such as LLaMA 2 with 7 billion parameters; when we say 7 billion parameters, we are talking about the weights. If my system does not have a very high configuration, if I am resource-constrained with a limited amount of RAM or limited GPUs, we perform quantization and convert this LLaMA 2 model, which is probably in FP32, into int8, that is, 8 bits. Once we do this, the model size decreases, so we can load it and perform tasks, and we can also fine-tune it with LoRA and QLoRA, which I have already covered in previous videos; the links are in the description. Now the question is: what is this 1-bit LLM? As I said, with quantization we convert 32-bit models to 16-bit or 8-bit, but if we can convert them down to roughly one bit, that basically means we will hardly ever face a resource constraint: with limited RAM, a limited GPU, and limited storage we could do everything from fine-tuning to inference. That is what is so amazing about this, and I don't know exactly what will happen in the coming days; right now we only have the research paper, but once implementations start appearing, trust me, it will be fantastic for everyone working with LLMs. That was just a brief idea of the topic.

03:33

Now let's discuss what a 1-bit LLM is and, to be precise, why the paper says all large language models are in 1.58 bits. There are many points to cover, so please watch till the end, because I am going to read from the paper itself, and that will also give you an idea of how to go about reading a research paper. Let me quickly clear my on-screen annotations first. So what exactly is this 1-bit LLM? The paper says: in this work we introduce a 1-bit LLM variant, namely BitNet b1.58, in which every single parameter, or weight, of the LLM is ternary. It is no longer a 32-bit or 16-bit floating-point value; ternary means the weight can take only three values: -1, 0, or 1. It matches the full-precision Transformer with the same model size and training tokens in terms of perplexity, that is, how well it responds to any query I ask, and end-to-end task performance, while being significantly more cost-effective in terms of latency, memory, throughput, and energy consumption. Obviously, at the end of the day, every LLM with a huge number of parameters, say 7 billion or more, has exactly these constraints, and if you use just the three numbers -1, 0, and 1, you will see how much the efficiency improves because of these ternary values. The latency, memory, throughput, and energy-consumption benefits apply to inference as well as fine-tuning.

05:59

Now let's understand how these values are actually used, because this part is important. Whenever we talk about parameters, we are talking about weights. So let's say this grid of numbers is the set of weights of my initial Transformer LLM. When we say 1-bit LLM, we take all of those values and replace each one with one of the three values -1, 0, or 1; that is why, in the figure, all the original weights get converted into a matrix containing only those three values. This is what the paper calls BitNet b1.58, and it is also described as a Pareto improvement; how that happens I will explain in a moment. A kind of quantization is applied here as well to convert the original values into the ternary ones.

07:28

Now the most important thing: what happens when you convert the values this way? In any fine-tuning, forward propagation, or backward propagation, the model weights are multiplied by the inputs and then we get the output; we also add a bias, but let's ignore it for now. If the weights are, say, 16-bit floating-point numbers, every weight is multiplied by its input and then everything is summed: the operation is the summation over i from 1 to n of w_i times x_i, plus b. That is what happens in forward propagation and whenever the weights are updated. But if all the weights are -1, 0, or 1, the multiplication step becomes trivial: any number multiplied by 0 is 0, any number multiplied by 1 is that same number, and any number multiplied by -1 is just that number negated. So effectively only the addition operation is happening.
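
Written out cleanly (this is just the standard linear-layer equation the narration describes, not new material from the paper), the computation and its ternary special case are:

```latex
y = \sum_{i=1}^{n} w_i x_i + b,
\qquad
w_i \in \{-1,\,0,\,1\} \;\Rightarrow\;
y = \sum_{i:\,w_i=1} x_i \;-\; \sum_{i:\,w_i=-1} x_i \;+\; b
```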

09:03

Now, obviously, if you only need to do additions, your GPU requirement also comes down. Why does the floating-point version take more GPU? Because a multiplication has to happen for every single weight, and then all of those products have to be added, since that is exactly the equation of forward propagation: we multiply the weights by the inputs, sum them, and finally add the bias. With 16-bit floating-point weights, every number is first multiplied by its input and then the summation is done; with ternary values of -1, 0, or 1, the multiplication is essentially skipped, because 1 times x is just x, and such trivial operations need far fewer resources, so at most only additions are required. I hope you can see that because of this technique, this Pareto improvement, much less GPU is needed when we do fine-tuning or training.

10:26

So I hope you now have a complete picture of why this is done and how the transformation works. As the paper says, it provides a Pareto solution to reduce the inference cost, latency, throughput, and energy of LLMs while maintaining model performance, and the new computation paradigm of BitNet b1.58 calls for action to design new hardware optimized for 1-bit LLMs. I know this is a research paper, so I am reading it out and explaining each point; it can feel a little boring, but trust me, you need to understand it in this specific way.

11:11

Now let's talk more about the background, with the main points highlighted in green. These models have demonstrated remarkable performance on a wide range of natural language processing tasks, but their increasing size has posed challenges for deployment and raised concerns about their environmental and economic impact due to high energy consumption; that is obviously the problem with the LLMs that are already available. One approach to address these challenges is to use post-training quantization to create low-bit models for inference; I have already discussed quantization, LoRA, and QLoRA. This technique reduces the precision of the weights and activations, significantly reducing the memory and computational requirements of LLMs, and the trend has been to move from 16 bits to lower bit widths such as 4-bit variants. That is what is happening with today's LLMs.

12:10

Now let's see what we can solve with a 1-bit model architecture. Recent work on 1-bit model architectures, such as BitNet, presents a promising direction for reducing the cost of LLMs while maintaining their performance. Vanilla LLMs are stored as 16-bit floating-point values, and the bulk of any LLM is matrix multiplication, so the major computation cost comes from floating-point addition and multiplication operations. In contrast, the matrix multiplication of BitNet only involves integer addition, because, as I just said, anything multiplied by 1 is that same number, anything multiplied by -1 is that same number with a negative sign, and anything multiplied by 0 is of course zero. As the fundamental limit to compute performance in many chips is power, this energy saving can be translated into faster computation. This is the most important point, and you can clearly see the parts I have highlighted; I hope you are getting an idea of how good this 1-bit LLM can be.

13:16

You can read more of the paper yourself, but the key point is that we only use the ternary values -1, 0, and 1, and it is because of that extra zero state that the effective width becomes 1.58 bits (three states need log2 3 ≈ 1.58 bits to encode). The paper also lists two additional advantages of this approach. First, its modeling capability is stronger due to explicit support for feature filtering: anything multiplied by zero is zero, so the inclusion of 0 in the model weights lets the model drop unneeded features, which can significantly improve the performance of 1-bit LLMs. Secondly, the experiments show it can match the full-precision baseline in terms of end-to-end task performance, starting from a 3B size.

14:13

Now one more important thing: how does this transformation happen, how do the original numbers get converted? It is done with a simple mathematical equation, a quantization function called absolute mean (absmean) quantization, and that is the formula by which all the weights are converted to only the three values -1, 0, and 1. There is also one more change with respect to the Transformer: it replaces nn.Linear with BitLinear, and the model is trained from scratch with 1.58-bit weights and 8-bit activations; that is what is done for the initial training. Most of this I have already discussed above.

15:18

Now let's talk about the performance. For the LLaMA model with 700 million parameters, BitNet also has 700 million parameters, but you can see the memory coming down from 2.08 GB and the latency from 1.18, while the perplexity stays comparable: 12.33 for LLaMA versus 12.87 for BitNet. Similarly, as the number of parameters increases, say LLaMA at 1.3 billion, the parameter count stays the same but the memory required drops to 1.14 GB, the latency to 0.97, and the perplexity is 11.29; the same thing keeps happening, with memory decreasing and inference latency decreasing as well. There is one more chart comparing model size and latency: the blue bars are the LLaMA model and the orange bars are the 1-bit LLM, and you can see how huge the latency difference is, and similarly how huge the memory difference is for storing these models.

16:30

This is just a research paper that has come out recently, but I am really happy to see it, because many things are going to happen in the future. So again, I would like to welcome you all to the era of 1-bit LLMs; you will be able to use these 1-bit LLM models soon. I think Hugging Face will be among the first to implement all of this, and then you will also be able to easily build your applications using generative AI. I hope you liked this video; if you did, please subscribe to my channel and press the bell notification icon. I'll see you in the next video. Have a great day, thank you all, take care, bye-bye.