Episode 1: Efficient LLM Training with Unsloth.ai Co-Founder

Unsupervised Learning
31 Jan 2024 · 19:52

Summary

TL;DR: In this episode of 'Unsupervised Learning,' Renee interviews Daniel, co-founder of Unsloth, the AI training system for large language models. They discuss how Unsloth achieves up to 30 times faster fine-tuning for large language models while reducing memory usage by 50%. Daniel shares his background at Nvidia, where he optimized GPU algorithms, and how that experience shaped the development of Unsloth. The conversation delves into the challenges and potential of language models, the value of treating them as multiple cooperating agents rather than a single system, and the future vision of personal AI chatbots. Daniel also talks about the open-source community's role in Unsloth's growth and how people can support the project through collaborations and donations.

Takeaways

  • 🚀 Unsloth is an AI training system that accelerates fine-tuning of large language models by up to 30 times.
  • 🌟 Unsloth's open-source package has 3,000 GitHub stars, makes fine-tuning twice as fast, and reduces memory usage by 50%.
  • 💡 Daniel's experience at Nvidia involved making algorithms run up to 2,000 times faster on GPUs, which shaped the development of Unsloth.
  • 🏆 Unsloth was developed in response to the NeurIPS LLM Efficiency Challenge, focusing on training faster to reach high accuracy sooner.
  • 🔍 The system rewrites the backpropagation algorithm in "hard maths" and optimizes it as low-level GPU code, leading to efficiency gains.
  • 🌐 Unsloth supports training in many languages, not just English, and simplifies converting models to different formats.
  • 🤖 The team behind Unsloth uses AI tools like ChatGPT for engineering tasks and to get unstuck on coding problems.
  • 📊 Language models have limitations in math due to tokenization issues, sparse specialized training data, and a design not focused on mathematical tasks.
  • 📚 RAG (Retrieval-Augmented Generation) is a method for knowledge injection into language models, allowing them to search large databases for accurate answers.
  • 🌐 Daniel's preferred sources for staying updated on AI include Twitter, Reddit, and YouTube channels like Yan's.
  • 💸 Unsloth is currently bootstrapped, and the team welcomes community support through collaborations and donations via their GitHub page.

Q & A

  • What is Unsloth and how did it come to be?

    -Unsloth is an open-source package developed to make the fine-tuning of language models up to 30 times faster. It was created by Daniel, its co-founder, with the goal of reducing the time and memory required for fine-tuning, making it more accessible and efficient. (A minimal usage sketch follows below.)
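
    For readers who want to see what this looks like in practice, here is a minimal sketch of the kind of fine-tuning flow Unsloth's free notebooks wrap. It follows the pattern in the project's public documentation around the time of this episode; the model name, dataset, and hyperparameters are illustrative, and exact names may differ between versions.

```python
# Minimal sketch of an Unsloth fine-tuning run (illustrative; see the
# project's own notebooks for the authoritative, up-to-date version).
from unsloth import FastLanguageModel
from trl import SFTTrainer
from transformers import TrainingArguments
from datasets import load_dataset

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/mistral-7b-bnb-4bit",  # illustrative model choice
    max_seq_length=2048,
    load_in_4bit=True,  # 4-bit quantization keeps memory usage low
)
# Attach LoRA adapters so only a small set of weights is trained.
model = FastLanguageModel.get_peft_model(
    model, r=16, lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

dataset = load_dataset("imdb", split="train[:1%]")  # any text dataset works
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=2048,
    args=TrainingArguments(output_dir="outputs", max_steps=60,
                           per_device_train_batch_size=2),
)
trainer.train()
```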

  • How does Unsloth reduce memory usage during fine-tuning?

    -Unsloth reduces memory usage by 50% by writing optimized kernels in the Triton language and rewriting the entire backpropagation algorithm in "hard maths," which allows for more efficient memory management and faster processing.

  • What was Daniel's role at Nvidia and how did it influence Unsloth?

    -Daniel worked at Nvidia making algorithms on the GPU faster. His experience writing CUDA kernels and optimizing algorithms carried over to Unsloth, where he rewrote the kernels in the Triton language and performed extensive code optimizations.

  • What is the significance of the Triton language in Unsloth?

    -Triton, OpenAI's language for writing GPU kernels, is used in Unsloth for its performance and efficiency benefits. It allows the team to write optimized kernels that improve the speed and memory usage of the fine-tuning process. (See the kernel sketch below.)
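
    To make "kernels in Triton" concrete, here is the canonical vector-addition kernel from Triton's own tutorials. It is not one of Unsloth's actual kernels, just the smallest illustration of the programming model: each program instance loads one block of elements, operates on it, and stores the result.

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)                      # which block am I?
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements                      # guard the ragged tail
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

x = torch.randn(4096, device="cuda")
y = torch.randn(4096, device="cuda")
out = torch.empty_like(x)
grid = (triton.cdiv(x.numel(), 1024),)               # one program per block
add_kernel[grid](x, y, out, x.numel(), BLOCK_SIZE=1024)
assert torch.allclose(out, x + y)
```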

  • How does Unsloth address the language limitations of models like Mistral and LLaMA?

    -Unsloth lets users fine-tune open-source models like Mistral and LLaMA in any language, not just English. This overcomes the limitation of these models being primarily English-based and enables training in languages like Portuguese or Mandarin.

  • What are some of the challenges with language models in performing mathematical operations?

    -Language models often struggle with mathematical operations like multiplication and addition due to tokenization issues and the specialized nature of math problems. They may not have seen complex formulas in their training set, which limits their ability to perform certain mathematical tasks. (The tokenizer demo below illustrates the tokenization issue.)
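
    The tokenization issue is easy to see directly. A quick demo using Hugging Face's transformers library (GPT-2's tokenizer is used here because it is small and openly downloadable; exact splits vary by tokenizer):

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
# Multi-digit numbers get merged into arbitrary BPE chunks, so the model
# never sees consistent digit-level structure.
print(tok.tokenize("12345 + 67890"))
# Output looks something like: ['123', '45', 'Ġ+', 'Ġ678', '90']
# (LLaMA-family tokenizers instead split numbers into single digits,
#  '100' -> '1', '0', '0', as Daniel describes in the interview.)
```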

  • How does Daniel envision the future of Unsloth?

    -Daniel's vision for the future includes a fine-tuning bot on every computer, even those with weak GPUs. This bot would read personal data locally without uploading it, perform fine-tuning daily, and learn about the user to become a personal chatbot, making AI more accessible and personalized.

  • What is RAG and how does it enhance language models?

    -RAG (Retrieval-Augmented Generation) is a knowledge injection technique that allows language models to search large databases, like Wikipedia or the internet, to find correct answers. This enhances the model's ability to provide accurate and up-to-date information. (A minimal retrieval sketch follows.)
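
    As a rough illustration of the "library search" idea, here is a minimal retrieve-then-prompt sketch. TF-IDF stands in for the search step (real systems typically use neural embeddings and a vector database), and the `generate` call at the end is a placeholder for any LLM:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "The Eiffel Tower is in Paris and is about 330 metres tall.",
    "Python is a programming language created by Guido van Rossum.",
    "Unsloth makes fine-tuning of language models faster.",
]
question = "How tall is the Eiffel Tower?"

# Retrieve: score every document against the question, keep the best one.
vec = TfidfVectorizer().fit(docs + [question])
scores = cosine_similarity(vec.transform([question]), vec.transform(docs))[0]
context = docs[scores.argmax()]

# Augment: inject the retrieved knowledge into the prompt.
prompt = f"Answer using this context:\n{context}\n\nQuestion: {question}"
# Generate: `generate` is a stand-in for any LLM call (local model or API).
# print(generate(prompt))
```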

  • How does Daniel stay updated with the latest developments in AI?

    -Daniel uses Twitter for new releases, Reddit (particularly r/LocalLLaMA) for the latest information, and YouTube for educational content. He particularly recommends Yan's YouTube videos for staying up to date with AI developments.

  • How can people support Unsloth and contribute to its development?

    -Support for Unsloth can come as contributions to the open-source package, feature requests and pull requests, or financial donations via Ko-fi, which helps fund further development and the implementation of new features.

Outlines

00:00

🚀 Introduction to UNS Sloth and AI Training

Renee interviews Daniel, co-founder of Unsloth, about their AI training system, which accelerates fine-tuning of large language models by up to 30 times. They discuss the open-source package that makes fine-tuning twice as fast and reduces memory usage by 50%. Daniel shares his background at Nvidia and the inspiration behind Unsloth, which emerged from a competition focused on training large language models efficiently. The conversation delves into the technical aspects of optimizing GPU algorithms and the use of the Triton language for kernel development.

05:00

🤖 Utilizing Chat GPT and Open Source

Daniel and Renee discuss the practical applications of Unsloth, including the ability to fine-tune models in different languages and the challenges language models face in mathematics. They touch on the use of ChatGPT for engineering tasks and the limitations of language models in handling mathematical problems due to tokenization and training data constraints. Daniel also shares his thoughts on the potential of language models to achieve AGI and the importance of treating them as a system of multiple agents rather than a single entity.

10:01

🌐 Democratizing AI and the Future of UNS Sloth

The discussion shifts to the broader goals of Unsloth, which include making AI accessible to everyone and reducing energy consumption. Daniel talks about the partnership with Hugging Face and the aim of enabling fine-tuning on even small graphics cards. He envisions a future where everyone has a personal AI chatbot on their computer, capable of learning about the user and providing personalized assistance. The conversation also covers misconceptions about working with large language models and the importance of using them in tandem for better results.

15:04

🔍 Keeping Up with AI and Supporting UNS

Daniel shares his sources for staying updated in the AI field, including Twitter, Reddit, and YouTube, and emphasizes the value of local communities for the latest information. He also discusses the currently bootstrapped state of Unsloth and how people can support the work through donations and collaboration on the open-source GitHub repository. The episode concludes with a mention of the sponsor, Jan, which allows users to run any LLM locally on their devices.

Keywords

💡Unsupervised Learning

Unsupervised Learning is a type of machine learning where the model is not given any labeled data. Instead, it learns to identify patterns and structures from the input data on its own. In the context of the video, it refers to the podcast where the interview with Daniel takes place, focusing on AI and open source technology.

💡AI Training System

An AI Training System is a set of tools and processes designed to train artificial intelligence models. These systems often involve algorithms that enable the AI to learn from data. In the video, Daniel discusses Unsloth, an AI training system that accelerates the fine-tuning process for large language models.

💡Fine-tuning

Fine-tuning in machine learning refers to the process of adjusting a pre-trained model to a specific task or dataset. This typically involves training the model on a smaller, more specialized dataset to improve its performance. The video highlights Unsloth's ability to make fine-tuning up to 30 times faster.

💡Large Language Models

Large Language Models (LLMs) are complex AI models designed to process and generate human-like text. They are trained on vast amounts of text data and can perform various language tasks. The video discusses the challenges and innovations in fine-tuning these models more efficiently.

💡GPU

A Graphics Processing Unit (GPU) is a specialized electronic circuit designed to rapidly manipulate and alter memory to accelerate the creation of images in a frame buffer intended for output to a display device. In the video, GPUs are discussed as essential hardware for running AI models and the improvements made to optimize their performance.

💡CUDA

CUDA (Compute Unified Device Architecture) is a parallel computing platform and API model created by NVIDIA. It allows developers to use NVIDIA GPUs for general-purpose processing. Daniel mentions his work at NVIDIA, where he made algorithms run faster on GPUs, which is relevant to the efficiency of Unsloth.

💡Open Source

Open source refers to software whose source code is made available for modification or enhancement by anyone. The video emphasizes the open-source nature of Unsloth, allowing for community contributions and improvements.

💡Triton Language

Triton is an open-source, Python-like programming language from OpenAI for writing high-performance GPU kernels. It is used in Unsloth to rewrite the kernels that are central to the system's performance optimizations.

💡Memory Usage

Memory usage refers to the amount of computer memory occupied by a program or process. In the context of the video, reducing memory usage is a key goal of Unsloth, as it allows for more efficient fine-tuning of large language models.

💡LLM Efficiency Challenge

The LLM Efficiency Challenge is a NeurIPS competition mentioned in the video, in which participants aim to train a large language model to the highest accuracy within a limited budget, such as one day on one GPU. This challenge inspired the development of Unsloth.

💡Knowledge Injection

Knowledge Injection is a technique where external knowledge, such as from a database or library, is integrated into an AI model to enhance its performance. In the video, RAG (Retrieval-Augmented Generation) is discussed as a form of knowledge injection, allowing language models to search and utilize large databases to provide accurate answers.

Highlights

Daniel, co-founder of Unsloth, discusses their AI training system, which accelerates fine-tuning of large language models by up to 30 times.

Unsloth's open-source package makes fine-tuning two times faster and reduces memory usage by 50%.

Daniel's previous work at Nvidia focused on making algorithms run faster on GPUs, including speeding up t-SNE (by roughly 2,000 times) and randomized SVD.

Unsloth was developed during the NeurIPS LLM Efficiency Challenge, aiming to train language models faster to reach high accuracy sooner.

Unsloth rewrote its kernels in the Triton language and performed extensive optimizations to improve efficiency.

Unsloth is agnostic to the type of GPU, supporting both Nvidia and AMD, and can fine-tune models faster than traditional methods.

The open-source package includes around 50 tricks to enhance speed and efficiency.

Unsloth allows fine-tuning of language models in various languages, not just English, addressing a limitation of previous models.

The community is using Unsloth for model conversion, leveraging the fine-tuning notebook even for non-technical users.

Daniel's brother handles the non-technical aspects of Unsloth, including the frontend and website, while Daniel focuses on algorithms.

The team uses ChatGPT for engineering help and to overcome roadblocks in development.

Language models struggle with math due to tokenization issues and sparse, specialized training data.

Unsloth aims to democratize AI by making it accessible to everyone, regardless of the size of their GPUs.

Daniel explains the misconception of treating language models as a single system, advocating a layered approach with multiple models.

Unsloth has partnered with Hugging Face to make large language models more accessible.

The future vision for Unsloth includes a personal fine-tuning bot on every computer, providing a personal chatbot experience.

RAG (Retrieval-Augmented Generation) is a method of knowledge injection that lets language models search large databases for answers.

Daniel would like to interview someone from OpenAI who believes that language models are sufficient for AGI (Artificial General Intelligence).

Daniel uses Twitter, Reddit, and YouTube for staying up to date with the latest AI developments.

Unsloth is currently bootstrapped and open to community support through collaborations and donations.

Transcripts

00:00

[Music]

Renee: Hi, Renee here at Unsupervised Learning, your easy-listening podcast for bleeding-edge open-source tech. Today we're speaking to Daniel, co-founder of Unsloth, the AI training system boasting 30 times faster fine-tuning for large language models. We speak on working with family, his beginnings at Nvidia, what the hell is RAG anyway, and his favourite places to keep up to date. Join me for this catch-up on Unsupervised Learning. Are you able to give a quick rundown of Unsloth, how it came to be?

Daniel: So we make fine-tuning of language models 30 times faster. We have an open-source package with 3,000 GitHub stars; it makes fine-tuning two times faster and it reduces memory usage by 50%. The goal of Unsloth was: before, it can be very slow to fine-tune and it can use lots of memory, and now with our free Colab notebooks you can just click "run all," put in your dataset, whatever you like, and it just fine-tunes. And finally, at the end, you can export it to GGUF, you can export it to vLLM.

Renee: That's good to know. So you previously worked with Nvidia; is that where the idea of writing CUDA and Triton kernels came from?

Daniel: Yeah, so I used to work at Nvidia. My goal was to make algorithms on the GPU faster, so I made t-SNE 2,000 times faster, and I made randomized SVD faster as well; that was my old role. I think the goal of Unsloth, it came about last October: there was a competition called the LLM Efficiency Challenge by NeurIPS. The goal was, you have one day on one GPU to train one LLM to attain the highest accuracy. We took a different approach: we thought, to get the highest accuracy, you can also train faster to reach that accuracy faster, so we diverged and focused on training faster. And I think, yeah, we did take our Nvidia experience of writing CUDA kernels to Unsloth, so we rewrote all of the kernels in Triton, in OpenAI's Triton language; we also did math manipulations and we did lots of optimizations in the code.

Renee: It's something I was wondering, if it was a case of those things that you don't think will be relevant, or don't realize will be relevant, then becoming relevant later on.

Daniel: Yeah, I think that's a fair point. I think OpenAI's Triton language wasn't that popular, but now it's getting very, very popular, because you want performance, you want efficiency, you want memory usage. So Triton kernels are a part of our system, but I think that's only half of the system; the other half is all math equations, like, how do we write the math equations more efficiently?

02:38

Renee: And you're agnostic, so you run, is it, you run Nvidia, AMD GPUs, and what's the other, Intel? For the non-technical people like me, and some other people listening, the value prop, I guess, is you can fine-tune way faster than traditional methods. So is it in the method that is different?

Daniel: So what we do is we take the entire backpropagation algorithm in the fine-tuning process, the gradient descent process, and we rewrite it in hard maths. We write all the math equations down and then we optimize every single line into the more optimal format for the math equations; for example, you can bracket it correctly, and that can increase speed. We also reduce memory usage by writing everything in CUDA programming, so we rewrite everything in low-level GPU code, and this can reduce memory usage and make it faster. So there's 50-something tricks in the open-source package, and that's how we make it faster.
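
One of the simplest examples of "bracketing the maths correctly" is matrix-chain ordering: (A·B)·v and A·(B·v) are mathematically identical, but the first costs O(n³) while the second costs O(n²). This is a generic illustration of the principle, not one of Unsloth's actual derivations:

```python
import time
import numpy as np

n = 2000
A, B = np.random.rand(n, n), np.random.rand(n, n)
v = np.random.rand(n)

t0 = time.perf_counter()
slow = (A @ B) @ v        # full n x n matmul first: O(n^3) work
t1 = time.perf_counter()
fast = A @ (B @ v)        # two matrix-vector products: O(n^2) work
t2 = time.perf_counter()

assert np.allclose(slow, fast)  # same result, very different cost
print(f"(A@B)@v: {t1 - t0:.3f}s   A@(B@v): {t2 - t1:.4f}s")
```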

03:41

Renee: Aside from knowing you from the team at Jan, I also saw you've got heaps of people talking about you online. I think there was a thread on, like, Black Hat World or something like that, and people were so amazed, like, "oh, it's open source now," but they were like, "explain it to me in layman's terms"; they were excited about it, but they didn't know what they could do with it. So what are you seeing, I'm in your Discord, but what are you seeing the community doing with Unsloth?

Daniel: Yeah, so a few days ago, for example: the open-source models like Mistral and LLaMA are only English-based, and people somehow trained them on their own home-country language, like Vietnamese or Portuguese or Spanish. You can use our fine-tuning notebook to train on any language that you like. I think that was the fundamental problem of Mistral and LLaMA, it's just English-based; you can also train Mandarin and other languages very simply. I think that was the most interesting thing I found recently. Some other people didn't even know how to convert a model to GGUF or llama.cpp formats, or vLLM, for example to use on Jan, and we also solve that conversion step. So currently some people are just using our notebook just for conversion; they don't even do the fine-tuning step, they just do the conversion step. So that's a very interesting use case.
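
For context, the conversion step Daniel mentions is roughly a one-liner in the Unsloth notebooks. The helper below reflects the pattern documented around the time of this episode; the method name, arguments, and quantization choice may change between versions:

```python
# Export a (fine-tuned or merely loaded) model to llama.cpp's GGUF format,
# ready to run in apps like Jan. "q4_k_m" is a common 4-bit quantization.
model.save_pretrained_gguf("my_model_gguf", tokenizer,
                           quantization_method="q4_k_m")
```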

05:04

Renee: And so, your brother, is he the non-technical one of you two, would you say? Like, how deep is he in the technical side of what you're doing?

Daniel: Yeah, so he does all the frontend, he does all the website, he does everything else that I don't want to do, so I just do the algorithms and he does everything else. He's also a very good ChatGPT user, so we actually use ChatGPT and Bing Chat for some engineering help. So he's that side of the equation, and I just do the algorithms.

Renee: Obviously the whole idea of this podcast is for people like you, really clever people, to explain things to me like I'm a child. So on autonomous agents and stuff like that, I don't want to dispute the definition of "agent," but, because you're just a tiny team, right, but you clearly have a lot that you're doing, a lot that you're shipping, do you make use of that in an autonomous way, or is it more like you're consulting something like ChatGPT to then help you with code?

Daniel: Yeah, that's a very good question. I think if we want to make applications and different things and we get stuck somewhere, ChatGPT is very good at getting you unstuck on something you're stuck on, so that's very useful. I think we haven't actually used it for a whole application design, everything from scratch. I did use it to try to do matrix differentials, to try to do differentiation via ChatGPT; it was quite bad, so unfortunately I had to fall back to writing it on paper and doing the differentials by hand. So I did try to use ChatGPT for the maths part, it's just not that good. So we generally use ChatGPT and GPT-4 as just an unstuck mechanism; it can improve productivity a lot.

06:58

Renee: Like, how do you represent math in a word format? And is that why it has issues, why you can't trust it for maths?

Daniel: So in terms of maths, I think there were three problems with language models. The first one is the tokenization problem. From what I understand, GPT-4 tokenizes the numbers zero through nine as individual tokens, but they also do double digits, so 10 to 99 each get their own ID, and then 100 to 1,000 also get individual IDs. I know language models are very bad at multiplication, they're quite bad at addition, and it's because of the tokenization issue. I know LLaMA fixed it by splitting numbers into individual digits, so for example if you have the number 100, it would be tokenized as one, zero, zero. So that's one of the issues. The other issue is, I guess, the training data is just too small: in maths, each problem is very specialized. If you want to take the differential of x squared, that's easy, that's 2x, but if the formula gets very complicated, the model might not have seen it in the training set. I think once they have a larger training set, by the phenomenon of grokking, maybe they can do some maths. People talk about Q*; supposedly Q* can solve high-school-level maths problems, this was an OpenAI thing. So those are the two main problems, and the third one, I guess, is I just don't think language models are designed for maths, so you maybe need a separate component to verify the maths, like a proof machine, or Python to write the maths. So those are the three main problems.

Renee: Thank you for the context, by the way.

play08:45

for the context by the way so when you

play08:47

spoke about the current kind of what

play08:50

unslot is being used for your recent

play08:53

kind of partnership like is that part of

play08:56

something bigger or is it like part of

play08:58

an open source plan so we did do a

play09:00

partnership with hugging face on a Blog

play09:02

so we did a Blog with them we also in

play09:04

the TRL docs and so we we're trying to

play09:06

like make like you know llms accessible

play09:09

to everyone and so our goal is to make

play09:11

you know even the very small graphics

play09:12

cards I think like the biggest

play09:14

difference is like you have like a 8 GP

play09:16

eight gigabyte like you know vram on

play09:18

your GPU people before couldn't even

play09:21

train on like a very you know on a 8GB

play09:23

graphics card but now we made it like

play09:25

you know fit just you know just there It

play09:27

just fits in AGB I and so like we can

play09:30

like we essentially our goal is to make

play09:32

AI like you know easily available to

play09:34

everyone reduce energy usage because

play09:36

this training two times faster we also

play09:37

want to reduce energy usage and we just

play09:39

want to make it more productive for

play09:41

people so that's yeah the goal is to

play09:42

democratize AI for everyone but the only

play09:45

to find tuning space for now

play09:46

[Music]

09:56

Renee: So what's something that people get wrong about working with LLMs?

Daniel: Yeah, that's a great question. I think the biggest issue would be that people treat LLMs as one system: when you call an LLM you might get an answer, but I think that's the wrong way to go about it. You're supposed to treat each LLM as one agent; you're supposed to use LLMs as if you have ten of them, and you build layers on top of each layer. So you can tell one LLM: okay, you are a verifier, you verify someone's answers, and that "someone" is another LLM which generates answers. So one LLM does the generation of your answer, one of them does the verification, one of them does the implementation and execution of your answer. I think the biggest misconception is that a language model is this very powerful system that can answer every single question, but I think you need to prompt it correctly and then use many of them in tandem together, and this can be more powerful than just using one system. I think the biggest issue is I feel like people use it as one system, and then they constantly repeat answers, and it sometimes doesn't give the correct answer, and it sometimes forgets what you said before. So I think the way to solve this problem is to use many of them together.
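
Here is a bare-bones sketch of the generate/verify/revise loop Daniel describes. `chat` is a placeholder for whatever LLM client you use (API or local); everything else is plain control flow:

```python
# Hypothetical `chat` wraps any LLM call (API or local model).
def chat(system: str, user: str) -> str:
    raise NotImplementedError("plug in your LLM client here")

def answer_with_verification(question: str, max_rounds: int = 3) -> str:
    # Agent 1: generator produces a first draft.
    draft = chat("You generate answers.", question)
    for _ in range(max_rounds):
        # Agent 2: verifier checks the draft instead of trusting it.
        verdict = chat(
            "You verify someone's answer. Reply PASS or explain the error.",
            f"Question: {question}\nAnswer: {draft}",
        )
        if verdict.strip().startswith("PASS"):
            break
        # Agent 3: reviser fixes the draft using the verifier's feedback.
        draft = chat(
            "You revise answers based on review feedback.",
            f"Question: {question}\nAnswer: {draft}\nFeedback: {verdict}",
        )
    return draft
```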

11:14

Renee: CPU, GPU. We're going to go right back to the start, Daniel, because CPU is... what?

Daniel: The CPU is, like, the processor on the computer. The GPU is like a graphics card, for gaming, for running language models, for your computer display. You can have just a CPU, but then if you want to play games it'll be so slow, so you add this extra hardware called the GPU, you shove it in your computer, and that is the GPU. It makes graphics faster.

Renee: And H100s, is that... oh, that's very expensive.

Daniel: H100s are extremely pricey, yeah. That's Nvidia's. I think you probably have one now in a computer, like an RTX 30 or 20 series; you probably do have a GPU in your computer.

Renee: Yeah, I can run some things on Jan. I've got, like, a Mac, it's an M-something, M1. See what we're working with here. The idea is, if AI is going to be accessible and democratized, it means that people running potatoes, and people who have potato brains, need to understand it. That's not a roast, that's just the reality of the fact that there's so much technology being developed that is so advanced, but we have no way of communicating it, and it sits in this very closed-off...

Daniel: Yeah, all the advancements are very hard to access and stuff.

Renee: It's unintentional, but I think there's so much that happens in little tiny subreddits that the world needs to know about. And I don't go in there because I'm scared, but I see it and I'm like, oh, that would be interesting if I knew what that was. What's a GPU?

Daniel: No, look, it's like, in the olden days we generally ran CPUs. There's two, right: there's a CPU and there's a GPU. Technically a GPU is not even necessary, it's optional. If you want it to be faster, you know, more responsive, you can buy a GPU. You can still do everything on a CPU, it's just very slow.

13:21

Renee: Yeah. So fine-tuning is the current, and the future is fluid?

Daniel: I think the goal, our vision, is that in the future everyone will have a graphics card on their computer. It might be very weak, but I think everyone will have a GPU on their computer. And if this is the case, we want to provide a fine-tuning bot on your PC. It reads all of your personal data, it does not upload any data; this is your personal chatbot that you have on your local machine, and it does fine-tuning every single day. And so this model will learn about you, and you can ask questions however you like, so this becomes like a personal ChatGPT, a personal ChatGPT for you only. That's kind of our ultimate mission. But that's a future plan.

Renee: Yeah, like a non-crappy Clippy.

Daniel: Yeah, exactly, right. Yeah, it's a personal ChatGPT for everyone.

14:20

Renee: Yeah. And so I guess a lot of the conversations in commercial use cases, always, they love the idea of local, because it's local. That's how I came to learn about what RAG is, and that was a whole thing for me. So, to my understanding, this is semi off-topic, but you're a person who knows, in front of me, so I'm going to ask you: RAG is a way of verifying the information that you get back from a large language model, is that right?

Daniel: So, yes... RAG, I guess, is knowledge injection. So, pretend you ask a question: what's today's date? A language model won't know that; it doesn't understand anything about current events. So what you can do is take all of Wikipedia, shove all of Wikipedia in as a library, and allow the language model to search in the library for the answer. So if I ask the question again, what is today's date, Wikipedia will say, somewhere on the homepage, today's date is whatever the date is. And so the language model can use RAG, which is Retrieval-Augmented Generation, to search a big library for the correct answer. So that's RAG.

15:37

Renee: Nice. So is it different from, like, ChatGPT or Bard accessing the internet? That's just different, right?

Daniel: Oh, that's also RAG. So, you know how I said Wikipedia is like a library? Assume Google, the internet, is the library. Just replace Wikipedia with Google search, or replace Wikipedia with something else: if you have a complicated question about a movie, replace it with the IMDb database; if you have a law question, replace it with Supreme Court judgments or something. So RAG is just a knowledge injection system; you just need to have a big database.

16:20

Renee: Nice. I've got two more questions. The first one is: who would you interview in the AI space, if you could interview someone in the AI space?

Daniel: I think I would like to interview maybe Ilya from OpenAI. I've watched his videos and understood his views. I think his very interesting take is that language models are enough for AGI. Most people think that predicting the next word, that language models are just predicting the next word, is nothing special, it's not intelligence or something. But his take is: if you want to predict the next word, you need to understand everything about the world. If you want to predict the next word, you need to understand the context of everything about it, why this word comes after these few words; you need to understand everything in the world. So his take is very fascinating; if I had a chance, I would talk to him.

Renee: Yeah. Did you go to the Sam Altman chat in, I think it was, Sydney, like, last year?

Daniel: No, I did not. It's funny, it was in Sydney, but I didn't go.

17:27

Renee: It was, I think, right. Yeah. Where do you keep up with everything? Because I know that when you work in AI you actually don't sleep, it's not allowed, it's in the rules. Is it, like, an RSS feed, do you have a favourite YouTube, or is it just Discords, Reddit?

Daniel: Yeah, I think Twitter's generally useful for new releases of stuff; Twitter's pretty good. Reddit, yeah, LocalLLaMA is very useful; I think LocalLLaMA has all the latest information, very, very useful. I also like YouTube videos, they're generally good, I think they're a bit delayed though; if you want to stay at the very edge, YouTube will be like one week delayed. I like Yan's YouTube videos, they're very useful, his videos are very helpful, I watch them all the time, so Yan is my recommendation. That's if you want to stay up to date with everything: so Twitter, Reddit, and I guess YouTube.

Renee: I guess this sounds like general advice.

Daniel: Yeah, yeah.

Renee: And you're currently bootstrapped. Where can people support you, and how can people support you? I know that you have a... is it pronounced "Ko-fi" or "coffee"?

Daniel: I actually don't know. Ko-fi? I call it Ko-fi. Or is it coffee? I don't know.

Renee: I always thought it was coffee. Or probably... I don't know, I'm going to message them and ask. But you have one, and that is at... unsloth? I actually don't know, what's the Ko-fi link?

Daniel: I think we just typed it, it's on our GitHub. So we have an open-source package, and we're open to any collaborators from the open-source community. If you want to add new feature requests or pull requests, more than happy. If you want to support our work, we have a Ko-fi link, so you can donate some money to us and we can implement more features quicker. I think we do have some Colab expenses, but there's not that much, for testing stuff out; I think that's the main expense we have. But yeah, more than happy to accept community help, if that's what you like.

Renee: Nice. Thank you very much. Thanks for joining me on Unsupervised Learning; we were speaking to Daniel from Unsloth. This episode was sponsored by Jan: run any LLM locally on your device, for Mac, Windows, and Linux. See the open-source GitHub repo at janhq.

[Music]


Related Tags
AI Training, Unsloth, Open Source, GPU Optimization, Fine-Tuning, Language Models, Nvidia, CUDA, Triton, AI Accessibility, Tech Innovation