LLM Starter Pack: A Pragmatic Guide to Success with Large Language Models

ML Explained - Aggregate Intellect - AI.SCIENCE
10 Jul 2023 · 37:04

Summary

TL;DR: Amir provides a pragmatic view on using large language models. He explains common applications like writing assistance and coding, but cautions about risks like hallucination. He advises experimenting to see whether a model solves your problem before deployment, weighing cost and flexibility. Composition and design are key: build a stack with a language model as one component. Sophisticated combinations of models can provide competitive advantage. Use powerful tools like language models, but with awareness of their limitations.

Takeaways

  • 😊 LLMs are very useful for writing assistance, coding, querying data, etc., but still imperfect
  • 😵‍💫 Beware of hallucinations: LLMs can make up convincing but false information
  • 😏 Evaluate whether an LLM solves your problem before productionizing; consider inference cost
  • 🤔 Minimize hallucination risks; add guardrails like human review and fact checking
  • 🤯 LLMs enable cool things like coding co-pilots, talking to data, writing books, etc.
  • 😎 Experiment with public models and tools to determine if an LLM meets your needs
  • 🔍 For production use, optimize model latency and cost with MLOps, quantization, etc.
  • ⚖️ Consider open vs. closed source models based on needs like privacy and cost
  • 📏 Bigger LLMs are better few-shot learners, but benchmarks may not reflect production readiness
  • 🛠 Be composable: use an LLM as one part of a stack, combined with other models

Q & A

  • What are some of the main applications and use cases presented for large language models?

    -Some of the main applications mentioned are using them as writing aids, coding co-pilots to help generate code, enabling natural language interaction with data, and using them to extract information from unstructured data sources like PDFs, videos, and audio.

  • What risks or downsides are discussed regarding large language models?

    -The main risk discussed is hallucination: the tendency to fabricate convincing but false information, leading to incorrect or misleading outputs. Several examples of such failures are provided.

  • How can the risks of hallucination from large language models be mitigated?

    -Some ways to mitigate hallucination risks include minimizing exposure through careful prompting and design, putting guard rails in place with human oversight or fact checking components, and using large language models as just one composable component in a larger AI stack.

  • What considerations are mentioned regarding deployment of large language models?

    -Key deployment considerations cover factors like cost, latency, privacy, and flexibility needs. Additional model optimization, quantization, and hardware-software co-design can help maximize efficiency of deployed models.

  • When is training your own large language model recommended vs leveraging existing models?

    -Training your own model requires extensive data, compute budget, and specialized teams. In most cases, leveraging existing models with techniques like prompting and in-context learning can meet needs without costly training.

  • How can combining large language models with other AI capabilities lead to more advanced solutions?

    -Using large language models alongside other AI modules like specialized NER or NLP models, knowledge graphs, etc. allows creating sophisticated solutions that accentuate different strengths.

  • What framework is proposed for evaluating if and how to apply large language models to a problem?

    -The suggested framework analyzes whether large language models can actually solve the problem, if solutions could be deployed to production, flexibility needs, and risks like hallucination before deciding on best approach.

  • How important are prompt engineering and model optimization in effectively applying large language models?

    -Effective prompting and model optimization techniques are emphasized as critical to maximize large language model potential while minimizing cost and latency tradeoffs.

  • What is the outlook given on the future potential and current maturity of large language model technology?

    -The technology shows great promise but is positioned as still maturing rapidly, requiring thoughtful application design and awareness of limitations in present form.

  • Why is a composable AI stack incorporating diverse technologies suggested over reliance on large language models alone?

    -Combining large language models with other specialized AI components allows accentuating different strengths to create more advanced and resilient solutions.

Outlines

00:00

😀 Introducing the speaker and topic of large language models

The paragraph introduces the speaker Amir and the topic of large language models (LLMs). It talks about familiar faces in the audience, excitement to share information on LLMs, whether they are overhyped, and how they can be used to build something useful. The talk will take a pragmatic approach.

05:02

😟 LLMs have imperfections like hallucination risk

The paragraph discusses that LLMs have risks like hallucination where they make up convincing but factually incorrect text. Research involves dealing with uncertainty and imperfection. The pragmatic approach is building something useful with LLMs while managing the risks.

10:02

😲 Examples of LLM failures and hallucination

The paragraph provides examples of public LLM failures, like Meta's Galactica and Google's Bard, and lawyers' use of ChatGPT. This highlights the risk of hallucination, where LLMs make up convincing but false information.

15:03

🤔 Framework for deciding if and how to use LLMs

The paragraph introduces a framework for deciding if and how to use LLMs. Questions include: does the LLM solve my problem based on metrics, can it go to production, how flexible am I on cost and latency, and how bad would hallucinations be.

20:05

🔍 Tradeoffs between LLM size, accuracy, cost and speed

The paragraph discusses tradeoffs between LLM size, accuracy, computational cost and speed. Bigger LLMs have better few-shot learning but smaller ones can be more practical. Aim for simplicity with the smallest LLM that solves the problem.

25:07

🚀 Optimizing LLM deployment cost and latency

The paragraph covers optimizing cost and latency when deploying LLMs in production, including model quantization, pruning, and serving optimization using chips tailored for inference.

30:08

😎 LLMs now possible even on laptops

The paragraph shows the exciting possibility of running LLMs even on laptops today, which enables easier prototyping and experimentation.

35:11

❔ Alternatives before deciding to train your LLM

The paragraph discusses trying in-context learning, prompting, and fine-tuned smaller models before deciding to train your own LLM, which requires massive data and compute.

Keywords

💡large language models

Large language models refer to a subset of deep learning models that are trained on massive amounts of text data to generate human-like language and text. As explained in the script, they are first pre-trained in an unsupervised way, then taught to follow instructions, before being fine-tuned for specific applications. Their key capability is natural language generation. The speaker discusses both their promise and limitations.

💡hallucination / fabulation

This refers to the tendency of large language models to 'make things up' or generate factually incorrect statements. As shown through examples in the script, they can provide convincing but false information. Understanding this limitation is critical when deciding whether and how to apply large language models.

💡in-context learning

Rather than extensive fine-tuning, in context learning involves providing a large language model with a prompt that establishes the desired context and constraints for text generation. As noted in the script, effective prompting allows solving many problems without costly model retraining.

💡inference cost

Unlike training costs which occur once, inference costs are paid every time a deployed model is called to generate text or complete a task. The speaker cautions that the expense of running very large models can make applications economically unviable.

💡composability

This refers to combining large language models with other AI components rather than using them in isolation. Integration with task-specific models and establishing feedback loops with human input are noted as ways to create more robust applications.

💡hardware optimizations

Rather than simply running large models on GPU clusters, optimizations like quantization and pruning can improve latency and lower deployment costs significantly. Specialized AI accelerators also continue to evolve for efficient inference.

💡open versus closed source

The script advises weighing factors like model quality, privacy, license restrictions, and reproducibility when deciding between open source versus proprietary large language models.

💡design considerations

When building applications, the speaker emphasizes architectural choices that safeguard against potential model inaccuracies - such as interfaces for human-in-the-loop validation and integrating explanatory capabilities.

💡productization

Rather than prototype experiments, the script advocates developing large language model capabilities with the rigor required for real-world products - including scalable infrastructure and sustainable economics.

💡limits of utility

While noting promising capabilities, the speaker concludes that large language models may solve fewer problems than assumed. Systematic testing on specific use cases is advised rather than chasing benchmarks.

Highlights

Large language models can hallucinate and make up convincing but factually incorrect information

These models are useful as writing aids, coding assistants, and for talking to data

Put guardrails in place when using these models to minimize risk of failures from hallucinations

Use simple solutions like prompting and smaller foundation models where possible rather than large complex models

Test models yourself on your specific use case rather than relying only on benchmarks

Bigger models can solve problems with less data but still may not reach necessary accuracy for real applications

Quantizing and pruning can optimize large language models for efficient deployment

In-context learning and prompting can often avoid the need for expensive model training

Always consider the full product life cycle costs not just initial training costs

Build composable stacks using large language models as one component rather than a single end-to-end solution

Design solutions focused on humans, explainability, and guarding against hallucinations

Thin wrappers around existing services provide little competitive advantage

Combine large language models creatively with other AI solutions as agents

These models are powerful, but be aware of their limitations around hallucination and deployment

Use pragmatically to build solutions that provide real value

Transcripts

00:00

Hello everyone. I'm Amir, and it's fantastic to see some familiar faces after all these years; I just saw some very good friends here. I'm excited to share this with you, and this is the right time. What I'm going to talk about today is large language models: are they real, are they hyped, and how can we use them? What should we pay attention to if we want to build something useful with them? I'm going to be very pragmatic about it, and I'll tell you exactly what I mean by pragmatic.

I'm heading up data science and AI at Arteria, and we use techniques and science and engineering similar to what I'm showing you here to make documentation easy, by making any unstructured data understandable and searchable. We convert unstructured data into knowledge graphs, and then we enable users to use that for making recommendations, for reasoning over it, and for creating documents: things that are difficult for humans today and don't need to be difficult for humans in 2023.

01:42

Large language models: if you check Wikipedia, you'll see that "language models" covers a very wide range of models. What the community is calling large language models today is a subset of that: deep learning based models of language with many, many parameters, millions or billions or even more, that are first pre-trained on a huge corpus of unlabeled text. Because we'll come back to this for practical reasons in a few slides, let me walk you through the process. The first step in creating these models is pre-training: you take a huge language model and train it on a massive corpus of unlabeled text in an unsupervised way. These are generative models; when we say large language models today, in almost all the talks, we mean generative models, so they learn to predict the next word. So you pre-train it, then you teach it to follow instructions, and then you align it with your values; that is what you use. You can train them to follow instructions, or to hold chat dialogues, and things like that. Then, when we talk about further fine-tuning, that is when you take all of this and fine-tune it on a smaller dataset in a supervised way, to make it useful for your domain and your purpose and to build the thing you want, better.
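To make that last, supervised fine-tuning step concrete, here is a minimal sketch using the Hugging Face Trainer; the base model, the toy two-example dataset, and the hyperparameters are illustrative assumptions, not choices from the talk.

    # Minimal supervised fine-tuning sketch; dataset and labels are toy
    # placeholders standing in for your domain data.
    from datasets import Dataset
    from transformers import (AutoModelForSequenceClassification,
                              AutoTokenizer, Trainer, TrainingArguments)

    data = Dataset.from_dict({
        "text": ["the contract renews automatically", "payment is due in 30 days"],
        "label": [0, 1],  # hypothetical domain labels
    })

    tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
    model = AutoModelForSequenceClassification.from_pretrained(
        "distilbert-base-uncased", num_labels=2)

    def tokenize(batch):
        return tokenizer(batch["text"], truncation=True,
                         padding="max_length", max_length=64)

    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="out", num_train_epochs=1,
                               per_device_train_batch_size=2),
        train_dataset=data.map(tokenize, batched=True),
    )
    trainer.train()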

03:51

There's a lot of chatter going on in the community, and there are two camps. Some people think this is total hype: it's very expensive, you should avoid it because it hallucinates, it makes things up. On the other side there are people who believe this is it, we have the final solution, NLP is done, machines are going to take over, we should just sit back and use them, and there will be no scientists and no programmers in the future. What is our approach? What I believe is that we are somewhere in the middle. These things are pretty real; you can use them for very useful applications, and you will see those applications in the talks we have today, and they are very powerful. But this is an active area of research, and if you're not familiar with research, with how research is done, with how to deal with imperfection and the uncertainty that is inherent to doing research, then you expose yourself to a huge risk: you read a paper before it's peer-reviewed, you think this is the thing, you pick it, and then you expose yourself to massive failures. Our approach is going to be totally pragmatic. We say: all right, we know they are imperfect, but can we build something useful with them, and how, and what should we pay attention to if we want to use them? This is not about research directions; if you want to do research you should go in a completely orthogonal direction and take risk, and that is a completely different story, and not what we're going to talk about.

06:00

All right, so let's first look at some amazing success stories of these large language models. You can use them as a writing aid; they can be your assistant in writing. They will help you write correctly, but they will not help you write accurately. To learn more about that, see Amir's talk at noon today; he will walk you through using one to write a book, a beautiful thing. I think this is one of the main applications of these tools, and I personally use them a lot for this purpose.

06:50

Another one: you can use them as your coding co-pilot. They can generate code and they are very good at it, and the reason is that programs are structured and well formatted, so this can be learned, and you can build tools that help you write better code much faster. Again, this is a co-pilot, not an autopilot, and if you use it correctly, as those who have been using it correctly know, it enables you to do a lot in a much shorter time. If you go to Aladdin's talk, which is the next talk coming, he will walk you through all of it, show you how to use agents and how you can use large language models to code, and he will talk about a lot more as well.

07:58

Then you can use them, and this sounds like the future but it's actually today as well, to change the way you work with your data. You can actually talk to your data; you can use your own language for programming. For example, Databricks released their PySpark AI package recently, and you can say what you want to do with your data; it then goes and generates code, creates agents, and does for you what you need done. Another one is PandasAI, which you can use to talk to your pandas DataFrames in Python. Very interesting, and Gabriel is going to talk about this later today.
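As a taste of what this looks like in code, here is a minimal sketch using the PandasAI interface roughly as it shipped in mid-2023; the PandasAI class, the OpenAI wrapper, and the run() call reflect that release and may differ in later versions, and the API key and data are placeholders.

    # "Talk to your data": the library prompts an LLM to write and
    # execute pandas code against the DataFrame, then returns the result.
    import pandas as pd
    from pandasai import PandasAI
    from pandasai.llm.openai import OpenAI

    df = pd.DataFrame({
        "country": ["Canada", "Germany", "Japan"],
        "gdp_musd": [2_140_000, 4_260_000, 4_230_000],
    })

    llm = OpenAI(api_token="YOUR_API_KEY")  # placeholder key
    pandas_ai = PandasAI(llm)

    print(pandas_ai.run(df, prompt="Which country has the highest GDP?"))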

09:00

Another very interesting application is that you can chat with your data, structured or unstructured, in whatever format it is. You can apply this to PDFs, to YouTube videos, to audio, to whatever format you can imagine your data to be in, and you can use it to talk to it. For example, here's an example of asking what is important about this paper, and getting the result. Dennis is going to talk more about this aspect today, and you will learn a lot more interesting aspects of it.
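A minimal sketch of the underlying pattern, assuming an embed-retrieve-answer pipeline: the document chunks are invented, the model names are common defaults rather than ones named in the talk, and real systems add chunking, vector stores, and a generative answerer.

    # "Chat with your document" pattern: embed chunks, retrieve the
    # closest one to the question, answer against it.
    from sentence_transformers import SentenceTransformer, util
    from transformers import pipeline

    chunks = [
        "The paper introduces a retrieval step before generation.",
        "Experiments cover three summarization benchmarks.",
    ]

    embedder = SentenceTransformer("all-MiniLM-L6-v2")
    chunk_vecs = embedder.encode(chunks, convert_to_tensor=True)

    question = "What is important about this paper?"
    q_vec = embedder.encode(question, convert_to_tensor=True)
    best = int(util.cos_sim(q_vec, chunk_vecs).argmax())  # nearest chunk

    qa = pipeline("question-answering",
                  model="distilbert-base-cased-distilled-squad")
    print(qa(question=question, context=chunks[best])["answer"])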

09:53

So that's fantastic, right? It looks like we have managed to build something that is completely automated, with very little effort; what I showed you, you can do with very little effort if you use the right model and the right framework. If you have seen Modern Times, it looks like the beginning of the scene where they introduce the machine: completely automated. But then, if you have been reading the news, you see things like this: Meta releases Galactica and has to take it down after only three days. Then Google releases Bard, and the shares fall. And more recently, lawyers used ChatGPT, interesting stuff happened, and they had to defend themselves. So suddenly it looks like the machine that we thought was working perfectly is not perfect, and the question is: what happened, and what is going on here?

11:19

I want to talk about this first, and then hopefully we will all agree quickly and we can move on to the next thing. What happened here is called hallucination, and apparently there is a better word for it, fabulation, which is a more psychological term. These models are prone to making things up, and because they are autoregressive models, they generate sequences that sound like perfect English but are factually wrong. This is my experiment: I asked ChatGPT whether Einstein could have heard the news about the landing on the moon, and on a first read the answer sounds perfect, but then you go, wait a second, what? You read it again and you see that it is actually factually wrong. Well, maybe that's because they don't have access to the internet; what if we give them access to the internet? So I used you.com and I asked about myself. The model knows about me: Iranian-American astrophysicist, okay, close, that's good; worked on the CMB, good. But it gets everything else wrong: where I worked and what I did. It thinks I won a Gruber Prize in cosmology in 2018; I wish that were true, but unfortunately it's not. These might have been fixed by now, and if you try it again it may not give you the same answers, but that's not the point. What I want to show you is that these models can make things up. This is a known fact, it comes from the nature of these models, and we should be aware of it and take it into account when we build something.

13:26

All right. So the first rule is: if you're going to use it, make sure that you minimize your exposure to the impact of hallucination. There are ways of doing that; this is a very active area of research, and there are tools and techniques for it. We don't have time to go into them, but you should be aware of them. If you compare it with self-driving cars, we are around that level: we have things that work, they are very powerful and very promising, and you can actually use them, but you cannot put them fully on autopilot, or they will drive you into something and you will be damaged. The right way of using them is to make sure that you put guardrails on them, and you put humans somewhere in the loop, or fact checking, or other safeguards. But we are around that level, and it's very promising.
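As an illustration of "guardrails plus a human somewhere in the loop," here is a toy sketch; call_llm and fact_check are hypothetical stand-ins, not tools from the talk.

    # Toy guardrail sketch: answers ship only if a checker passes them;
    # otherwise they go to a human review queue.
    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class Draft:
        prompt: str
        answer: str

    def call_llm(prompt: str) -> str:
        return "Einstein died in 1955, before the 1969 moon landing."  # stub

    def fact_check(draft: Draft) -> bool:
        # Real systems might verify against retrieved sources or a
        # knowledge graph, or use a second model as a checker.
        return "1955" in draft.answer

    review_queue = []

    def answer_with_guardrails(prompt: str) -> Optional[str]:
        draft = Draft(prompt, call_llm(prompt))
        if fact_check(draft):
            return draft.answer
        review_queue.append(draft)  # escalate to a human instead of shipping
        return None

    print(answer_with_guardrails("Could Einstein have heard about the moon landing?"))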

14:38

All right. Now, instead of talking about rules, "do this," "you're missing out if you're not doing these 15 things," and useless conversations like that, let's build a framework of thinking that will help us think about large language models and figure out what to do, when, and how. The first question is: do I really need a large language model to solve my problem? The first thing to try, and to be very honest with yourself about, is whether a large language model actually solves your problem, and by solving we mean: what is your desired metric? Take the model and run it on your task; experiment with it. This is a machine learning solution, and we know how to evaluate machine learning solutions. So take it, do an honest statistical benchmark on your use case, and see if it solves your problem.
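In code, "do an honest statistical benchmark on your use case" can be as small as this sketch; call_llm is a hypothetical stand-in for whichever model you are testing, and the two labeled examples are invented.

    # Run the model over your own labeled examples and report your metric.
    examples = [
        ("I love this product", "positive"),
        ("This broke after a day", "negative"),
    ]

    def call_llm(prompt: str) -> str:
        return "positive"  # stub; replace with a real model or API call

    def classify(text: str) -> str:
        prompt = (
            "Label the sentiment as positive or negative.\n"
            f"Text: {text}\nLabel:"
        )
        return call_llm(prompt).strip().lower()

    correct = sum(classify(text) == label for text, label in examples)
    print(f"accuracy: {correct / len(examples):.2f}")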

15:59

And when I say use it, I mean take it and use it zero-shot, or with in-context learning. Then, if you have to do fine-tuning, think about it and ask yourself: do I really have to fine-tune, or can I do better prompting and in-context learning to get the best out of what I already have?
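To make in-context learning concrete, here is the shape of a few-shot prompt; the tickets and labels are invented for illustration.

    # Few-shot in-context learning: the "training data" lives in the
    # prompt, so no model weights are updated.
    prompt = """Classify each support ticket as BILLING, BUG, or OTHER.

    Ticket: I was charged twice this month.
    Label: BILLING

    Ticket: The export button crashes the app.
    Label: BUG

    Ticket: My invoice lists the wrong company name.
    Label:"""

    # completion = call_llm(prompt)  # expected completion: "BILLING"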

16:24

The second question is: okay, if it solves my problem, can I really take it to production? We will talk about taking things to production, because these are massive beasts, huge, and there are considerations when you think about taking them to production. Then think about how flexible you are: how much money you have, and how much you're willing to spend on the cost of inference. Say you're not training or fine-tuning; the inference cost can still be very high, and you should be aware of it and have a good idea of what you're willing to spend on. How flexible are you about latency? Are you going to use them for sub-second use cases? That's a different story. If you're writing a book with it, on the other hand, you'll be fine waiting even five minutes for it to generate something. And how bad would it be if it starts making things up? Do you have guardrails in place, or are you exposing yourself to a huge risk of failure?

um now how to answer these questions is

play17:45

actually very easy we are in 2023 thanks

play17:48

to hugging face

play17:50

you have access to a lot of these large

play17:53

language models the open source models

play17:55

are available to you you can uh you can

play17:58

go there call them use their uh use them

play18:02

to to solve your problem and experiment

play18:04

with them

play18:06

um and you don't need anything else

play18:07

basically you just you just need to know

play18:09

how to how to load them and and use them

play18:12

for uh zero shot or in context learning

play18:15

uh something very useful that I

play18:18

recommend to everyone is GPT for all

play18:21

this is a tool that

play18:24

um I have it on my machine I use it a

play18:27

lot and and also you can keep install it

play18:30

and use it for a lot of your prototyping

play18:32

and experiments

play18:34

and um all the Enterprise solutions they

play18:38

have playgrounds they give you free

play18:40

access to uh to their stuff so you can

play18:44

go there and and experiment with it and

play18:46

and you know quickly get to the bottom

play18:48

of this question do does it does it

play18:51

solve my problem yes or no and uh and

play18:53

then make a decision and then also on
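A minimal sketch of that kind of quick experiment with an open model from the Hugging Face Hub; flan-t5-small here is just a small instruction-tuned example chosen so the sketch runs on a laptop, not a recommendation from the talk.

    # Quick zero-shot experiment with an open model.
    from transformers import pipeline

    generator = pipeline("text2text-generation", model="google/flan-t5-small")

    out = generator("Classify the sentiment: 'The battery died in an hour.'")
    print(out[0]["generated_text"])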

18:57

Then, on the other side, ask yourself: can I solve it with a smaller foundation model? There is a huge family of foundation models, BERT and all of its descendants and everything else out there. There are very powerful models there, and a lot of problems you can actually solve zero-shot if you know how to work with this stuff, or you can build few-shot learners and use them to solve your problem.
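For instance, here is a sketch of zero-shot classification with a much smaller model instead of a giant generative LLM; the NLI-based model below is a common public example, not one named in the talk.

    # Zero-shot classification with a smaller foundation model.
    from transformers import pipeline

    classifier = pipeline("zero-shot-classification",
                          model="facebook/bart-large-mnli")

    result = classifier(
        "The export button crashes the app.",
        candidate_labels=["billing", "bug report", "feature request"],
    )
    print(result["labels"][0])  # highest-scoring label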

19:33

Or you can fine-tune them. You can fine-tune much smaller models, orders of magnitude smaller and cheaper, but then you have to pay upfront for the fine-tuning. So you do your calculation and decide what is right and what actually works for you. My recommendation here is Occam's razor: think about the simplest solution that is going to solve your problem. Remember, you have to send these to production and you have to maintain them, and picking the simple option goes a long way.

20:17

Now, how do you pick one? Should I use open source or closed source? I look at all the benchmarks and I get confused, and all of that. What is important, again, is experimenting and figuring out what actually solves your problem. I would say ignore all those benchmarks if you can and do a little statistical testing for yourself, because you know what problem you want to solve. There are models that are not high on that leaderboard but can actually solve your problem well, and they are very small, and you can use them. How important are privacy and ownership of your model to you? Can you send your data to a third party or not? Do you want your results to be reproducible, and how much do you want to pay? These are the important factors that tell you whether you can use closed source models, or whether you need to take open source, make it your own, and deploy it in your environment. And if you decide you want to use open source large language models, make sure you understand the license. This is a bit tricky: it's open source, but that doesn't mean you can use it for business. Make sure you understand that. Pay attention to the quality of the results, to what solves your problem, and pay attention to the size, because a bigger size means higher inference cost and higher latency. [A later speaker] is going to talk about the whole landscape; he will walk us through how to think about it, how to pick, and what to do there.

22:16

All right, so I should be using the largest model, right, and I'll be missing out if I don't? Actually, size matters, and there is a very interesting paper that I recommend reading which shows how size affects your results. The essence of it is that bigger models are better learners and can do more with fewer data: bigger models are better few-shot learners. That means if you use bigger models, it's more likely that you can solve your problem with less data. But look at what we're talking about here: this chart is the SuperGLUE score of some large language models, and we are talking about a 70 percent score. If you are in this business, you know that you cannot sell 70 percent. If your F-score is 70, your clients are not going to pay for it. That's why you put a lot of effort into going that last mile, pushing and fine-tuning your models to get to around the 90 percent mark, where you can actually compete with humans and do something that people are willing to pay for. So don't forget that this is important: with these models you can build things that kind of work, and using them in production for real stuff is a different story.

24:04

A very interesting case study was done by Refuel. They looked at labeling data using LLMs, which is basically a zero-shot application. The reason I'm showing it here is to show you that T5 is actually a tiny model compared to these massive models, and if you look at how it is doing, it's actually doing pretty well. If you look at the cost per label, it's the cheapest, and in terms of latency it's not too bad. So think about it, and don't go after the biggest, most expensive, slowest ones; again, ask what actually solves your problem. And when you think about benchmarks, use them with care. First, a high score on the leaderboard doesn't necessarily mean that that's the best thing for your problem. Second, evaluation for large language models is something we're still learning about; it's an active area of research. And third, if a model is closed and they are not telling you what they used to train it, and your benchmarks are available on the internet, just consider that the benchmark data might have been used in training these models. So again: test, and figure it out for yourself.

25:48

When you deploy this stuff: let's say you've built it, you have a solution, and you want to take it to production. So you need to deploy the models, or, if you're using a third party, you need to be able to call their API. When to use which one? If you want to use a third party, you need to make sure your business is fine with sending your data to a third party. Usually, if that's a big red line, you stop there; for us at Arteria, for example, it's a no-go, we can't send data out. So then what you need is to build your own MLOps team, and you build and optimize your pipeline and you optimize your models. The second one is very important: taking a model in its raw form and deploying it is fine, but it's not the best you can do. You can get much better latency and much better cost if you do deep learning engineering on your models: quantize them, prune them, use things like GGML. There's a lot you can do there, and there are good ways of using and deploying these models: Hugging Face Inference, Databricks, and Baseten give you out-of-the-box deployment solutions. AWS is coming out with chips that are specific to inference and optimized for it, and Intel is releasing very interesting stuff combining hardware and software for optimized inference. [Two of today's speakers] are going to talk more about different aspects of this later in the afternoon, so don't miss it.
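As one example of that deep learning engineering, here is a minimal sketch of 8-bit quantized loading via transformers with bitsandbytes; the model is an illustrative open choice, and the exact flags vary across library versions.

    # 8-bit quantized loading: int8 weights cut memory roughly 4x vs fp32.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    name = "facebook/opt-1.3b"  # illustrative open model
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModelForCausalLM.from_pretrained(
        name,
        device_map="auto",   # spread layers across available devices
        load_in_8bit=True,   # requires the bitsandbytes package
    )

    inputs = tokenizer("Quantization helps deployment because",
                       return_tensors="pt")
    out = model.generate(**inputs.to(model.device), max_new_tokens=30)
    print(tokenizer.decode(out[0], skip_special_tokens=True))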

28:04

All right, and here is something very interesting that I was impressed with and I thought you would enjoy seeing as well: here is me running a large language model on my laptop. If you think that to use a large language model you need massive GPUs, distributed systems, the cloud, and all of that, that's not the case. You can actually use them on your laptop today, and that's huge: it's great for prototyping, it's great for a lot of experiments, and for having a little assistant next to you. This is possible thanks to GGML.

possible thanks to ggml

play28:47

all right

play28:49

do you need

play28:51

to train your model your own large

play28:54

language model or

play28:56

can you build useful stuff with the with

play28:59

the existing models

play29:01

if you want to train your own mother

play29:03

think about it ask yourself is it really

play29:07

necessary or am I thinking about

play29:10

training because I'm used to doing

play29:13

machine learning that way and and I

play29:15

think if I want to do something of my

play29:17

own I have to train my model

play29:19

and the reason you should think really

play29:22

hard about it is because these models

play29:24

are massive and in addition to a lot of

play29:28

compute a lot of GPU TPU time you also

play29:31

need a lot of data to be able to build

play29:34

something meaningful there so ask

play29:36

yourself do I have the data do I have

play29:38

the budget do I have the time do I have

play29:40

the right team and do I really have do I

play29:43

really need to to train my own or or can

play29:46

I

play29:47

use in context learning with good

play29:49

prompting or maybe a little bit of

play29:52

fine-tuning

play29:54

and uh and and think about

play29:57

when you it's not when you train it's

play29:59

not a one-time thing you have to train

play30:01

and retrain and retrain and uh and and

play30:04

the cost uh quickly adds up and think

play30:08

about what actually you you gain if you

play30:11

if you do that

30:15

Here I wanted to say, again: most of the problems that I have seen, you should be able to solve with in-context learning or something similar, and that will solve a lot of your problems. Your large language model can then be one component in your whole stack; we will talk about that in a second. And if you start getting into the phase of "no, I have to fine-tune, I have to collect data, I have to do supervised training to improve the results and get what I want," then take a step back, look at it again, and ask yourself: is my problem best solved with a large language model, or, since I'm going to fine-tune anyway, can I build something way smaller, fine-tune that, and create something fast and efficient that solves my problem at a fraction of the cost?

31:30

The next two slides are two recommendations I have for you if you're doing this. First, use the power of prompting: use in-context learning and push it to its limits, and make sure you know how to do this efficiently; that solves a lot of your problems. And always think about your stack and your hardware: know that you might be able to do a lot of things without using the most expensive hardware. If you do a little bit of deep learning engineering on your model, and if you think about it right, my experience is that you're always able to find a solution that is cheaper and faster. And remember, inference costs are different from training costs. Training cost is a one-time thing at the beginning, and then you're good for a while. Inference cost you pay every time someone calls your API, and it adds up quickly and makes things expensive. That cost brings your margin down, and then you end up building something that works and is interesting, but you're breaking the bank: you cannot make money with it, it's not sustainable, your idea dies, and your product doesn't go anywhere.

play33:10

product doesn't go anywhere and uh be

play33:13

composable build a Tiramisu at arteria

play33:16

we love to build pyramid suits because

play33:17

we love kiramisu

play33:19

um I think the all of these different

play33:23

things as layers in your Tiramisu and

play33:26

use a large language model as one

play33:29

component but remember that you can add

play33:31

many more layers on top of it to make

play33:34

something of your own

play33:36

and and make a difference and

play33:40

always think of it as a product as

play33:43

opposed to a jupyter notebook project

play33:47

and

play33:49

um

play33:50

design design is very important for this

play33:53

stuff make sure that you design it in a

play33:56

way that protects you from hallucination

play33:59

that puts the human in the center that

play34:01

makes it explainable makes it

play34:02

understandable and if you do it right

play34:05

you will build something that will make

play34:08

you very rich uh very quickly

play34:12

and this is

play34:14

my my final thought on this

play34:18

I think

play34:20

if you build a thin layer which is a

play34:23

wrapper around existing Services

play34:26

you will be able to build something very

play34:29

quickly in hours or days and and it will

play34:32

be impressive but you will not have that

play34:35

competitive advantage to anyone else

play34:38

because remember what you did was smart

play34:41

but now that you have done it it takes

play34:43

others also a few hours or a few days to

play34:47

catch up with you

play34:48

and then then you have you're just one

play34:51

voice in

play34:53

um this whole Market

play34:54

think more deeply about it and start

play34:58

building more sophisticated things and

play35:02

um and and

35:04

One interesting use of LLMs, and I think the right question to ask today, is: how can I use different AI components together to build something that actually makes a difference? Not whether I should or shouldn't use LLMs; this is just another flavor of machine learning model, and they can be combined with other things. You can build chains with them, and you can use them as agents. Let's say you have your best named entity recognition model, one that is better than everyone else's in the world: you can use these large language models as the base, use them as agents, and have them use your other models as tools. Then you will see that you quickly start building things that are sophisticated and very difficult to beat and compete with.
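A toy sketch of that composition, with a hypothetical router standing in for the LLM and a stub standing in for your in-house NER model; real stacks would use an agent framework and a genuine model.

    # "LLM as agent, your model as tool": the LLM decides, your model does
    # the specialized work.
    def call_llm(prompt: str) -> str:
        # Stub for the base LLM; here it "decides" which tool to call.
        return "TOOL: ner" if "entities" in prompt else "TOOL: none"

    def my_best_ner(text: str) -> list:
        # Stand-in for your in-house NER model, the differentiating layer.
        return [w for w in text.split() if w.istitle()]

    TOOLS = {"ner": my_best_ner}

    def agent(task: str, text: str):
        decision = call_llm(f"Task: {task}. Available tools: {list(TOOLS)}")
        if decision.startswith("TOOL: ") and decision[6:] in TOOLS:
            return TOOLS[decision[6:]](text)
        return call_llm(f"{task}\n{text}")  # fall back to the LLM alone

    print(agent("Find the entities", "Amir spoke in Toronto about Arteria"))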

36:11

So that's basically it. These are very powerful tools and you should use them, for sure, but be aware of everything that we talked about, including hallucination and deployment considerations, and pay attention to where you can, should, or should not use open versus closed models. Definitely use them as writing aids, definitely use them as your co-pilots for writing code, and for low-data regimes. I hope you build something exciting, and I'm looking forward to seeing everything that might happen after this workshop.