LLM Starter Pack: A Pragmatic Guide to Success with Large Language Models

ML Explained - Aggregate Intellect - AI.SCIENCE
10 Jul 2023 · 37:04

Summary

TL;DR: Amir provides a pragmatic view on using large language models. He explains common applications like writing assistance and coding, but cautions about risks like hallucination. He advises experimenting to see whether a model solves your problem before deployment, weighing cost and flexibility. Composition and design are key: build a stack with a language model as one component. Sophisticated combinations of models can provide competitive advantage. Use powerful tools like language models, but with awareness of their limitations.

Takeaways

  • 😊 LLMs are very useful for writing assistance, coding, querying data, etc., but still imperfect
  • 😵‍💫 Beware of hallucinations: LLMs can make up convincing but false information
  • 😏 Evaluate whether an LLM solves your problem before productionizing; consider inference cost
  • 🤔 Minimize hallucination risks; add guardrails like human review and fact checking
  • 🤯 LLMs enable cool things like coding co-pilots, talking to data, writing books, etc.
  • 😎 Experiment with public models and tools to determine if an LLM meets your needs
  • 🔍 For production use, optimize model latency and cost with MLOps, quantization, etc.
  • ⚖️ Consider open vs. closed source models based on needs like privacy and cost
  • 📏 Bigger LLMs are better few-shot learners, but benchmarks may not reflect production readiness
  • 🛠 Be composable: use an LLM as one part of a stack, combined with other models

Q & A

  • What are some of the main applications and use cases presented for large language models?

    -Some of the main applications mentioned are using them as writing aids, coding co-pilots to help generate code, enabling natural language interaction with data, and using them to extract information from unstructured data sources like PDFs, videos, and audio.

  • What risks or downsides are discussed regarding large language models?

    -The main risk discussed is hallucination: the tendency to fabricate convincing but false information, leading to incorrect or misleading outputs. Several examples of such failures are provided.

  • How can the risks of hallucination from large language models be mitigated?

    -Some ways to mitigate hallucination risks include minimizing exposure through careful prompting and design, putting guard rails in place with human oversight or fact checking components, and using large language models as just one composable component in a larger AI stack.

  • What considerations are mentioned regarding deployment of large language models?

    -Key deployment considerations cover factors like cost, latency, privacy, and flexibility needs. Additional model optimization, quantization, and hardware-software co-design can help maximize efficiency of deployed models.

  • When is training your own large language model recommended vs leveraging existing models?

    -Training your own model requires extensive data, compute budget, and specialized teams. In most cases, leveraging existing models with techniques like prompting and in-context learning can meet needs without costly training.

  • How can combining large language models with other AI capabilities lead to more advanced solutions?

    -Using large language models alongside other AI modules like specialized NER or NLP models, knowledge graphs, etc. allows creating sophisticated solutions that accentuate different strengths.

  • What framework is proposed for evaluating if and how to apply large language models to a problem?

    -The suggested framework analyzes whether large language models can actually solve the problem, if solutions could be deployed to production, flexibility needs, and risks like hallucination before deciding on best approach.

  • How important are prompt engineering and model optimization in effectively applying large language models?

    -Effective prompting and model optimization techniques are emphasized as critical to maximize large language model potential while minimizing cost and latency tradeoffs.

  • What is the outlook given on the future potential and current maturity of large language model technology?

    -The technology shows great promise but is positioned as still maturing rapidly, requiring thoughtful application design and awareness of limitations in present form.

  • Why is a composable AI stack incorporating diverse technologies suggested over reliance on large language models alone?

    -Combining large language models with other specialized AI components allows accentuating different strengths to create more advanced and resilient solutions.

Outlines

00:00

😀 Introducing the speaker and topic of large language models

The paragraph introduces the speaker Amir and the topic of large language models (LLMs). It talks about familiar faces in the audience, excitement to share information on LLMs, whether they are overhyped, and how they can be used to build something useful. The talk will take a pragmatic approach.

05:02

😟 LLMs have imperfections like hallucination risk

The paragraph discusses that LLMs have risks like hallucination where they make up convincing but factually incorrect text. Research involves dealing with uncertainty and imperfection. The pragmatic approach is building something useful with LLMs while managing the risks.

10:02

😲 Examples of LLM failures and hallucination

The paragraph provides examples of public LLM failures, like Meta's Galactica and Google's Bard, and lawyers' use of ChatGPT. This highlights the risk of hallucination, where LLMs make up convincing but false information.

15:03

🤔 Framework for deciding if and how to use LLMs

The paragraph introduces a framework for deciding if and how to use LLMs. Questions include: does the LLM solve my problem based on metrics, can it go to production, how flexible am I on cost and latency, and how bad would hallucinations be.

20:05

🔍 Tradeoffs between LLM size, accuracy, cost and speed

The paragraph discusses tradeoffs between LLM size, accuracy, computational cost and speed. Bigger LLMs have better few-shot learning but smaller ones can be more practical. Aim for simplicity with the smallest LLM that solves the problem.

25:07

🚀 Optimizing LLM deployment cost and latency

The paragraph covers optimizing cost and latency when deploying LLMs in production, including model quantization, pruning, and serving optimization using chips tailored for inference.

30:08

😎 LLMs now possible even on laptops

The paragraph shows the exciting possibility of running LLMs even on laptops today, which enables easier prototyping and experimentation.

35:11

❔ Alternatives before deciding to train your LLM

The paragraph discusses trying in-context learning, prompting, and fine-tuned smaller models before deciding to train your own LLM, which requires massive data and compute.

Keywords

💡large language models

Large language models refer to a subset of deep learning models that are trained on massive amounts of text data to generate human-like language and text. As explained in the script, they are first pre-trained in an unsupervised way, then taught to follow instructions, before being fine-tuned for specific applications. Their key capability is natural language generation. The speaker discusses both their promise and limitations.

💡hallucination / fabulation

This refers to the tendency of large language models to 'make things up' or generate factually incorrect statements. As shown through examples in the script, they can provide convincing but false information. Understanding this limitation is critical when deciding whether and how to apply large language models.

💡in-context learning

Rather than extensive fine-tuning, in context learning involves providing a large language model with a prompt that establishes the desired context and constraints for text generation. As noted in the script, effective prompting allows solving many problems without costly model retraining.

💡inference cost

Unlike training costs which occur once, inference costs are paid every time a deployed model is called to generate text or complete a task. The speaker cautions that the expense of running very large models can make applications economically unviable.

💡composability

This refers to combining large language models with other AI components rather than using them in isolation. Integration with task-specific models and establishing feedback loops with human input are noted as ways to create more robust applications.

💡hardware optimizations

Rather than simply running large models on GPU clusters, optimizations like quantization and pruning can improve latency and lower deployment costs significantly. Specialized AI accelerators also continue to evolve for efficient inference.

💡open versus closed source

The script advises weighing factors like model quality, privacy, license restrictions, and reproducibility when deciding between open source versus proprietary large language models.

💡design considerations

When building applications, the speaker emphasizes architectural choices that safeguard against potential model inaccuracies - such as interfaces for human-in-the-loop validation and integrating explanatory capabilities.

💡productization

Rather than prototype experiments, the script advocates developing large language model capabilities with the rigor required for real-world products - including scalable infrastructure and sustainable economics.

💡limits of utility

While noting promising capabilities, the speaker concludes that large language models may solve fewer problems than assumed. Systematic testing on specific use cases is advised rather than chasing benchmarks.

Highlights

Large language models can hallucinate and make up convincing but factually incorrect information

These models are useful as writing aids, coding assistants, and for talking to data

Put guardrails in place when using these models to minimize risk of failures from hallucinations

Use simple solutions like prompting and smaller foundation models where possible rather than large complex models

Test models yourself on your specific use case rather than relying only on benchmarks

Bigger models can solve problems with less data but still may not reach necessary accuracy for real applications

Quantizing and pruning can optimize large language models for efficient deployment

In-context learning and prompting can often avoid the need for expensive model training

Always consider the full product life cycle costs not just initial training costs

Build composable stacks using large language models as one component rather than a single end-to-end solution

Design solutions focused on humans, explainability, and guarding against hallucinations

Thin wrappers around existing services provide little competitive advantage

Combine large language models creatively with other AI solutions as agents

These models are powerful, but be aware of their limitations around hallucination and deployment

Use pragmatically to build solutions that provide real value

Transcripts

00:00

Hello everyone. I'm Amir, and it's fantastic to see some familiar faces after all these years; I just saw some very good friends here. I'm excited to share this with you, and this is the right time. What I'm going to talk about today is large language models: are they real, are they hyped, and how can we use them? What should we pay attention to if we want to build something useful with them? I'm going to be very pragmatic about it, and I'll tell you exactly what I mean by pragmatic.

I'm heading up data science and AI at Arteria, and we use techniques and science and engineering similar to what I'm showing you here to make documentation easy, by making any unstructured data understandable and searchable. We convert unstructured data into knowledge graphs, and then we enable users to use that for making recommendations, for reasoning over it, and for creating documents: things that are difficult for humans today and don't need to be difficult for humans in 2023.

01:42

Large language models: if you check Wikipedia, you'll see that "language models" covers a very wide range of models. What the community is calling large language models today is a subset of that: deep learning based models of language with many, many parameters, millions or billions or even more, that are first pre-trained on a huge corpus of unlabeled text. Because we'll come back to this for practical reasons in a few slides, let me walk you through the process. The first step in creating these models is pre-training: you take a huge language model and train it on a massive corpus of unlabeled text in an unsupervised way. These are generative models; when we say large language models today, in almost all the talks, we mean generative models, so they learn to predict the next word. So you pre-train it, then you teach it to follow instructions, and then you align it with your values; that is what you use. You can train them to follow instructions, or to hold chat dialogues, and things like that. Then, when we talk about further fine-tuning, that is when you take all of this and fine-tune it on a smaller dataset in a supervised way, to make it useful for your domain and your purpose and to build the thing you want, better.
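To make that last, supervised fine-tuning step concrete, here is a minimal sketch using the Hugging Face Trainer; the base model, the toy two-example dataset, and the hyperparameters are illustrative assumptions, not choices from the talk.

    # Minimal supervised fine-tuning sketch; dataset and labels are toy
    # placeholders standing in for your domain data.
    from datasets import Dataset
    from transformers import (AutoModelForSequenceClassification,
                              AutoTokenizer, Trainer, TrainingArguments)

    data = Dataset.from_dict({
        "text": ["the contract renews automatically", "payment is due in 30 days"],
        "label": [0, 1],  # hypothetical domain labels
    })

    tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
    model = AutoModelForSequenceClassification.from_pretrained(
        "distilbert-base-uncased", num_labels=2)

    def tokenize(batch):
        return tokenizer(batch["text"], truncation=True,
                         padding="max_length", max_length=64)

    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="out", num_train_epochs=1,
                               per_device_train_batch_size=2),
        train_dataset=data.map(tokenize, batched=True),
    )
    trainer.train()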

03:51

There's a lot of chatter going on in the community, and there are two camps. Some people think this is total hype: it's very expensive, you should avoid it because it hallucinates, it makes things up. On the other side there are people who believe this is it, we have the final solution, NLP is done, machines are going to take over, we should just sit back and use them, and there will be no scientists and no programmers in the future. What is our approach? What I believe is that we are somewhere in the middle. These things are pretty real; you can use them for very useful applications, and you will see those applications in the talks we have today, and they are very powerful. But this is an active area of research, and if you're not familiar with research, with how research is done, with how to deal with imperfection and the uncertainty that is inherent to doing research, then you expose yourself to a huge risk: you read a paper before it's peer-reviewed, you think this is the thing, you pick it, and then you expose yourself to massive failures. Our approach is going to be totally pragmatic. We say: all right, we know they are imperfect, but can we build something useful with them, and how, and what should we pay attention to if we want to use them? This is not about research directions; if you want to do research you should go in a completely orthogonal direction and take risk, and that is a completely different story, and not what we're going to talk about.

06:00

All right, so let's first look at some amazing success stories of these large language models. You can use them as a writing aid; they can be your assistant in writing. They will help you write correctly, but they will not help you write accurately. To learn more about that, see Amir's talk at noon today; he will walk you through using one to write a book, a beautiful thing. I think this is one of the main applications of these tools, and I personally use them a lot for this purpose.

06:50

Another one: you can use them as your coding co-pilot. They can generate code and they are very good at it, and the reason is that programs are structured and well formatted, so this can be learned, and you can build tools that help you write better code much faster. Again, this is a co-pilot, not an autopilot, and if you use it correctly, as those who have been using it correctly know, it enables you to do a lot in a much shorter time. If you go to Aladdin's talk, which is the next talk coming, he will walk you through all of it, show you how to use agents and how you can use large language models to code, and he will talk about a lot more as well.

07:58

Then you can use them, and this sounds like the future but it's actually today as well, to change the way you work with your data. You can actually talk to your data; you can use your own language for programming. For example, Databricks released their PySpark AI package recently, and you can say what you want to do with your data; it then goes and generates code, creates agents, and does for you what you need done. Another one is PandasAI, which you can use to talk to your pandas DataFrames in Python. Very interesting, and Gabriel is going to talk about this later today.
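As a taste of what this looks like in code, here is a minimal sketch using the PandasAI interface roughly as it shipped in mid-2023; the PandasAI class, the OpenAI wrapper, and the run() call reflect that release and may differ in later versions, and the API key and data are placeholders.

    # "Talk to your data": the library prompts an LLM to write and
    # execute pandas code against the DataFrame, then returns the result.
    import pandas as pd
    from pandasai import PandasAI
    from pandasai.llm.openai import OpenAI

    df = pd.DataFrame({
        "country": ["Canada", "Germany", "Japan"],
        "gdp_musd": [2_140_000, 4_260_000, 4_230_000],
    })

    llm = OpenAI(api_token="YOUR_API_KEY")  # placeholder key
    pandas_ai = PandasAI(llm)

    print(pandas_ai.run(df, prompt="Which country has the highest GDP?"))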

09:00

Another very interesting application is that you can chat with your data, structured or unstructured, in whatever format it is. You can apply this to PDFs, to YouTube videos, to audio, to whatever format you can imagine your data to be in, and you can use it to talk to it. For example, here's an example of asking what is important about this paper, and getting the result. Dennis is going to talk more about this aspect today, and you will learn a lot more interesting aspects of it.
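A minimal sketch of the underlying pattern, assuming an embed-retrieve-answer pipeline: the document chunks are invented, the model names are common defaults rather than ones named in the talk, and real systems add chunking, vector stores, and a generative answerer.

    # "Chat with your document" pattern: embed chunks, retrieve the
    # closest one to the question, answer against it.
    from sentence_transformers import SentenceTransformer, util
    from transformers import pipeline

    chunks = [
        "The paper introduces a retrieval step before generation.",
        "Experiments cover three summarization benchmarks.",
    ]

    embedder = SentenceTransformer("all-MiniLM-L6-v2")
    chunk_vecs = embedder.encode(chunks, convert_to_tensor=True)

    question = "What is important about this paper?"
    q_vec = embedder.encode(question, convert_to_tensor=True)
    best = int(util.cos_sim(q_vec, chunk_vecs).argmax())  # nearest chunk

    qa = pipeline("question-answering",
                  model="distilbert-base-cased-distilled-squad")
    print(qa(question=question, context=chunks[best])["answer"])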

09:53

So that's fantastic, right? It looks like we have managed to build something that is completely automated, with very little effort; what I showed you, you can do with very little effort if you use the right model and the right framework. If you have seen Modern Times, it looks like the beginning of the scene where they introduce the machine: completely automated. But then, if you have been reading the news, you see things like this: Meta releases Galactica and has to take it down after only three days. Then Google releases Bard, and the shares fall. And more recently, lawyers used ChatGPT, interesting stuff happened, and they had to defend themselves. So suddenly it looks like the machine that we thought was working perfectly is not perfect, and the question is: what happened, and what is going on here?

11:19

I want to talk about this first, and then hopefully we will all agree quickly and we can move on to the next thing. What happened here is called hallucination, and apparently there is a better word for it, fabulation, which is a more psychological term. These models are prone to making things up, and because they are autoregressive models, they generate sequences that sound like perfect English but are factually wrong. This is my experiment: I asked ChatGPT whether Einstein could have heard the news about the landing on the moon, and on a first read the answer sounds perfect, but then you go, wait a second, what? You read it again and you see that it is actually factually wrong. Well, maybe that's because they don't have access to the internet; what if we give them access to the internet? So I used you.com and I asked about myself. The model knows about me: Iranian-American astrophysicist, okay, close, that's good; worked on the CMB, good. But it gets everything else wrong: where I worked and what I did. It thinks I won a Gruber Prize in cosmology in 2018; I wish that were true, but unfortunately it's not. These might have been fixed by now, and if you try it again it may not give you the same answers, but that's not the point. What I want to show you is that these models can make things up. This is a known fact, it comes from the nature of these models, and we should be aware of it and take it into account when we build something.

13:26

All right. So the first rule is: if you're going to use it, make sure that you minimize your exposure to the impact of hallucination. There are ways of doing that; this is a very active area of research, and there are tools and techniques for it. We don't have time to go into them, but you should be aware of them. If you compare it with self-driving cars, we are around that level: we have things that work, they are very powerful and very promising, and you can actually use them, but you cannot put them fully on autopilot, or they will drive you into something and you will be damaged. The right way of using them is to make sure that you put guardrails on them, and you put humans somewhere in the loop, or fact checking, or other safeguards. But we are around that level, and it's very promising.
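As an illustration of "guardrails plus a human somewhere in the loop," here is a toy sketch; call_llm and fact_check are hypothetical stand-ins, not tools from the talk.

    # Toy guardrail sketch: answers ship only if a checker passes them;
    # otherwise they go to a human review queue.
    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class Draft:
        prompt: str
        answer: str

    def call_llm(prompt: str) -> str:
        return "Einstein died in 1955, before the 1969 moon landing."  # stub

    def fact_check(draft: Draft) -> bool:
        # Real systems might verify against retrieved sources or a
        # knowledge graph, or use a second model as a checker.
        return "1955" in draft.answer

    review_queue = []

    def answer_with_guardrails(prompt: str) -> Optional[str]:
        draft = Draft(prompt, call_llm(prompt))
        if fact_check(draft):
            return draft.answer
        review_queue.append(draft)  # escalate to a human instead of shipping
        return None

    print(answer_with_guardrails("Could Einstein have heard about the moon landing?"))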

14:38

All right. Now, instead of talking about rules, "do this," "you're missing out if you're not doing these 15 things," and useless conversations like that, let's build a framework of thinking that will help us think about large language models and figure out what to do, when, and how. The first question is: do I really need a large language model to solve my problem? The first thing to try, and to be very honest with yourself about, is whether a large language model actually solves your problem, and by solving we mean: what is your desired metric? Take the model and run it on your task; experiment with it. This is a machine learning solution, and we know how to evaluate machine learning solutions. So take it, do an honest statistical benchmark on your use case, and see if it solves your problem.
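In code, "do an honest statistical benchmark on your use case" can be as small as this sketch; call_llm is a hypothetical stand-in for whichever model you are testing, and the two labeled examples are invented.

    # Run the model over your own labeled examples and report your metric.
    examples = [
        ("I love this product", "positive"),
        ("This broke after a day", "negative"),
    ]

    def call_llm(prompt: str) -> str:
        return "positive"  # stub; replace with a real model or API call

    def classify(text: str) -> str:
        prompt = (
            "Label the sentiment as positive or negative.\n"
            f"Text: {text}\nLabel:"
        )
        return call_llm(prompt).strip().lower()

    correct = sum(classify(text) == label for text, label in examples)
    print(f"accuracy: {correct / len(examples):.2f}")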

15:59

And when I say use it, I mean take it and use it zero-shot, or with in-context learning. Then, if you have to do fine-tuning, think about it and ask yourself: do I really have to fine-tune, or can I do better prompting and in-context learning to get the best out of what I already have?
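To make in-context learning concrete, here is the shape of a few-shot prompt; the tickets and labels are invented for illustration.

    # Few-shot in-context learning: the "training data" lives in the
    # prompt, so no model weights are updated.
    prompt = """Classify each support ticket as BILLING, BUG, or OTHER.

    Ticket: I was charged twice this month.
    Label: BILLING

    Ticket: The export button crashes the app.
    Label: BUG

    Ticket: My invoice lists the wrong company name.
    Label:"""

    # completion = call_llm(prompt)  # expected completion: "BILLING"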

16:24

The second question is: okay, if it solves my problem, can I really take it to production? We will talk about taking things to production, because these are massive beasts, huge, and there are considerations when you think about taking them to production. Then think about how flexible you are: how much money you have, and how much you're willing to spend on the cost of inference. Say you're not training or fine-tuning; the inference cost can still be very high, and you should be aware of it and have a good idea of what you're willing to spend on. How flexible are you about latency? Are you going to use them for sub-second use cases? That's a different story. If you're writing a book with it, on the other hand, you'll be fine waiting even five minutes for it to generate something. And how bad would it be if it starts making things up? Do you have guardrails in place, or are you exposing yourself to a huge risk of failure?

um now how to answer these questions is

play17:45

actually very easy we are in 2023 thanks

play17:48

to hugging face

play17:50

you have access to a lot of these large

play17:53

language models the open source models

play17:55

are available to you you can uh you can

play17:58

go there call them use their uh use them

play18:02

to to solve your problem and experiment

play18:04

with them

play18:06

um and you don't need anything else

play18:07

basically you just you just need to know

play18:09

how to how to load them and and use them

play18:12

for uh zero shot or in context learning

play18:15

uh something very useful that I

play18:18

recommend to everyone is GPT for all

play18:21

this is a tool that

play18:24

um I have it on my machine I use it a

play18:27

lot and and also you can keep install it

play18:30

and use it for a lot of your prototyping

play18:32

and experiments

play18:34

and um all the Enterprise solutions they

play18:38

have playgrounds they give you free

play18:40

access to uh to their stuff so you can

play18:44

go there and and experiment with it and

play18:46

and you know quickly get to the bottom

play18:48

of this question do does it does it

play18:51

solve my problem yes or no and uh and

play18:53

then make a decision and then also on
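A minimal sketch of that kind of quick experiment with an open model from the Hugging Face Hub; flan-t5-small here is just a small instruction-tuned example chosen so the sketch runs on a laptop, not a recommendation from the talk.

    # Quick zero-shot experiment with an open model.
    from transformers import pipeline

    generator = pipeline("text2text-generation", model="google/flan-t5-small")

    out = generator("Classify the sentiment: 'The battery died in an hour.'")
    print(out[0]["generated_text"])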

18:57

Then, on the other side, ask yourself: can I solve it with a smaller foundation model? There is a huge family of foundation models, BERT and all of its descendants and everything else out there. There are very powerful models there, and a lot of problems you can actually solve zero-shot if you know how to work with this stuff, or you can build few-shot learners and use them to solve your problem.
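For instance, here is a sketch of zero-shot classification with a much smaller model instead of a giant generative LLM; the NLI-based model below is a common public example, not one named in the talk.

    # Zero-shot classification with a smaller foundation model.
    from transformers import pipeline

    classifier = pipeline("zero-shot-classification",
                          model="facebook/bart-large-mnli")

    result = classifier(
        "The export button crashes the app.",
        candidate_labels=["billing", "bug report", "feature request"],
    )
    print(result["labels"][0])  # highest-scoring label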

19:33

Or you can fine-tune them. You can fine-tune much smaller models, orders of magnitude smaller and cheaper, but then you have to pay upfront for the fine-tuning. So you do your calculation and decide what is right and what actually works for you. My recommendation here is Occam's razor: think about the simplest solution that is going to solve your problem. Remember, you have to send these to production and you have to maintain them, and picking the simple option goes a long way.

20:17

Now, how do you pick one? Should I use open source or closed source? I look at all the benchmarks and I get confused, and all of that. What is important, again, is experimenting and figuring out what actually solves your problem. I would say ignore all those benchmarks if you can and do a little statistical testing for yourself, because you know what problem you want to solve. There are models that are not high on that leaderboard but can actually solve your problem well, and they are very small, and you can use them. How important are privacy and ownership of your model to you? Can you send your data to a third party or not? Do you want your results to be reproducible, and how much do you want to pay? These are the important factors that tell you whether you can use closed source models, or whether you need to take open source, make it your own, and deploy it in your environment. And if you decide you want to use open source large language models, make sure you understand the license. This is a bit tricky: it's open source, but that doesn't mean you can use it for business. Make sure you understand that. Pay attention to the quality of the results, to what solves your problem, and pay attention to the size, because a bigger size means higher inference cost and higher latency. [A later speaker] is going to talk about the whole landscape; he will walk us through how to think about it, how to pick, and what to do there.

22:16

All right, so I should be using the largest model, right, and I'll be missing out if I don't? Actually, size matters, and there is a very interesting paper that I recommend reading which shows how size affects your results. The essence of it is that bigger models are better learners and can do more with fewer data: bigger models are better few-shot learners. That means if you use bigger models, it's more likely that you can solve your problem with less data. But look at what we're talking about here: this chart is the SuperGLUE score of some large language models, and we are talking about a 70 percent score. If you are in this business, you know that you cannot sell 70 percent. If your F-score is 70, your clients are not going to pay for it. That's why you put a lot of effort into going that last mile, pushing and fine-tuning your models to get to around the 90 percent mark, where you can actually compete with humans and do something that people are willing to pay for. So don't forget that this is important: with these models you can build things that kind of work, and using them in production for real stuff is a different story.

24:04

A very interesting case study was done by Refuel. They looked at labeling data using LLMs, which is basically a zero-shot application. The reason I'm showing it here is to show you that T5 is actually a tiny model compared to these massive models, and if you look at how it is doing, it's actually doing pretty well. If you look at the cost per label, it's the cheapest, and in terms of latency it's not too bad. So think about it, and don't go after the biggest, most expensive, slowest ones; again, ask what actually solves your problem. And when you think about benchmarks, use them with care. First, a high score on the leaderboard doesn't necessarily mean that that's the best thing for your problem. Second, evaluation for large language models is something we're still learning about; it's an active area of research. And third, if a model is closed and they are not telling you what they used to train it, and your benchmarks are available on the internet, just consider that the benchmark data might have been used in training these models. So again: test, and figure it out for yourself.

25:48

When you deploy this stuff: let's say you've built it, you have a solution, and you want to take it to production. So you need to deploy the models, or, if you're using a third party, you need to be able to call their API. When to use which one? If you want to use a third party, you need to make sure your business is fine with sending your data to a third party. Usually, if that's a big red line, you stop there; for us at Arteria, for example, it's a no-go, we can't send data out. So then what you need is to build your own MLOps team, and you build and optimize your pipeline and you optimize your models. The second one is very important: taking a model in its raw form and deploying it is fine, but it's not the best you can do. You can get much better latency and much better cost if you do deep learning engineering on your models: quantize them, prune them, use things like GGML. There's a lot you can do there, and there are good ways of using and deploying these models: Hugging Face Inference, Databricks, and Baseten give you out-of-the-box deployment solutions. AWS is coming out with chips that are specific to inference and optimized for it, and Intel is releasing very interesting stuff combining hardware and software for optimized inference. [Two of today's speakers] are going to talk more about different aspects of this later in the afternoon, so don't miss it.
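As one example of that deep learning engineering, here is a minimal sketch of 8-bit quantized loading via transformers with bitsandbytes; the model is an illustrative open choice, and the exact flags vary across library versions.

    # 8-bit quantized loading: int8 weights cut memory roughly 4x vs fp32.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    name = "facebook/opt-1.3b"  # illustrative open model
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModelForCausalLM.from_pretrained(
        name,
        device_map="auto",   # spread layers across available devices
        load_in_8bit=True,   # requires the bitsandbytes package
    )

    inputs = tokenizer("Quantization helps deployment because",
                       return_tensors="pt")
    out = model.generate(**inputs.to(model.device), max_new_tokens=30)
    print(tokenizer.decode(out[0], skip_special_tokens=True))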

28:04

All right, and here is something very interesting that I was impressed with and I thought you would enjoy seeing as well: here is me running a large language model on my laptop. If you think that to use a large language model you need massive GPUs, distributed systems, the cloud, and all of that, that's not the case. You can actually use them on your laptop today, and that's huge: it's great for prototyping, it's great for a lot of experiments, and for having a little assistant next to you. This is possible thanks to GGML.

possible thanks to ggml

play28:47

all right

play28:49

do you need

play28:51

to train your model your own large

play28:54

language model or

play28:56

can you build useful stuff with the with

play28:59

the existing models

play29:01

if you want to train your own mother

play29:03

think about it ask yourself is it really

play29:07

necessary or am I thinking about

play29:10

training because I'm used to doing

play29:13

machine learning that way and and I

play29:15

think if I want to do something of my

play29:17

own I have to train my model

play29:19

and the reason you should think really

play29:22

hard about it is because these models

play29:24

are massive and in addition to a lot of

play29:28

compute a lot of GPU TPU time you also

play29:31

need a lot of data to be able to build

play29:34

something meaningful there so ask

play29:36

yourself do I have the data do I have

play29:38

the budget do I have the time do I have

play29:40

the right team and do I really have do I

play29:43

really need to to train my own or or can

play29:46

I

play29:47

use in context learning with good

play29:49

prompting or maybe a little bit of

play29:52

fine-tuning

play29:54

and uh and and think about

play29:57

when you it's not when you train it's

play29:59

not a one-time thing you have to train

play30:01

and retrain and retrain and uh and and

play30:04

the cost uh quickly adds up and think

play30:08

about what actually you you gain if you

play30:11

if you do that

30:15

Here I wanted to say, again: most of the problems that I have seen, you should be able to solve with in-context learning or something similar, and that will solve a lot of your problems. Your large language model can then be one component in your whole stack; we will talk about that in a second. And if you start getting into the phase of "no, I have to fine-tune, I have to collect data, I have to do supervised training to improve the results and get what I want," then take a step back, look at it again, and ask yourself: is my problem best solved with a large language model, or, since I'm going to fine-tune anyway, can I build something way smaller, fine-tune that, and create something fast and efficient that solves my problem at a fraction of the cost?

31:30

The next two slides are two recommendations I have for you if you're doing this. First, use the power of prompting: use in-context learning and push it to its limits, and make sure you know how to do this efficiently; that solves a lot of your problems. And always think about your stack and your hardware: know that you might be able to do a lot of things without using the most expensive hardware. If you do a little bit of deep learning engineering on your model, and if you think about it right, my experience is that you're always able to find a solution that is cheaper and faster. And remember, inference costs are different from training costs. Training cost is a one-time thing at the beginning, and then you're good for a while. Inference cost you pay every time someone calls your API, and it adds up quickly and makes things expensive. That cost brings your margin down, and then you end up building something that works and is interesting, but you're breaking the bank: you cannot make money with it, it's not sustainable, your idea dies, and your product doesn't go anywhere.

play33:10

product doesn't go anywhere and uh be

play33:13

composable build a Tiramisu at arteria

play33:16

we love to build pyramid suits because

play33:17

we love kiramisu

play33:19

um I think the all of these different

play33:23

things as layers in your Tiramisu and

play33:26

use a large language model as one

play33:29

component but remember that you can add

play33:31

many more layers on top of it to make

play33:34

something of your own

play33:36

and and make a difference and

play33:40

always think of it as a product as

play33:43

opposed to a jupyter notebook project

play33:47

and

play33:49

um

play33:50

design design is very important for this

play33:53

stuff make sure that you design it in a

play33:56

way that protects you from hallucination

play33:59

that puts the human in the center that

play34:01

makes it explainable makes it

play34:02

understandable and if you do it right

play34:05

you will build something that will make

play34:08

you very rich uh very quickly

play34:12

and this is

play34:14

my my final thought on this

play34:18

I think

play34:20

if you build a thin layer which is a

play34:23

wrapper around existing Services

play34:26

you will be able to build something very

play34:29

quickly in hours or days and and it will

play34:32

be impressive but you will not have that

play34:35

competitive advantage to anyone else

play34:38

because remember what you did was smart

play34:41

but now that you have done it it takes

play34:43

others also a few hours or a few days to

play34:47

catch up with you

play34:48

and then then you have you're just one

play34:51

voice in

play34:53

um this whole Market

play34:54

think more deeply about it and start

play34:58

building more sophisticated things and

play35:02

um and and

35:04

One interesting use of LLMs, and I think the right question to ask today, is: how can I use different AI components together to build something that actually makes a difference? Not whether I should or shouldn't use LLMs; this is just another flavor of machine learning model, and they can be combined with other things. You can build chains with them, and you can use them as agents. Let's say you have your best named entity recognition model, one that is better than everyone else's in the world: you can use these large language models as the base, use them as agents, and have them use your other models as tools. Then you will see that you quickly start building things that are sophisticated and very difficult to beat and compete with.
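A toy sketch of that composition, with a hypothetical router standing in for the LLM and a stub standing in for your in-house NER model; real stacks would use an agent framework and a genuine model.

    # "LLM as agent, your model as tool": the LLM decides, your model does
    # the specialized work.
    def call_llm(prompt: str) -> str:
        # Stub for the base LLM; here it "decides" which tool to call.
        return "TOOL: ner" if "entities" in prompt else "TOOL: none"

    def my_best_ner(text: str) -> list:
        # Stand-in for your in-house NER model, the differentiating layer.
        return [w for w in text.split() if w.istitle()]

    TOOLS = {"ner": my_best_ner}

    def agent(task: str, text: str):
        decision = call_llm(f"Task: {task}. Available tools: {list(TOOLS)}")
        if decision.startswith("TOOL: ") and decision[6:] in TOOLS:
            return TOOLS[decision[6:]](text)
        return call_llm(f"{task}\n{text}")  # fall back to the LLM alone

    print(agent("Find the entities", "Amir spoke in Toronto about Arteria"))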

36:11

So that's basically it. These are very powerful tools and you should use them, for sure, but be aware of everything that we talked about, including hallucination and deployment considerations, and pay attention to where you can, should, or should not use open versus closed models. Definitely use them as writing aids, definitely use them as your co-pilots for writing code, and for low-data regimes. I hope you build something exciting, and I'm looking forward to seeing everything that might happen after this workshop.