OpenAI's NEW Embedding Models
Summary
TL;DR: OpenAI released two new text embedding models, text-embedding-3-small and text-embedding-3-large, showing decent improvements in English-language embedding quality and massive gains in multilingual embedding quality. The models have the same context window as previous versions and are not trained on more recent data (the knowledge cutoff remains September 2021). Most impressively, the large 3072-dimensional model can supposedly be reduced to 256 dimensions while still outperforming the previous 1536-dimensional text-embedding-ada-002 model. The new models were tested by indexing sample text and querying the vectors, with the large model returning the most relevant results.
Takeaways
- 😲 OpenAI released two new text embedding models - text-embedding-3-small and text-embedding-3-large
- 📈 The new models show decent improvements in English language embedding quality
- 🚀 Massive improvements in multilingual embedding quality - from 31.4 to 54.9 on MIRACL benchmark
- ⏪ The models keep a September 2021 knowledge cutoff, so they may not perform as well on recent events
- 🔢 Text-embedding-3-large has higher dimensionality for better meaning compression
- 🤯 Can compress text-embedding-3-large to 256 dims and still outperform the 1536-dim ada-002 model
- 🐢 Text-embedding-3-large is slower for embedding than previous models
- 🔎 Compared retrieval results across models - text-embedding-3-large performed best overall
- 🤔 Hard to see big performance differences between models in this test
- 👍 New models correlate with claimed performance gains, exciting to test 256 dim version
Q & A
What were the two new embedding models released by OpenAI?
-OpenAI released text-embedding-3-small and text-embedding-3-large.
What benchmark showed massive improvements with the new models?
-The models showed massive improvements on the multilingual embeddings benchmark MIRACL.
What was the knowledge cut-off date for the new models?
-The knowledge cut-off date is still September 2021.
What is the benefit of reducing the number of dimensions in embedding vectors?
-Reducing the number of dimensions trades away some embedding quality, but yields smaller vectors that are cheaper to store and faster to search.
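This trade-off can be sketched offline: OpenAI's v3 embeddings are trained so that vectors can be truncated and re-normalized, which is what the API's `dimensions` parameter does server-side. A minimal stdlib sketch, where the tiny 4-value vector is just a stand-in for a real 3072-dim embedding:

```python
import math

def shorten_embedding(vec, dims):
    # Truncate to the first `dims` values, then re-normalize to unit length,
    # the standard way to shorten embeddings trained for this (as the v3 models are).
    cut = vec[:dims]
    norm = math.sqrt(sum(x * x for x in cut))
    return [x / norm for x in cut]

full = [0.5, 0.5, 0.5, 0.5]          # stand-in for a 3072-dim vector
short = shorten_embedding(full, 2)   # in practice, e.g. 3072 -> 256
```

The shortened vector stays unit-length, so cosine similarity still works on it directly.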
How many dimensions does the Text Embedding 3 Large model use?
-The text-embedding-3-large model uses 3072 dimensions by default.
What did OpenAI claim about reducing dimensions in the large model?
-OpenAI claimed the large model could be reduced to 256 dimensions and still outperform text-embedding-ada-002, which uses 1536 dimensions.
Which model showed the slowest embedding speed?
-The text-embedding-3-large model was the slowest, taking about 24 minutes to embed the entire dataset.
What was the hardest sample question that none of the models answered well?
-The question "keep talking about red teaming for Llama 2" was too difficult, and none of the models provided good answers.
Which model provided the most accurate results in the comparison?
-The text-embedding-3-large model provided the most accurate results in the GPT-4 vs Llama 2 comparison.
Outlines
😲 New OpenAI text embedding models released
Paragraph 1 introduces the new text embedding models released by OpenAI in January 2024, text-embedding-3-small and text-embedding-3-large. It discusses the impressive performance improvements shown on English (MTEB) and multilingual benchmarks, especially for the large model on the multilingual benchmark (MIRACL). However, the models keep the old September 2021 knowledge cutoff and do not increase the maximum context window size.
👨‍💻 Using the new OpenAI text embedding models
Paragraph 2 walks through sample code to get API keys, initialize connections, and create Pinecone indexes to test the new OpenAI text embedding models (as well as the previous text-embedding-ada-002 model). It compares the embedding time for the full dataset using each model.
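The embedding call described above can be sketched as follows. This is a hedged sketch rather than the video's exact notebook code: the `embed` helper is a hypothetical name, and it assumes the `openai` Python package (v1 client) with `OPENAI_API_KEY` set in the environment. As the video notes, switching models is just a change of model ID, and the v3 models additionally accept a `dimensions` argument:

```python
def embed(texts, model="text-embedding-3-small", dimensions=None):
    """Embed a batch of texts with the chosen OpenAI model.

    Swapping between ada-002 and the v3 models only changes `model`;
    `dimensions` (v3 models only) shortens the vectors server-side.
    """
    from openai import OpenAI  # assumes `pip install openai` and OPENAI_API_KEY
    client = OpenAI()
    kwargs = {"model": model, "input": texts}
    if dimensions is not None:
        kwargs["dimensions"] = dimensions
    res = client.embeddings.create(**kwargs)
    return [record.embedding for record in res.data]

# e.g. embed(["some chunk"], model="text-embedding-3-large", dimensions=256)
```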
📈 Testing different embedding models on sample questions
Paragraph 3 shows sample tests using the different embedding models to retrieve relevant documents for some sample input questions. It iterates through the models, from ada-002 to the new small and then large models, assessing differences in performance.
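The manual relevance checks in these tests ("does this passage actually mention Llama 2 or GPT-4?") can be approximated with a small helper. `relevance_hits` is a hypothetical name, and plain substring matching is only a rough stand-in for the human judgment used in the video:

```python
def relevance_hits(docs, terms):
    # Count retrieved documents that mention at least one target term,
    # mirroring the manual "does this passage mention Llama 2 / GPT-4?" check.
    terms = [t.lower() for t in terms]
    return sum(1 for d in docs if any(t in d.lower() for t in terms))

docs = [
    "Llama 2 scales up to 70B parameters",
    "We compare chatbots against GPT-4",
    "Instruction tuning improves results",
]
hits = relevance_hits(docs, ["llama 2", "gpt-4"])  # 2 of the 3 mention a target
```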
😃 The new large embedding model performs best
Paragraph 4 concludes after testing that the new large embedding model from OpenAI (text embedding 3 large) performs the best out of those tested, getting 4/5 relevant results for the sample question. The author is interested to test compressing down to 256 dimensions as claimed.
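The planned 256-dimension comparison ultimately comes down to ranking documents by cosine similarity between query and document vectors, which is easy to sketch in plain Python (a minimal sketch; real code would use NumPy or the vector database for speed):

```python
import math

def cosine(a, b):
    # Cosine similarity between two vectors of equal length.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

same = cosine([1.0, 0.0], [1.0, 0.0])        # identical direction
orthogonal = cosine([1.0, 0.0], [0.0, 1.0])  # unrelated direction
```

Because shortened v3 embeddings are re-normalized to unit length, the same cosine ranking applies at 256 dimensions as at 3072.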
Keywords
💡ChatGPT
💡text embeddings
💡dimensions
💡multilingual embeddings
💡maximum context window
💡knowledge cutoff date
💡vector indexes
💡embedding latency
💡information retrieval
💡natural language processing
Highlights
OpenAI released two new embedding models, text-embedding-3-small and text-embedding-3-large, with improved performance
The new models have much better multilingual performance, as measured by the MIRACL benchmark
The models have not been trained on more recent data, with a September 2021 knowledge cutoff
The large model can be compressed to 256 dimensions while still outperforming the older 002 model
The new models have slower embedding speeds compared to 002
Tested relevance search on 3 models, the large model performed best in retrieving relevant documents
The small model runs at comparable speed to 002 for embedding
The large model is slower for embedding compared to small and 002
Can customize embedding dimensions, but lower dimensions means lower quality
API latency seems slow currently, likely due to high demand after release
The hardest question, about Llama 2 red teaming, could not be answered well by any model
The small and large models retrieved some relevant documents comparing Llama 2 and GPT-4
The large model retrieved the most relevant documents on the Llama 2 vs GPT-4 comparison
Performance gains align with benchmarks but harder to see in practice
Will test very small dimensionality embeddings from large model against 002
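The latency comparison in the highlights can be reproduced with a small timing harness. `avg_batch_latency` and `fake_embed` are hypothetical names, with `fake_embed` standing in for a real API call so the sketch runs offline:

```python
import time

def avg_batch_latency(embed_fn, batches):
    # Time embedding calls over several batches; return mean seconds per batch.
    start = time.perf_counter()
    for batch in batches:
        embed_fn(batch)
    return (time.perf_counter() - start) / len(batches)

# Offline stand-in for a real embedding call (swap in the OpenAI client to
# compare ada-002 vs the v3 models, as done in the video).
fake_embed = lambda batch: [[0.0] * 1536 for _ in batch]
avg = avg_batch_latency(fake_embed, [["doc a", "doc b"]] * 10)
```

Note the video's caveat: measured this way, the number also includes network latency and the Pinecone upsert, so it is not a pure model-latency comparison.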
Transcripts
Way back in December 2022 we had the biggest shift in how we approach AI ever, and that was thanks to OpenAI releasing ChatGPT at the very end of November. ChatGPT quickly caught a lot of people's attention, and it was in the month of December that the interest in ChatGPT and AI really exploded. But right in the middle of December, OpenAI released another model that also changed the entire landscape of AI, although it didn't get noticed as much as ChatGPT, and that model was text-embedding-ada-002. Very creative naming, but behind that name is a model that just completely changed the way we do information retrieval for natural language, which covers RAG, FAQ systems, and basically any use case where you're retrieving text information. Now, since then, despite a huge explosion in the number of people using RAG and the really cool things you can do with it, OpenAI remained pretty quiet on their embedding models, and embedding models are what you need for RAG. There had been no new models since December 2022, until now: OpenAI has just released two new embedding models, and a ton of other things as well. Those two embedding models are called text-embedding-3-small and text-embedding-3-large, and when we look at the results OpenAI is sharing right now, we can see a fairly decent improvement on English-language embeddings with the MTEB benchmark, but perhaps more impressively, we see a massive improvement in the quality of multilingual embeddings, which are measured using the MIRACL benchmark. Ada-002 was state of the art when it was released, and for a very long time afterwards, and it's still a top-performing embedding model; it had an average score of 31.4 on MIRACL. The new text-embedding-3-large has an average score of 54.9 on MIRACL. That's a massive difference. Now, one of the other things
you notice looking at these new models is that they have not increased the max context window, so the maximum number of tokens you can feed into the model. That makes a lot of sense with embedding models, because what you're trying to do with embeddings is compress the meaning of some text into a single point, and if you have a larger chunk of text there are usually many meanings within that text. Going large and trying to compress into a single point doesn't really work, because that large text can have many meanings, so it always makes sense to use smaller chunks, and clearly OpenAI are aware of that; they're not increasing the maximum number of tokens you can embed with these models. Now, the other thing, which is maybe not as clear to me, is that they have not trained on more recent data. The knowledge cutoff is still September 2021, which is a fair while ago now. Okay, for embedding models maybe that isn't quite as important as it is for LLMs, but it's still important; it's good to have some context of recent events when you're trying to embed meaning. So for things like COVID, if you ask a COVID question, I imagine these models are probably not going to perform as well as, say, Cohere's embedding models, which have been trained on more recent data. Nonetheless, this is still very impressive, and the one thing which I think is probably the most impressive thing I've seen so far is that we're now able to decide how many dimensions we'd like in our vectors. Now, there is a trade-off: reduce the number of dimensions and you're going to get reduced-quality embeddings. But what is incredibly interesting, and I almost don't quite believe it yet, I still need to test this, is that they're saying the large model, text-embedding-3-large, can be cut down from 3072 dimensions, which is larger than the previous models, to 256 dimensions and still outperform ada-002, which is a 1536-dimension embedding model. Compressing all of that performance into 256 floating-point numbers is insane. So I'm going to test that, not right now, but I'm going to test that and just prove to myself that it's possible. I'm a little bit skeptical, but if so, incredible. Okay, so with that out of the way, let's jump into how we might use
this new model. Okay, so jumping right into it, we have this notebook; I'll share a link with you in the description, and I'll try to get a link added to the video as well. First we pip install, then download the dataset. I'm using this AI arXiv dataset; I've used it a million times before, but it is a good dataset for testing. I'm going to remove all of the columns I don't care about and keep just id, text, and metadata, the typical format. Then I'm going to enter my OpenAI API key, which you get from platform.openai.com if you need one, and this is how you create your new embeddings: it's exactly the same as what you did before, you just change the model ID, and we'll see those in a moment as well. So that is our embedding function. Then we jump down and initialize the connection to Pinecone serverless; you get $100 of free credit and you can create multiple indexes, which is what we need, because I want to test multiple models here with different dimensionalities, and that's why I'm using serverless, alongside all the other benefits you get from it as well. Now, these are the models we're going to look at, using the default dimensions for now; we'll try the others pretty soon. We have the original model, well, kind of original, the v2 of embeddings from OpenAI, the one they released in December 2022; the dimensionality there is 1536, and most of us will be very familiar with that number by now. The small model uses the same dimensionality, and you can also decrease it down to 512, a nice little thing you can do there. The other embedding model, the large one, the one with the insane performance gains, is text-embedding-3-large. It has a higher dimensionality, which means you can pack more meaning into that single vector, so it makes sense that it's more performant. But what is very cool is that you can compress it down to 256 dimensions and apparently still outperform ada-002, and that is completely unheard of within vector embeddings; 256 dimensions getting this level of performance is insane. Let's see; I don't know, maybe; I mean, they say it's true. Then I'm going to create three different indexes, one for each of the models, and then what I'm going to do is just index everything. It takes a little bit of time to index everything,
but while I'm waiting for that, we can have a quick look at how long this is taking, because this is also something to consider when you're choosing between embedding models. Straight away, the APIs right now are, I think, pretty slow, because everything has just been released, so I expect during normal times this number will probably be smaller. For ada-002 I'm getting 15 and a half minutes to embed everything, that is, to embed and then push everything into Pinecone. It's going slightly slower for the small model, which maybe hasn't been as optimized as ada-002, and maybe more people are using it right now, but generally it's a pretty comparable speed, as we might expect. Embedding with the large model is definitely slower; right now we're on track for about 24 minutes for the whole thing to embed. That also means your embedding latency is going to be higher. If you look at this, it's 2 seconds, and that's including your network latency and everything, and also the round trip to Pinecone, so you have multiple things there and it's not a 100% fair comparison, but this one is almost two seconds slower, maybe more like 1.5 seconds slower, for a single iteration. So this one is definitely slower, and if you're using RAG or something like that it's going to slow down that process a little bit, probably not that much compared to the LLM generation component, but still something to consider. So I'm going to wait for this to finish and skip ahead to when it has. Okay, so we are done, and it took just about 24 minutes for that final model, so I've
created this function; it's just going to go through and basically return documents for a query. So let's try it with ada-002 and see what we get. The query is "keep talking about red teaming for Llama 2"; what do we get? We got red teaming for ChatGPT, so no, not quite there. Let's try with the new small model. Okay, let's see, do we mention Llama 2 in here? No, no Llama 2, so also not quite there. This was a pretty hard one; I haven't seen a model get this one yet, so we're starting with a hard question. Okay, let's see what we have here with the large model: it's talking about red-teaming exercises, this and this, but I don't see Llama 2, nothing in there, so maybe that question is too hard for any model, apparently. All right, let's just go with "tell me why I might want to use Llama 2". The models can usually get relevant results here, and straight away with this one you can see "Llama 2 scales up to...", its helpfulness and safety are pretty good, better than existing open-source models. Okay, cool, that is what I would hope they can get, as ada-002 does. The small model returns the same result, which I think is probably the most relevant, or one of the most relevant. So let's see the large model on why I might want to use Llama 2: is it the same? Oh, same result. Okay, cool, that's fine. Let's try
another question. So let's try one where we're comparing Llama to GPT-4, and just see how many of these results mention either GPT-4 or Llama. Okay, so with ada-002, about four of the five results seem relevant at first glance, but are they actually? They're talking about GPT-4 as well, and yeah, you can see GPT-4 in here. I don't actually see GPT-4 in this one; I see GPT-J. Oh, okay, no, so that's the effect of instruction tuning using GPT-4, but not necessarily comparing to GPT-4. In this one I don't see them talking about Llama either, so these two here are not relevant. This one compares chatbots, instruction tuning of Llama: "Llama-GPT4 outperforms this one and this one, but there is still a gap", so there's a comparison there, fine. Here, okay, that's a Llama fine-tuned on GPT-4 instructions or outputs, but there is a comparison. And again, there's a comparison, right? So there are three results there that are actual comparisons. For the small model, let's compare these. Okay, the first one is relevant, I would say, interesting. The second one is not relevant. The third one: "all chatbots against GPT-4, comparisons run by a reward model indicate that all chatbots are compared against...", okay, yeah, that's relevant, so two out of three so far. Here I don't see anything where it's comparing to GPT-4, so I think that's a no; it's two out of four now. And then here they're talking kind of like about the comparisons, so that's three out of five, the same as the other model. Okay, now let's go with the best model. We expect to see more Llama, and I think I do; this one has Llama in four of those answers. "We compare...", okay, we're comparing in this one, so that's accurate. This one, okay, here they're comparing again, and then in this final one, do we have GPT-4? I think so; they have ChatGPT, GPT-4, and some others; I mean, this is a table, so it's kind of hard to read, but it seems like that is actually a comparison as well. So the large model got four out of five, and that's the best performing one. Okay, that's good; that correlates with what we would expect. Cool, so those are the new embedding models from OpenAI. I think it's kind of hard to see the performance difference there; I mean, you can see a little bit, maybe, with the large model, but given the performance differences we saw at the start in that table, at least on multilingual, there's a massive leap up, which is insane. I'm looking forward to trying the very small dimensionality and just comparing that to ada-002; I think that is very impressive and I'll definitely try it soon. But for now this looks pretty cool, and I definitely want to try the other models OpenAI have released as well; there are a few. So for now I'm going to leave it there. I hope all this has been interesting and useful, so thank you very much for watching, and I will see you again in the next one. Bye.