Is it really the best 7B model? (A First Look)

AssemblyAI
29 Sept 2023 · 08:24

Summary

TL;DR: Mistral AI, a new company started by ex-Meta and ex-DeepMind employees, has released Mistral 7B, an open-source language model with 7.3 billion parameters. It outperforms the larger Llama 2 13B model on benchmarks and has faster inference thanks to innovations like grouped-query attention. The video demonstrates running Mistral 7B locally with Ollama and via Hugging Face Transformers, and highlights the instruct version fine-tuned for conversational tasks. Early testing shows promising performance on math problems but some inconsistencies; more testing is needed to fully evaluate its capabilities.

Takeaways

  • 😲 Mistral AI released their first model, Mistral 7B, with 7.3 billion parameters
  • 👨‍💻 Mistral 7B was developed by ex-Meta and ex-DeepMind employees and released under the Apache 2.0 license for all to use
  • 📈 Mistral claims 7B outperforms Llama 2 (13 billion parameters) on multiple benchmarks while using far fewer parameters
  • 🔎 Mistral provides detailed benchmark comparisons between 7B and other models like Llama 2 13B and Llama 1 34B
  • 🚀 Mistral attributes performance gains to innovations like grouped-query attention and sliding-window attention for longer sequences
  • 🤖 Mistral released both a base model and a fine-tuned instruct model optimized for chat
  • 👩‍💻 You can access Mistral 7B easily via Ollama or Hugging Face Transformers
  • ✅ Quick-start code, docs, and examples for using 7B are provided in the Hugging Face documentation
  • 💬 For chat, wrap your prompts in [INST] tags so the model knows when you are speaking
  • 🤔 Try out math, reasoning, and joke questions to test Mistral 7B's capabilities
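
The [INST] wrapping from the takeaways can be sketched as a small helper. This is a minimal illustration assuming the plain `[INST] ... [/INST]` convention from Mistral's instruct model card; the helper name `wrap_instruct` is just for this example:

```python
def wrap_instruct(user_message: str) -> str:
    """Wrap a user message in Mistral's instruct tags so the
    fine-tuned chat model treats it as a user turn, not as text
    to be completed."""
    return f"[INST] {user_message.strip()} [/INST]"

prompt = wrap_instruct("Can you recommend places to eat in London?")
print(prompt)  # [INST] Can you recommend places to eat in London? [/INST]
```

The wrapped string is what you would tokenize and pass to the instruct model; without the tags, the model tends to treat your text as a sentence to complete.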

Q & A

  • What company released the Mistral 7B model?

    -A new AI startup called Mistral AI released the Mistral 7B model.

  • How big is the Mistral 7B model in terms of parameters?

    -The Mistral 7B model has 7.3 billion parameters.

  • What licensing is the Mistral 7B model released under?

    -The Mistral 7B model is released under the Apache 2.0 open-source license, allowing anyone to use it commercially or for education.

  • What tasks is the Mistral 7B model optimized for?

    -The Mistral 7B model is optimized for low-latency text categorization, summarization, and text and code completion.

  • How does the Mistral 7B model compare in performance to Meta's Llama models?

    -Mistral claims their 7.3-billion-parameter model outperforms the larger Llama 2 13B model on benchmarks, and that Llama 2 would need roughly 23 billion parameters to match it on MMLU.

  • What technologies does Mistral attribute their model's efficiency to?

    -Mistral attributes the model's faster inference times and its ability to handle longer sequences to grouped-query attention and sliding-window attention.
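
The sliding-window attention mentioned in the answer above can be illustrated with a toy mask: each query position attends only to itself and the previous few positions, which caps per-token attention cost on long sequences. A minimal pure-Python sketch (the window size and sequence length below are illustrative, not Mistral's actual configuration):

```python
from typing import List


def sliding_window_mask(seq_len: int, window: int) -> List[List[bool]]:
    """Boolean attention mask: entry [i][j] is True where query
    position i may attend to key position j. The mask is causal
    (j <= i) and limited to the last `window` positions, as in
    sliding-window attention."""
    return [
        [j <= i and i - j < window for j in range(seq_len)]
        for i in range(seq_len)
    ]

mask = sliding_window_mask(seq_len=5, window=3)
# The last row attends only to positions 2, 3, and 4.
```

Compared with a full causal mask, each row has at most `window` True entries instead of up to `seq_len`, which is where the cost saving on long sequences comes from.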

  • What are two ways mentioned to start using the Mistral 7B model?

    -The two ways mentioned are: 1) using Ollama to run the model locally, and 2) using the Hugging Face Transformers library.

  • What tags need to be used when prompting the instruct version of Mistral 7B?

    -The instruct version expects [INST] tags before and after the user input to distinguish it from the model's response.

  • Does the model correctly solve mathematical word problems?

    -It is able to solve some simple mathematical word problems but may struggle with more complex ones.

  • How does the model respond to the nonsensical woodchuck question?

    -It recognizes the woodchuck question is absurd but meant humorously to express that predicting something impossible is useless.

Outlines

00:00

😊 Overview of Mistral AI's 7B-parameter model release

This paragraph provides an overview of the release of Mistral AI's first model, Mistral 7B. It has 7.3 billion parameters and is optimized for low-latency text tasks. The model is open source and outperforms larger models like Llama 2 on benchmarks while using fewer parameters. The video explains the technical innovations that enable this.

05:01

😃 Using Mistral 7B with Ollama and Hugging Face Transformers

This paragraph shows how to use Mistral 7B, either locally with Ollama or with Hugging Face Transformers. It provides code examples and tips, such as using the instruct version and [INST] tags for conversational usage. A math example is shown in which Mistral 7B solves a word problem correctly.
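
The conversational usage described in this section can be sketched as a prompt builder. This follows the multi-turn chat format described on Mistral's Hugging Face model card (user turns wrapped in [INST] ... [/INST], assistant replies closed with </s>), but the exact spacing and special tokens are an assumption to verify against the model card, and `build_chat_prompt` is an illustrative name:

```python
from typing import List, Optional, Tuple


def build_chat_prompt(turns: List[Tuple[str, Optional[str]]]) -> str:
    """Build a multi-turn prompt for a Mistral 7B Instruct model.
    `turns` is a list of (user_message, assistant_reply) pairs; the
    final turn's reply may be None, meaning the model should answer
    next."""
    prompt = "<s>"
    for user, assistant in turns:
        prompt += f"[INST] {user} [/INST]"
        if assistant is not None:
            prompt += f" {assistant}</s>"
    return prompt

conversation = build_chat_prompt([
    ("Can you recommend places to eat in London?", "Sure, here are a few ideas."),
    ("Are any of these vegetarian?", None),
])
```

The resulting string would then be tokenized and passed to the model; newer versions of the tokenizer expose a chat template that produces this format for you.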

Keywords

💡Mistral 7B

This refers to the new language model released by the startup Mistral AI. It is a 7.3-billion-parameter model optimized for low-latency text tasks like categorization, summarization, and completion. The model is highlighted for outperforming larger models like Llama 2 while using fewer parameters, which is attributed to technical innovations like grouped-query attention.

💡language model

A language model is the core component of AI systems like chatbots. It is trained on vast amounts of text data to predict probable next words and sentences. Mistral 7B is presented as an advanced new language model.

💡Apache 2.0 license

Mistral 7B uses the permissive Apache 2.0 license, allowing free commercial and educational use. This makes it stand out from proprietary models and opens up applications.

💡fine-tuning

In addition to the base Mistral 7B model, the company released a fine-tuned version for conversational tasks like chat. Fine-tuning adapts a model on specific types of data or tasks to improve performance.

💡low latency

The Mistral model is optimized specifically for low-latency performance in real-time text applications. This makes it suitable for integration into chatbots, search engines, and other systems needing fast responses.

💡inference time

This refers to the time a trained AI model takes to process an input and produce an output or prediction. Low inference time is important for real-world applications, so Mistral claims optimizations in this area.

💡model parameters

Parameters are the internal adjustable weights within a machine learning model. More parameters usually lead to higher capacity but also higher cost. Mistral 7B uses fewer parameters than the models it is compared against while aiming for better efficiency.

💡benchmarks

To demonstrate the performance of Mistral 7B, the creators benchmarked it against other models like Llama 2 using standardized datasets and metrics. The benchmarks showed it surpassing the larger models.

💡instruct tags

The docs for the Mistral 7B chat model recommend wrapping user input in instruct tags to delineate it from model responses, e.g. [INST] User prompt [/INST]. This provides contextual cues.

Highlights

Mistral AI is a new company started by ex-Meta and ex-DeepMind employees

Mistral 7B is open source with an Apache 2.0 license allowing commercial use

Mistral 7B outperforms larger models like Llama 2 13B on benchmarks

Mistral 7B uses grouped-query attention for faster inference

Mistral 7B can handle longer text sequences more efficiently thanks to sliding-window attention

Anyone can run Mistral 7B locally with Ollama or via Hugging Face

The instruct version of Mistral 7B is fine-tuned for chat

Use [INST] tags in prompts to differentiate user and agent text

Mistral 7B solved a mathematical word problem correctly

Some math problems are reportedly not answered correctly by Mistral 7B

Mistral 7B understood and explained the nonsensical woodchuck question

Mistral 7B is impressive for its performance relative to its smaller size

Try out Mistral 7B yourself to evaluate its capabilities

Share your thoughts on the capabilities of this new model

More Mistral 7B tutorials coming soon

Transcripts

00:00

Mistral AI just released their first model, and it's called Mistral 7B. They initially got a lot of attention based on how much money they raised for their seed round, so you might have seen this piece of news online. But you have to remember that even though it's a very young company, it was founded by ex-Meta and ex-DeepMind employees, so these are not people who are starting completely from scratch. The model, as we see, is released under Apache 2.0 licensing, which means that everyone can use it for any of their use cases, including educational or commercial ones, and this makes it the biggest model of its size to be made completely open source. Looking at the release blog post, we see that they released a base model, but they also released a fine-tuned model for instruct or chat purposes, and the model is optimized for low-latency text categorization and summarization, and also text and code completion.

00:59

In the release blog post, we see that Mistral claims they are outperforming Llama 2, the bigger 13-billion-parameter model, on all benchmarks, and they share some details of this comparison with us, for example on the MMLU dataset. They also compare it on knowledge, reasoning, and comprehension datasets in English. Based on these results, Mistral 7B seems to be performing much better than the Llama 2 7-billion model, and also the 13-billion model, which is much bigger than the 7-billion model, and also the Llama 1 34-billion model. They say they use the Llama 1 34-billion model because there hasn't been a Llama 2 34-billion model yet. It's kind of interesting to see their analysis here; they show us all the different kinds of benchmarks that they use to compare these models and also how Mistral 7B compares to the Llama 2 model. Effectively, to reach the same performance on the MMLU benchmark, Llama 2 would have to have 23 billion parameters to reach the level of performance that Mistral has with only 7.3 billion parameters.

02:13

Mistral attributes their faster inference time to grouped-query attention, and their capacity to handle longer sequences at smaller cost to sliding-window attention. If you like, the release blog post goes a little more into the details of how they use these technologies. Overall, this is a very interesting improvement, both because we see a smaller model that only has 7 billion parameters outperform bigger models, and also because we have a high-quality model made accessible for everyone to use.

02:45

Let's take a look at how we can use Mistral 7B right now. The first option is to use Ollama. Ollama is just a way for you to run these large language models locally in a very, very simple way. All you have to do is download Ollama to your laptop, and after you're done with that, you can go to the list of models (all the models that they have in their repository), and you truly only need to run "ollama run mistral". So let's try that. Of course, it's going to take a second to download the model; I've downloaded it before, so it was nearly instant for me. From then on, you can ask questions to Mistral. Let's start with: "Hello, my name is Mısra. What is your name?" "Can you give me some recommendations for places in London to eat?" And it gives me a list of places to eat. What if I ask: "Are any of these vegetarian restaurants?" All right, so they have vegetarian options available. Just to check that it still remembers who I am, I'll say: "Do you remember who I am?" Thank you. We should always be nice to our large language models.

04:35

All right, so this is how you can use it with Ollama, and I'll also quickly show you how to use it with Hugging Face, because there's already a released version on Hugging Face. You can either use the base version or, if you search for it, you'll see that there is an instruct version, which means the chat version: basically the one that was fine-tuned for a chat use case. On Hugging Face, we already have good documentation for this, so if you want to use it in Transformers, you can click "read more" in the documentation, which I already have open here, or you can also find it in the Hugging Face documentation under text models, where Mistral will be listed.

05:12

What I did is basically use all of this code from the documentation in a Google Colab notebook. You can find the link to the Google Colab notebook in the description if you don't want to set it up yourself. I'll just give you some tips on how I made it run. First of all, you cannot run it on the free version of Google Colab, because you definitely need a GPU. I set the A100 GPU to be able to run it; with the other ones, I wasn't able to, because I ran out of RAM. The code here gives you the base model by default; if you want to use the chat model, you just need to change it to Mistral 7B Instruct, version 0.1, and then you can pass your prompts.

06:00

For the prompts, if you're using the instruct model, you want to use the [INST] tags at the beginning and the end, that is, the opening [INST] tag and the closing [/INST] tag, to make it obvious that you are the user asking for something: this is your part, you're speaking, and then it's the agent's turn to speak. That's all you have to do, because if you don't do that, it starts completing your sentences, or it starts thinking that your sentences belong to the model. So if you want to have a chat experience, you just need to set the [INST] tags.

06:36

I tried it with this one, for example: "If I have 20 apples and I give half of my apples to Sam, then Susie takes 20% of my remaining apples, how many apples do I have left?" After running the model, it told me that if you start with 20 apples, giving half to Sam leaves you with 10 apples remaining; then if Susie takes 20% of those remaining apples, that means she takes two apples, so you have eight apples left. So for me, this quick mathematical problem worked, but I've seen other creators online who are trying out Mistral 7B say that the little math problems they asked were not correctly answered, so you should try it out yourself to see how it responds.

07:19

I want to try the "how much wood would a woodchuck chuck if a woodchuck could chuck wood" question; let's see what it says. All right, it says: "The phrase 'how much wood would a woodchuck chuck if a woodchuck could chuck wood' is an absurd question and does not provide any useful information. It is a humorous way to express the idea that it is impossible to accurately predict or measure something that cannot occur." Sounds a little bit aggressive, but it understood the question, it understood the context behind it, and it did not take it literally, so that's good.

07:57

And this is how you can really easily start using Mistral 7B. What do you think about this model? Leave your thoughts in the comments below, and in the coming days, as we learn more about this, we will be sharing more in-depth tutorials with you. Thanks for being here, and I will see you in the next video.
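
The apples word problem from the transcript can be checked with plain arithmetic, which confirms the answer the model gave:

```python
apples = 20
apples -= apples // 2          # give half (10) to Sam, leaving 10
apples -= int(apples * 0.20)   # Susie takes 20% of the remainder (2)
print(apples)                  # 8, matching the model's answer
```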