Is it really the best 7B model? (A First Look)
Summary
TLDR: Mistral AI, a new company started by ex-Meta and ex-DeepMind employees, has released Mistral 7B, an open-source language model with 7.3 billion parameters. It outperforms the larger Llama 2 13B model on benchmarks and has faster inference thanks to innovations like grouped-query attention. The post demonstrates using Mistral 7B locally with Ollama and with Hugging Face Transformers, and highlights the instruct version fine-tuned for conversational tasks. Early testing shows promising performance on math problems but some inconsistencies; more testing is needed to fully evaluate its capabilities.
Takeaways
- 😲 Mistral AI released their first model, Mistral 7B, with 7.3 billion parameters
- 👨‍💻 Mistral 7B was developed by ex-Meta and ex-DeepMind employees and released under the Apache 2.0 license for all to use
- 📈 Mistral claims 7B outperforms Llama 2 (13 billion parameters) on multiple benchmarks while using far fewer parameters
- 🔎 Mistral provides detailed benchmark comparisons between 7B and other models like Llama 2 and Llama 1 34B
- 🚀 Mistral attributes performance gains to innovations like grouped-query attention and cheaper handling of long sequences via sliding window attention
- 🤖 Mistral released both a base model and a fine-tuned instruct model optimized for chat
- 👩‍💻 You can access Mistral 7B easily via Ollama or the Hugging Face Transformers library
- ✅ Quick-start code, docs, and examples for using Mistral 7B are provided in the Hugging Face documentation
- 💬 For chat, wrap your prompts in [INST] tags so the model knows when you are speaking
- 🤔 Try out math, reasoning, and joke questions to test Mistral 7B's capabilities
Q & A
What company released the Mistral 7B model?
-A new AI startup called Mistral AI released the Mistral 7B model.
How big is the Mistral 7B model in terms of parameters?
-The Mistral 7B model has 7.3 billion parameters.
What licensing is the Mistral 7B model released under?
-The Mistral 7B model is released under the Apache 2.0 open source license, allowing anyone to use it commercially or for education.
What tasks is the Mistral 7B model optimized for?
-The Mistral 7B model is optimized for low-latency text categorization, summarization, and text and code completion.
How does the Mistral 7B model compare in performance to Meta's Llama models?
-Mistral claims their 7.3-billion-parameter model outperforms the larger Llama 2 13B model on benchmarks, and that Llama 2 would need roughly 23 billion parameters to match Mistral 7B's performance on MMLU.
What technologies does Mistral attribute their model's efficiency to?
-Mistral attributes their model's faster inference times and its ability to handle longer sequences at lower cost to grouped-query attention and sliding window attention.
What are two ways mentioned to start using the Mistral 7B model?
-The two ways mentioned are: 1) Using Ollama to run the model locally 2) Using the Hugging Face Transformers library
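As a minimal sketch of the first option (assuming Ollama is already installed on your machine; the exact model tag in its library may differ from the one shown):

```bash
# first run downloads the weights, then drops you into an interactive chat in the terminal
ollama run mistral
```

From that prompt you can chat with the model directly, and it keeps the conversation context between your messages.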
What tags need to be used when prompting the instruct version of Mistral 7B?
-The instruct version expects [INST] and [/INST] tags around the user input to distinguish it from the model's response.
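For example, a single-turn prompt to the instruct model would look roughly like this (the question itself is taken from the video and is only an illustration):

```
[INST] Can you give me some recommendations for places in London to eat? [/INST]
```

Without the tags, the model may simply continue your sentence instead of answering it, as noted in the transcript.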
Does the model correctly solve mathematical word problems?
-It is able to solve some simple mathematical word problems but may struggle with more complex ones.
How does the model respond to the nonsensical woodchuck question?
-It recognizes that the woodchuck question is absurd but humorous, expressing the idea that it is impossible to accurately predict something that cannot occur.
Outlines
😊 Overview of Mistral AI's 7B-parameter model release
This paragraph provides an overview of the release of Mistral AI's first model, called Mistral 7B. It has 7.3 billion parameters and is optimized for text tasks. The model is open source and outperforms larger models like Llama 2 on benchmarks while using fewer parameters. The post explains the technical innovations enabling this, namely grouped-query attention and sliding window attention; a rough sketch of the former follows.
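The blog post does not walk through the implementation, so the following is only an educational sketch of what grouped-query attention means, not Mistral's actual code; the tensor shapes, head counts, and the helper name are assumptions:

```python
import torch
import torch.nn.functional as F

def grouped_query_attention(x, wq, wk, wv, n_heads, n_kv_heads):
    """Toy grouped-query attention: queries keep n_heads heads, but keys/values
    use only n_kv_heads heads, each shared by a group of query heads."""
    B, T, D = x.shape
    head_dim = D // n_heads
    group = n_heads // n_kv_heads  # query heads served by each KV head

    q = (x @ wq).view(B, T, n_heads, head_dim).transpose(1, 2)     # (B, n_heads, T, hd)
    k = (x @ wk).view(B, T, n_kv_heads, head_dim).transpose(1, 2)  # (B, n_kv_heads, T, hd)
    v = (x @ wv).view(B, T, n_kv_heads, head_dim).transpose(1, 2)

    # Broadcast each KV head across its group of query heads.
    k = k.repeat_interleave(group, dim=1)
    v = v.repeat_interleave(group, dim=1)

    scores = (q @ k.transpose(-2, -1)) / head_dim**0.5
    weights = F.softmax(scores, dim=-1)
    return (weights @ v).transpose(1, 2).reshape(B, T, D)

# Example: 8 query heads sharing 2 KV heads, so the KV cache is 4x smaller.
x = torch.randn(1, 16, 64)
wq = torch.randn(64, 64)
wk = torch.randn(64, 16)  # n_kv_heads * head_dim = 2 * 8
wv = torch.randn(64, 16)
out = grouped_query_attention(x, wq, wk, wv, n_heads=8, n_kv_heads=2)
print(out.shape)  # torch.Size([1, 16, 64])
```

Sliding window attention, the other technique mentioned, additionally restricts each token to attend only to a fixed-size window of recent tokens, which is what keeps long sequences cheap.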
😃 Using Mistral 7B with Ollama and Hugging Face Transformers
This paragraph shows how to use Mistral 7B, either locally with Ollama or with the Hugging Face Transformers library. It provides code examples and tips, such as using the instruct version and [INST] tags for conversational usage. A math example is provided where Mistral 7B solves a word problem correctly.
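A minimal sketch of the Transformers route, assuming a GPU with enough memory (the video uses an A100 in Colab); the model ids and generation settings are illustrative, so check the official Hugging Face documentation for the exact quick-start code:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-7B-Instruct-v0.1"  # swap in the base checkpoint if you don't need chat
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype="auto")

# Wrap the user turn in [INST] ... [/INST] so the instruct model knows where your text ends.
prompt = (
    "[INST] If I have 20 apples and I give half of my apples to Sam, "
    "then Susie takes 20% of my remaining apples, how many apples do I have left? [/INST]"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

In the video the same question comes back with 10 apples after giving half to Sam and 8 after Susie takes her 20%, which matches the expected arithmetic (20 → 10 → 10 − 2 = 8).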
Mindmap
Keywords
💡Mistral 7B
💡language model
💡Apache 2.0 license
💡fine-tuning
💡low latency
💡inference time
💡model parameters
💡benchmarks
💡instruct tags
Highlights
Mistral AI is a new company started by ex-Meta and ex-DeepMind employees
Mistral 7B is open source with an Apache 2.0 license allowing commercial use
Mistral 7B outperforms larger models like Llama 2 13B on benchmarks
Mistral 7B uses grouped-query attention for faster inference
Mistral 7B can handle longer text sequences more efficiently
Anyone can use Mistral 7B locally with Ollama or Hugging Face
The instruct version of Mistral 7B is fine-tuned for chat
Use [INST] tags in prompts to differentiate user and agent text
Mistral 7B solved a mathematical word problem correctly
Some math problems are not answered correctly by Mistral 7B
Mistral 7B understood and explained the nonsensical woodchuck question
Mistral 7B is impressive for its performance relative to its smaller size
Try out Mistral 7B yourself to evaluate its capabilities
Share your thoughts on the capabilities of this new model
More Mistral 7B tutorials coming soon
Transcripts
Mistral AI just released their first
model and it's called Mistral 7B they
initially got a lot of attention based
on how much money they raised for their
seed round so you might have seen this
piece of news online but you have to
remember that even though it's a very
young company it was made by ex-Meta and
DeepMind employees so these are not
people who are starting completely from
scratch the model as we see is released
with the Apache 2.0 uh licensing so that
means that everyone can use it for any
of their use cases including educational
or commercial use cases and this makes
this model the biggest of its size to
be made completely open source looking
at the release blog post we see that
they released a base model but they also
released a fine-tuned model for instruct
or chat purposes and the model is
optimized for low latency text
categorization and summarization also
text and code completion so in the
release blog post we see that Mistral claims
that they are outperforming Llama 2 the
bigger model the 13 billion model on all
benchmarks and they are sharing some
details of this comparison with us uh on
the MMLU data set for example they also
compare it on knowledge reasoning and
comprehension in English data sets based
on these results Mistral 7B seems to be
performing much better than the Llama 2 7
billion model uh and also the 13 billion
model which is much bigger
than the 7 billion model uh and also the
Llama 1
34 billion model and they're saying
that they use the Llama 1 34 billion model
because there hasn't been a Llama 2 34
billion model yet it's kind of
interesting to see their analysis here
they're showing us all the um different
kinds of benchmarks that they use to
compare these models and also how Mistral
7 billion compares to the Llama 2 model
so effectively to reach the same
performance on the MMLU data set
on the benchmark uh Llama 2 has to have
23 billion parameters to reach the same
level of performance that Mistral has
with only 7.3 billion parameters so
Mistral is attributing um their faster
inference time to grouped query attention
and their capacity to handle longer
sequences at smaller cost to sliding
window attention and if you like in the
release blog post they go a little bit
more into details of how they use these
technologies overall this is a very
interesting Improvement both because we
see a smaller model that only has 7
billion parameters outperform bigger
models but also because we have a
highquality model made accessible for
everyone to use let's take a look at how
we can use Mistral 7B right now first
option is to use Ollama Ollama is just a way
for you to run these large language
models locally in a very very simple way
all you have to do is to download Ollama to
your your laptop and then after you're
done with it you can go to the list of
models here are all the models that
they're hosting or not hosting all the
models that they have in their
repository and you truly only need to
run ollama run mistral so let's try
that but of course it's going to take a
second for it to download the model I've
downloaded it before so it was quite
instantly for me and from then on you
can ask questions to Mistral so let's
start
with hello my name is
Mısra what is your
name can you give me
some
recommendations for places in London to
eat and then it gives me a list of
places to eat what if I ask
um are any of these
vegetarian
restaurants all right so they have
vegetarian options available um so just
to check that like it still remembers
who I am I'll say do you
remember who I
am thank you they should always be nice
to our large language
Models All right so this is how you can
use it with Ollama and I'll also show
you quickly how to use it uh with
hugging face because there's already a
released version on hugging face you can
either use the base version or if you
search for it you'll see that there is an
instruct version that also means the
chat version so basically the one that
was fine-tuned uh for a chat use case on
hugging face we already have a good
documentation for this so if you want to
use it in Transformers you can say read
more on the documentation which I
already have open here or you can also
find it in the hugging face
documentation under text models Mistral
will be there um so what I did is to
basically use all of this code from the
documentation in a Google Colab
notebook you can find the link to the
Google Colab notebook if you don't want
to deal with it yourself I will leave it
in the description so you can find it
there uh I'll just give you some tips on
like how I uh made it run so first of
all you cannot run it on a Google Colab
the free version because you definitely
need the GPU um I set the A100 GPU to
be able to run it for the other ones I
wasn't able to uh basically I ran out of
RAM and it automatically in the code
here it gives you the base model if you
want to use a chat model you just need
to change it to Mistral 7B Instruct uh
version
0.1 and then yeah and then you can pass
your prompts so for the prompts if you
if you're using the instruct model you
want to use the [INST] tags at the
beginning and the end uh so it's the
opening [INST] tag and the closing [/INST]
tag uh to make it obvious that you are
the user asking for something so this is
your part you're speaking and then it's
the agent's time to speak and that's all
you have to do because if you don't do
that it starts completing your uh
sentences or it starts kind of thinking
that sentences belong to uh the model uh
so if you want to have a chat experience
you just need to set the [INST] tags so I
tried it with this one for example if I
have 20 apples and I give half of my
apples to Sam then Susie takes 20% of my
remaining apples how many apples do I
have left and after running the
model it told me that if you start with
20 apples giving half uh to Sam leaves
you with 10 apples remaining then if
Susie takes 20% off those remaining
apples that means she takes two apples
so you have eight apples left so for me
this quick mathematical problem worked
but I've seen other creators online who
are trying to use Mistral 7B say that
the little math problems that they asked
uh were not correctly answered so you
should try it out yourself to see how uh
it responds so I want to try the how
much wood would a woodchuck chuck
if a woodchuck could chuck wood uh
question let's see what it says
all right it says the phrase how much
wood would a woodchuck chuck if a
woodchuck could chuck wood is an absurd
question and does not provide any useful
information it is a humorous way to
express the idea that it is impossible
to accurately predict or measure
something that cannot occur all right
sounds a little bit aggressive but it
understood the question and it
understood the context behind it and it
did not take it literally so that's good
um and this is how you can really easily
start using Mistral 7B what do you think
about this model leave your thoughts in
the comments below and in the coming
days the more we learn about this and
the more in-depth tutorials we will be
sharing with you thanks for being here
and I will see you in the next
[Music]
video