Is it really the best 7B model? (A First Look)

AssemblyAI
29 Sept 2023 · 08:24

Summary

TL;DR: Mistral AI, a new company started by ex-Meta and ex-DeepMind employees, has released Mistral 7B, an open-source language model with 7.3 billion parameters. It outperforms the larger Llama 2 13B model on benchmarks and has faster inference thanks to innovations like grouped-query attention. The video demonstrates running Mistral 7B locally with Ollama and via Hugging Face Transformers, and highlights the instruct version fine-tuned for conversational tasks. Early testing shows promising performance on math problems but some inconsistencies; more testing is needed to fully evaluate its capabilities.

Takeaways

  • 😲 Mistral AI released their first model, Mistral 7B, with 7.3 billion parameters
  • 👨‍💻 Mistral 7B was developed by ex-Meta and ex-DeepMind employees and released under the Apache 2.0 license for all to use
  • 📈 Mistral claims 7B outperforms Llama 2 (13 billion parameters) on multiple benchmarks while using far fewer parameters
  • 🔎 Mistral provides detailed benchmark comparisons between 7B and other models like Llama 2 13B and Llama 1 34B
  • 🚀 Mistral attributes performance gains to innovations like grouped-query attention and sliding-window attention for longer sequences
  • 🤖 Mistral released both a base model and a fine-tuned instruct model optimized for chat
  • 👩‍💻 You can access Mistral 7B easily via Ollama or Hugging Face Transformers
  • ✅ Quick-start code, docs, and examples for using 7B are provided in the Hugging Face documentation
  • 💬 For chat, wrap your prompts in [INST] tags so the model knows when you are speaking
  • 🤔 Try out math, reasoning, and joke questions to test Mistral 7B's capabilities
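
The [INST] wrapping from the takeaways can be sketched as a small helper. This is a minimal illustration assuming the plain `[INST] ... [/INST]` convention from Mistral's instruct model card; the helper name `wrap_instruct` is just for this example:

```python
def wrap_instruct(user_message: str) -> str:
    """Wrap a user message in Mistral's instruct tags so the
    fine-tuned chat model treats it as a user turn, not as text
    to be completed."""
    return f"[INST] {user_message.strip()} [/INST]"

prompt = wrap_instruct("Can you recommend places to eat in London?")
print(prompt)  # [INST] Can you recommend places to eat in London? [/INST]
```

The wrapped string is what you would tokenize and pass to the instruct model; without the tags, the model tends to treat your text as a sentence to complete.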

Q & A

  • What company released the Mistral 7B model?

    -A new AI startup called Mistral AI released the Mistral 7B model.

  • How big is the Mistral 7B model in terms of parameters?

    -The Mistral 7B model has 7.3 billion parameters.

  • What licensing is the Mistral 7B model released under?

    -The Mistral 7B model is released under the Apache 2.0 open-source license, allowing anyone to use it commercially or for education.

  • What tasks is the Mistral 7B model optimized for?

    -The Mistral 7B model is optimized for low-latency text categorization, summarization, and text and code completion.

  • How does the Mistral 7B model compare in performance to Meta's Llama models?

    -Mistral claims their 7.3-billion-parameter model outperforms the larger Llama 2 13B model on benchmarks, and that Llama 2 would need roughly 23 billion parameters to match it on MMLU.

  • What technologies does Mistral attribute their model's efficiency to?

    -Mistral attributes the model's faster inference times and its ability to handle longer sequences to grouped-query attention and sliding-window attention.
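
The sliding-window attention mentioned in the answer above can be illustrated with a toy mask: each query position attends only to itself and the previous few positions, which caps per-token attention cost on long sequences. A minimal pure-Python sketch (the window size and sequence length below are illustrative, not Mistral's actual configuration):

```python
from typing import List


def sliding_window_mask(seq_len: int, window: int) -> List[List[bool]]:
    """Boolean attention mask: entry [i][j] is True where query
    position i may attend to key position j. The mask is causal
    (j <= i) and limited to the last `window` positions, as in
    sliding-window attention."""
    return [
        [j <= i and i - j < window for j in range(seq_len)]
        for i in range(seq_len)
    ]

mask = sliding_window_mask(seq_len=5, window=3)
# The last row attends only to positions 2, 3, and 4.
```

Compared with a full causal mask, each row has at most `window` True entries instead of up to `seq_len`, which is where the cost saving on long sequences comes from.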

  • What are two ways mentioned to start using the Mistral 7B model?

    -The two ways mentioned are: 1) using Ollama to run the model locally, and 2) using the Hugging Face Transformers library.

  • What tags need to be used when prompting the instruct version of Mistral 7B?

    -The instruct version expects [INST] tags before and after the user input to distinguish it from the model's response.

  • Does the model correctly solve mathematical word problems?

    -It is able to solve some simple mathematical word problems but may struggle with more complex ones.

  • How does the model respond to the nonsensical woodchuck question?

    -It recognizes the woodchuck question is absurd but meant humorously to express that predicting something impossible is useless.

Outlines

00:00

😊 Overview of Mistral AI's 7B-parameter model release

This paragraph provides an overview of the release of Mistral AI's first model, Mistral 7B. It has 7.3 billion parameters and is optimized for low-latency text tasks. The model is open source and outperforms larger models like Llama 2 on benchmarks while using fewer parameters. The video explains the technical innovations that enable this.

05:01

😃 Using Mistral 7B with Ollama and Hugging Face Transformers

This paragraph shows how to use Mistral 7B, either locally with Ollama or with Hugging Face Transformers. It provides code examples and tips, such as using the instruct version and [INST] tags for conversational usage. A math example is shown in which Mistral 7B solves a word problem correctly.
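
The conversational usage described in this section can be sketched as a prompt builder. This follows the multi-turn chat format described on Mistral's Hugging Face model card (user turns wrapped in [INST] ... [/INST], assistant replies closed with </s>), but the exact spacing and special tokens are an assumption to verify against the model card, and `build_chat_prompt` is an illustrative name:

```python
from typing import List, Optional, Tuple


def build_chat_prompt(turns: List[Tuple[str, Optional[str]]]) -> str:
    """Build a multi-turn prompt for a Mistral 7B Instruct model.
    `turns` is a list of (user_message, assistant_reply) pairs; the
    final turn's reply may be None, meaning the model should answer
    next."""
    prompt = "<s>"
    for user, assistant in turns:
        prompt += f"[INST] {user} [/INST]"
        if assistant is not None:
            prompt += f" {assistant}</s>"
    return prompt

conversation = build_chat_prompt([
    ("Can you recommend places to eat in London?", "Sure, here are a few ideas."),
    ("Are any of these vegetarian?", None),
])
```

The resulting string would then be tokenized and passed to the model; newer versions of the tokenizer expose a chat template that produces this format for you.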

Keywords

💡Mistral 7B

This refers to the new language model released by the startup Mistral AI. It is a 7.3-billion-parameter model optimized for low-latency text tasks like categorization, summarization, and completion. The model is highlighted for outperforming larger models like Llama 2 while using fewer parameters, which is attributed to technical innovations like grouped-query attention.

💡language model

A language model is the core component of AI systems like chatbots. It is trained on vast amounts of text data to predict probable next words and sentences. Mistral 7B is presented as an advanced new language model.

💡Apache 2.0 license

Mistral 7B uses the permissive Apache 2.0 license, allowing free commercial and educational use. This makes it stand out from proprietary models and opens up applications.

💡fine-tuning

In addition to the base Mistral 7B model, the company released a fine-tuned version for conversational tasks like chat. Fine-tuning adapts a model on specific types of data or tasks to improve performance.

💡low latency

The Mistral model is optimized specifically for low-latency performance in real-time text applications. This makes it suitable for integration into chatbots, search engines, and other systems needing fast responses.

💡inference time

This refers to the time a trained AI model takes to process an input and produce an output or prediction. Low inference time is important for real-world applications, so Mistral claims optimizations in this area.

💡model parameters

Parameters are the internal adjustable weights within a machine learning model. More parameters usually lead to higher capacity but also higher cost. Mistral 7B uses fewer parameters than the models it is compared against while aiming for better efficiency.

💡benchmarks

To demonstrate the performance of Mistral 7B, the creators benchmarked it against other models like Llama 2 using standardized datasets and metrics. The benchmarks showed it surpassing the larger models.

💡instruct tags

The docs for the Mistral 7B chat model recommend wrapping user input in instruct tags to delineate it from model responses, e.g. [INST] User prompt [/INST]. This provides contextual cues.

Highlights

Mistral AI is a new company started by ex-Meta and ex-DeepMind employees

Mistral 7B is open source with an Apache 2.0 license allowing commercial use

Mistral 7B outperforms larger models like Llama 2 13B on benchmarks

Mistral 7B uses grouped-query attention for faster inference

Mistral 7B can handle longer text sequences more efficiently thanks to sliding-window attention

Anyone can run Mistral 7B locally with Ollama or via Hugging Face

The instruct version of Mistral 7B is fine-tuned for chat

Use [INST] tags in prompts to differentiate user and agent text

Mistral 7B solved a mathematical word problem correctly

Some math problems are reportedly not answered correctly by Mistral 7B

Mistral 7B understood and explained the nonsensical woodchuck question

Mistral 7B is impressive for its performance relative to its smaller size

Try out Mistral 7B yourself to evaluate its capabilities

Share your thoughts on the capabilities of this new model

More Mistral 7B tutorials coming soon

Transcripts

00:00

Mistral AI just released their first model, and it's called Mistral 7B. They initially got a lot of attention based on how much money they raised for their seed round, so you might have seen this piece of news online. But you have to remember that even though it's a very young company, it was founded by ex-Meta and ex-DeepMind employees, so these are not people who are starting completely from scratch. The model, as we see, is released under Apache 2.0 licensing, which means that everyone can use it for any of their use cases, including educational or commercial ones, and this makes it the biggest model of its size to be made completely open source. Looking at the release blog post, we see that they released a base model, but they also released a fine-tuned model for instruct or chat purposes, and the model is optimized for low-latency text categorization and summarization, and also text and code completion.

00:59

In the release blog post, we see that Mistral claims they are outperforming Llama 2, the bigger 13-billion-parameter model, on all benchmarks, and they share some details of this comparison with us, for example on the MMLU dataset. They also compare it on knowledge, reasoning, and comprehension datasets in English. Based on these results, Mistral 7B seems to be performing much better than the Llama 2 7-billion model, and also the 13-billion model, which is much bigger than the 7-billion model, and also the Llama 1 34-billion model. They say they use the Llama 1 34-billion model because there hasn't been a Llama 2 34-billion model yet. It's kind of interesting to see their analysis here; they show us all the different kinds of benchmarks that they use to compare these models and also how Mistral 7B compares to the Llama 2 model. Effectively, to reach the same performance on the MMLU benchmark, Llama 2 would have to have 23 billion parameters to reach the level of performance that Mistral has with only 7.3 billion parameters.

02:13

Mistral attributes their faster inference time to grouped-query attention, and their capacity to handle longer sequences at smaller cost to sliding-window attention. If you like, the release blog post goes a little more into the details of how they use these technologies. Overall, this is a very interesting improvement, both because we see a smaller model that only has 7 billion parameters outperform bigger models, and also because we have a high-quality model made accessible for everyone to use.

02:45

Let's take a look at how we can use Mistral 7B right now. The first option is to use Ollama. Ollama is just a way for you to run these large language models locally in a very, very simple way. All you have to do is download Ollama to your laptop, and after you're done with that, you can go to the list of models (all the models that they have in their repository), and you truly only need to run "ollama run mistral". So let's try that. Of course, it's going to take a second to download the model; I've downloaded it before, so it was nearly instant for me. From then on, you can ask questions to Mistral. Let's start with: "Hello, my name is Mısra. What is your name?" "Can you give me some recommendations for places in London to eat?" And it gives me a list of places to eat. What if I ask: "Are any of these vegetarian restaurants?" All right, so they have vegetarian options available. Just to check that it still remembers who I am, I'll say: "Do you remember who I am?" Thank you. We should always be nice to our large language models.

04:35

All right, so this is how you can use it with Ollama, and I'll also quickly show you how to use it with Hugging Face, because there's already a released version on Hugging Face. You can either use the base version or, if you search for it, you'll see that there is an instruct version, which means the chat version: basically the one that was fine-tuned for a chat use case. On Hugging Face, we already have good documentation for this, so if you want to use it in Transformers, you can click "read more" in the documentation, which I already have open here, or you can also find it in the Hugging Face documentation under text models, where Mistral will be listed.

05:12

What I did is basically use all of this code from the documentation in a Google Colab notebook. You can find the link to the Google Colab notebook in the description if you don't want to set it up yourself. I'll just give you some tips on how I made it run. First of all, you cannot run it on the free version of Google Colab, because you definitely need a GPU. I set the A100 GPU to be able to run it; with the other ones, I wasn't able to, because I ran out of RAM. The code here gives you the base model by default; if you want to use the chat model, you just need to change it to Mistral 7B Instruct, version 0.1, and then you can pass your prompts.

06:00

For the prompts, if you're using the instruct model, you want to use the [INST] tags at the beginning and the end, that is, the opening [INST] tag and the closing [/INST] tag, to make it obvious that you are the user asking for something: this is your part, you're speaking, and then it's the agent's turn to speak. That's all you have to do, because if you don't do that, it starts completing your sentences, or it starts thinking that your sentences belong to the model. So if you want to have a chat experience, you just need to set the [INST] tags.

06:36

I tried it with this one, for example: "If I have 20 apples and I give half of my apples to Sam, then Susie takes 20% of my remaining apples, how many apples do I have left?" After running the model, it told me that if you start with 20 apples, giving half to Sam leaves you with 10 apples remaining; then if Susie takes 20% of those remaining apples, that means she takes two apples, so you have eight apples left. So for me, this quick mathematical problem worked, but I've seen other creators online who are trying out Mistral 7B say that the little math problems they asked were not correctly answered, so you should try it out yourself to see how it responds.

07:19

I want to try the "how much wood would a woodchuck chuck if a woodchuck could chuck wood" question; let's see what it says. All right, it says: "The phrase 'how much wood would a woodchuck chuck if a woodchuck could chuck wood' is an absurd question and does not provide any useful information. It is a humorous way to express the idea that it is impossible to accurately predict or measure something that cannot occur." Sounds a little bit aggressive, but it understood the question, it understood the context behind it, and it did not take it literally, so that's good.

07:57

And this is how you can really easily start using Mistral 7B. What do you think about this model? Leave your thoughts in the comments below, and in the coming days, as we learn more about this, we will be sharing more in-depth tutorials with you. Thanks for being here, and I will see you in the next video.
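
The apples word problem from the transcript can be checked with plain arithmetic, which confirms the answer the model gave:

```python
apples = 20
apples -= apples // 2          # give half (10) to Sam, leaving 10
apples -= int(apples * 0.20)   # Susie takes 20% of the remainder (2)
print(apples)                  # 8, matching the model's answer
```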