This is currently the best local model for RooCode | Devstral
Summary
TL;DR: In this video, the speaker discusses their experience running the Devstral model locally for junior-level coding tasks, such as stubbing out code, creating files, and building APIs. Devstral, a 23.6-billion-parameter model from Mistral AI, impresses with its ability to handle a variety of coding challenges and tasks efficiently. The video compares Devstral with other models like GLM, highlighting its strengths in context-window management and low token usage. The speaker also walks through their evaluation setup, custom configurations, and testing with different tools, noting Devstral's strong performance despite some minor issues. They conclude with a recommendation for developers working with coding agents and local models.
Takeaways
- 😀 Devstral by Mistral AI is a powerful local model for coding tasks such as creating files, simple functions, and APIs.
- 😀 Devstral can run through a full variety of coding tasks and perform well in evaluations (evals), unlike many other local models.
- 😀 The model has shown success on game tasks (like Connect 4 and Sudoku), production-codebase fixes, and even a traffic simulation.
- 😀 The Q6 quantization of Devstral, especially the Unsloth variant, performs well in evaluations, with a 128K max context window that handles longer inputs effectively.
- 😀 Devstral's performance is comparable to larger models, and it produces detailed, high-quality output when prompted with clear instructions.
- 😀 Devstral excels at providing a strong starting point for projects such as landing pages, while models like GLM generate more tokens for less efficient results.
- 😀 Specific settings are recommended in LM Studio (such as setting the temperature to 0.15) for better results with Devstral.
- 😀 The RTX 5090 GPU significantly improves local model performance, enabling better benchmarking and running of large models thanks to its high VRAM capacity.
- 😀 Devstral offers impressive coding capabilities in Roo Code, completing full evals that other models like GLM struggle with due to context-window limitations.
- 😀 The model's ability to run locally and deliver strong coding performance at relatively low cost is a major advantage for developers looking for cost-effective AI solutions.
Q & A
What is Devstral, and how does it relate to the speaker's work?
-Devstral is a model from Mistral AI that the speaker has been testing for local, junior-level work such as code stubbing, simple functions, APIs, and SQL tasks. It is the first model the speaker could run locally that completed their evaluation set successfully, which makes it a valuable tool for their coding tasks.
Why does the speaker believe Devstral is an effective model for local use?
-The speaker believes Devstral is effective because it can handle tasks such as creating simple applications, modifying production code, and solving problems in a local environment. It also works well with Roo Code and manages its context window well, which many smaller models struggle with.
How does Devstral compare to other models like GLM?
-Devstral outperforms models like GLM in some areas, particularly in local use with Roo Code. It has a 128K context window, which avoids the issues seen with models like GLM that have a 32K context window. Additionally, Devstral generates fewer tokens for similar tasks, leading to more efficient performance.
What challenges does the speaker face when running models like GLM?
-The speaker faces challenges with GLM, particularly its limited 32K-token context window, which makes it unsuitable for long-running tasks or for Roo Code without overriding the system prompt. Additionally, GLM generates a large number of tokens, which increases computational cost and reduces efficiency.
What were some of the successful tasks the speaker ran with Devstral?
-The speaker successfully ran several tasks with Devstral, including building a Connect Four game, solving Sudoku puzzles, creating a traffic simulator, developing a landing page, and modifying production code by removing comments or fixing errors. Devstral also worked well for pathfinding and whiteboarding apps.
What configuration settings did the speaker use for running Devstral?
-The speaker used specific settings in LM Studio with the Unsloth variant of Devstral. Key settings included a temperature of 0.15, a repeat penalty of 1, min-p sampling of 0.01, and a top-k value of 40. The speaker also turned off the 'enable editing through diffs' setting, as it caused issues.
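As a rough sketch, these sampler settings could be applied when calling Devstral through LM Studio's OpenAI-compatible local server. The model identifier, prompt, and endpoint below are illustrative assumptions (not taken from the video), and whether the extra sampler fields are honored depends on the LM Studio version:

```python
import requests

# Sampler settings from the video: temperature 0.15, repeat penalty 1,
# min-p 0.01, top-k 40. Endpoint uses LM Studio's default local port.
LMSTUDIO_URL = "http://localhost:1234/v1/chat/completions"

payload = {
    "model": "devstral-small",  # hypothetical identifier; use the name LM Studio shows
    "messages": [
        {"role": "user", "content": "Stub out a REST API for a todo app."}
    ],
    "temperature": 0.15,
    "top_k": 40,          # non-standard OpenAI fields; LM Studio may or may not
    "min_p": 0.01,        # accept these depending on version
    "repeat_penalty": 1.0,
}

response = requests.post(LMSTUDIO_URL, json=payload, timeout=300)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```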
Why did the speaker upgrade to an RTX 5090, and how does it benefit their testing?
-The speaker upgraded to an RTX 5090 to run local models faster. It delivers roughly double the performance of their previous GPU (a 7900 XTX) and offers more VRAM, allowing larger models to be handled and making model testing faster and more efficient.
How does the speaker evaluate the performance of models like Devstral?
-The speaker evaluates models using a series of unit tests and deterministic checks that assess how well the model follows instructions and the quality of its outputs. These are run through Python scripts and compared across models, accounting for factors like token generation and context-window limits.
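The video does not show the speaker's actual eval scripts, but a minimal sketch of this style of deterministic check might look like the following. The task, file paths, and scoring are illustrative assumptions, not the speaker's real suite:

```python
import json
import subprocess
from pathlib import Path

# Hypothetical eval: a coding agent is asked to produce a file in a
# workspace, then deterministic checks score the result. The workspace
# layout and the Connect Four task are illustrative only.
WORKSPACE = Path("eval_workspace")
EXPECTED_FILE = WORKSPACE / "connect_four.py"

def check_file_created() -> bool:
    """Deterministic check: did the agent create the expected file?"""
    return EXPECTED_FILE.exists()

def check_unit_tests_pass() -> bool:
    """Run the task's unit tests and require a zero exit code."""
    result = subprocess.run(
        ["python", "-m", "pytest", str(WORKSPACE / "tests")],
        capture_output=True,
    )
    return result.returncode == 0

def run_eval() -> dict:
    checks = {
        "file_created": check_file_created(),
        "tests_pass": check_unit_tests_pass(),
    }
    checks["score"] = sum(checks.values()) / len(checks)
    return checks

if __name__ == "__main__":
    print(json.dumps(run_eval(), indent=2))
```

Because every check is pass/fail rather than judged by another model, runs are repeatable and directly comparable across models.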
What makes Devstral suitable for agentic coding IDE plugins or environments?
-Devstral is well suited to agentic coding IDEs because it is a focused, efficient model for specific coding tasks. It excels when given precise instructions, making it ideal for environments where exact code generation is needed, such as IDE plugins or local agentic setups.
What is the significance of the 128K context window in Devstral?
-The 128K context window allows Devstral to maintain coherence over longer tasks, reducing the likelihood of errors on complex or extended coding work. This is a significant advantage over models with smaller context windows, which may lose track during long tasks.