How to tweak your model in Ollama or LMStudio or anywhere else

Matt Williams
22 Aug 2024 · 11:42

Summary

TL;DR: This video delves into the intricacies of Large Language Models (LLMs), focusing on the parameters that influence their output. It explains concepts like temperature, context size (num_ctx), and the importance of setting a seed for consistent results. It also covers advanced parameters for controlling text generation, such as stop words, repeat penalty, and top k/p. Additionally, it touches on the less common but equally important mirostat sampling parameters, which can affect the diversity and coherence of the model's output. The aim is to guide users on how to fine-tune these parameters for better control over LLMs in tools like Ollama.

Takeaways

  • 🔥 Temperature affects the randomness of a model's output; lower temperatures make the most probable tokens even more likely, while higher temperatures give less probable tokens a better chance of being chosen.
  • 📚 The 'num_ctx' parameter sets the context size for the model, influencing how much information it can remember and process during a conversation.
  • 💾 Ollama models start with a default context size of 2K tokens due to memory constraints, which can be adjusted to support larger contexts like 128K tokens.
  • 🚫 'Stop' words and phrases halt generation when the model emits a specific word or symbol, which is useful for cutting off output that has started to repeat.
  • 🔄 'Repeat penalty' and 'repeat last n' parameters help manage the repetition of tokens by adjusting their probabilities based on recent usage.
  • 🔑 'Top k' limits the number of tokens considered for the next prediction, while 'top p' focuses on tokens that sum up to a certain probability threshold.
  • 📈 'Min p' sets a minimum threshold, defined as a fraction of the largest logit, that a token must exceed to be considered for the next prediction.
  • 📊 Tail free sampling (tfs_z) cuts off the tail of probabilities, influencing the diversity of the model's output by adjusting the range of considered tokens.
  • 🌱 The 'seed' parameter ensures consistent output by making the random number generator predictable, which is useful for testing scenarios.
  • 🔍 Mirostat parameters like 'tau' and 'eta' offer an alternative method for generating the list of next possible tokens, focusing on perplexity and surprise.
  • ✂️ 'Num_predict' determines the maximum number of tokens to predict, with -1 allowing continuous generation until completion and -2 filling the context.

Q & A

  • What are the common parameters used when working with Large Language Models (LLMs)?

    -Common parameters include temperature, num_ctx, stop words and phrases, repeat penalty, repeat last n, top k, top p, min p, tail free sampling (tfs_z), seed, and mirostat parameters such as mirostat tau and eta.

  • How does the temperature parameter affect the model's output?

    -Temperature scales the logits before they become probabilities. A lower temperature makes the most probable option even more probable, while a temperature greater than 1 reduces differences between logits, leading to a more creative output.

  • What is the purpose of the num_ctx parameter?

    -Num_ctx sets the context size for the model, determining how many tokens are in its context. A larger context size can remember more information but requires more memory.

  • Why might Ollama models start with a default context size of 2k tokens?

    -Ollama models start with a default context size of 2k tokens because supporting more tokens requires more memory, and many users have GPUs with limited memory, such as 8GB.

  • How can you increase the context size of an Ollama model?

    -You can increase the context size by creating a new modelfile with the desired num_ctx value and then running 'ollama create' with the new modelfile.

  • What is the role of stop words and phrases in controlling model output?

    -Stop words and phrases tell the model to stop outputting text when it encounters a specific word or symbol, preventing repetition in the generated text.

  • How does the repeat penalty parameter work to prevent repetition?

    -Repeat penalty adjusts the probability of a token if it has been used recently. If the logit is negative, it multiplies by the penalty, and if positive, it divides by the penalty, usually reducing the token's likelihood of being used again.

  • What is the purpose of the top k parameter?

    -Top k limits the list of candidate next tokens to the k most likely tokens (40 by default).

  • Can you explain the top p parameter and how it differs from top k?

    -Top p keeps the smallest set of most likely tokens whose probabilities add up to top p. Unlike top k, which keeps a fixed number of tokens, top p drops the low-probability tail whose combined probability is 1 minus top p.

  • What is the seed parameter used for in LLMs?

    -The seed parameter is used to make the random number generator predictable, ensuring that the model generates the same output every time when given the same input and seed.

  • How do mirostat tau and eta parameters influence the model's output?

    -Mirostat tau controls the balance between coherence and diversity, with a higher value resulting in more diverse outputs. Mirostat eta acts as a learning rate, with a higher rate causing the model to adapt faster to changes in the generated text.

  • What is the num_predict parameter and how does it affect text generation?

    -Num_predict is the maximum number of tokens to predict when generating text. Setting it to -1 allows the model to generate until completion, while -2 will fill the context, potentially cutting off at that point.

Outlines

00:00

🔍 Introduction to LLM Parameters

This paragraph introduces various parameters used in Large Language Models (LLMs) such as temperature, seed, num_ctx, and more, explaining their significance and how they influence the model's output. It emphasizes the importance of understanding these parameters for effective use of LLMs in tools like Ollama. The temperature parameter is highlighted for its role in adjusting the model's creativity by scaling logits before they are converted into probabilities, while num_ctx sets the context size, which is crucial for the model's memory capacity and performance. The paragraph also touches on the default context size in Ollama and how to modify it for models like llama 3.1.

05:05

🔧 Controlling Text Generation with Parameters

The second paragraph delves into the parameters that control text generation in LLMs, focusing on stop words, repeat penalty, repeat last n, top k, top p, min p, and tail free sampling. It explains how these parameters can be used to manage the model's output, prevent repetition, and influence the selection of tokens based on their probabilities. The discussion includes the impact of the repeat penalty on token probabilities and the role of repeat last n in defining the window for detecting repetitions. Additionally, it covers the function of top k and top p in narrowing down the list of potential next tokens and the use of min p and tail free sampling to further refine the token selection process.

10:07

🌡️ Advanced Parameters for Consistency and Control

This paragraph discusses advanced parameters such as seed and mirostat settings, which are used to control the randomness and consistency of LLM outputs. The seed parameter is crucial for generating consistent results, especially in testing scenarios, by making the random number generator predictable. The mirostat parameters, including tau and eta, are introduced for their role in balancing coherence and diversity in the model's text generation. The paragraph also explains the concepts of perplexity and surprise in the context of LLMs and how they relate to the model's output. Finally, it mentions the num_predict parameter, which determines the maximum number of tokens to predict during text generation.

🛠️ Configuring Parameters for Model Customization

The final paragraph provides guidance on configuring LLM parameters, both in the modelfile and through the default user interface. It clarifies that while some parameters can be adjusted mid-conversation using the command line, others like num_ctx cannot. The paragraph also notes that certain parameters are not yet documented due to their infrequent use. The video script concludes by inviting viewers to share their experiences with these parameters and thanking them for watching.


Keywords

💡Temperature

In the context of the video, 'temperature' refers to a parameter in large language models (LLMs) that influences the randomness of the model's output. A lower temperature makes the model's predictions more certain, favoring the most probable tokens, while a higher temperature increases randomness, allowing less probable tokens to have a chance of being selected. This is crucial for controlling the creativity and diversity of the model's responses, as exemplified when discussing how a higher temperature can make the model 'more creative in the way it answers a question.'

💡Num_ctx

'Num_ctx' stands for the context size parameter in LLMs, which determines how much of the conversation history the model takes into account when generating a response. The video explains that a larger context size allows the model to remember more information, but it also requires more memory. The default context size in Ollama is 2K tokens, which is a balance between memory constraints and the need for context. This parameter is important for ensuring the model's responses are relevant and coherent within a conversation.

💡Logits

Logits are the raw, unscaled scores that a model generates for each possible token before converting them into probabilities. The video describes how logits typically fall between -10 and 10 and are then scaled using a softmax function to become probabilities that sum to 1. Understanding logits is key to grasping how models decide which tokens are more likely to be the next word in a sequence.

💡Softmax Function

The softmax function is a mathematical formula used to convert logits into probabilities. As described in the video, after a model generates logits, the softmax function is applied to these logits to produce a probability distribution where the sum of all probabilities equals 1. This is essential for the model to make decisions about the most likely next token in a sequence.

💡Stop Words and Phrases

Stop words and phrases are user-defined terms that cause the model to stop generating text as soon as it outputs them. The video mentions that these can be used to cut off the output when the model emits a word or symbol that signals it is entering a repetition loop. This feature is useful for controlling output quality and keeping the generated text varied and engaging.

💡Repeat Penalty

The 'repeat penalty' is a parameter that adjusts the probability of a token being selected if it has been used recently. As explained in the video, if the logit for a token is negative, it is multiplied by the penalty, and if positive, it is divided by the penalty. This mechanism helps in reducing repetition in the model's output, making the conversation flow more naturally.

💡Top K

Top K is a parameter that determines the number of most probable tokens to consider when predicting the next token in a sequence. The video clarifies that by default, only the top 40 tokens are considered, but this number can be adjusted. This parameter helps in controlling the diversity of the model's responses by limiting the pool of potential tokens.

💡Top P

Top P is a parameter that filters the list of potential tokens based on a cumulative probability threshold. The video explains that a top P of 0.95, for instance, keeps only the most probable tokens whose cumulative probability reaches 0.95, excluding the low-probability tail that sums to 0.05. This helps in focusing the model's predictions on a narrower set of more probable tokens, influencing the coherence and focus of the generated text.

💡Min P

Min P is a parameter that sets a minimum threshold a token must clear to be considered in the prediction process. The video defines this threshold as a percentage of the highest logit value in the candidate list. This ensures that only tokens above a certain relative probability are considered, which can help in maintaining the quality and relevance of the model's output.

💡Tail Free Sampling (TFS_Z)

Tail Free Sampling, or TFS_Z, is a parameter that truncates the lower tail of the probability distribution. The video suggests starting with values close to 1 and adjusting downwards to control how much of the tail is cut off. This parameter affects the exploration of less probable tokens and can influence the creativity and diversity of the model's responses.

💡Seed

The 'seed' parameter is used to initialize the random number generator in a predictable way. By setting a seed, the model's output can be made deterministic, ensuring the same input leads to the same output every time. The video mentions this as useful for testing scenarios where consistent model behavior is required.

Highlights

Introduction to various parameters used in Large Language Models (LLMs) and their significance.

Explanation of temperature and its role in scaling logits to probabilities, affecting model creativity.

Discussion on num_ctx, the context size setting, and its impact on memory requirements.

How to customize context size in Ollama models beyond the default 2K tokens.

Use of stop words and phrases to control repetitive text generation in models.

Repeat penalty and repeat last n parameters to manage token repetition in text generation.

Top k parameter to determine the length of the list of potential next tokens.

Top p parameter for controlling the sum of probabilities in the list of potential tokens.

Min p parameter as an alternative to top p for setting a minimum logit value for token consideration.

Tail free sampling (tfs_z) for cutting off the tail of probabilities, affecting token selection.

The importance of seed in ensuring consistent model responses for testing or other purposes.

Introduction to mirostat parameters and their use in generating a list of next possible tokens.

Mirostat tau parameter for balancing coherence and diversity in generated text.

Mirostat eta parameter as the learning rate influencing model adaptation speed.

Num_predict parameter for setting the maximum number of tokens to predict in text generation.

Configuring parameters in the modelfile and some parameters' availability in the default UI.

Invitation for viewers to share their experiences with these parameters and to provide feedback.

Transcripts

00:00

There are a lot of parameters available when working with LLMs, like temperature and seed and num_ctx and more, but do you know how to use them all and what they mean? Well, let's go through them all now. And although I tend to focus on Ollama, the contents of this video apply just as much to any other tool as they do to Ollama, except for the parts where I talk about the implementation.

00:22

You can find the list of parameters in the Ollama documentation. Just go to the modelfile docs and then parameters. The first three talk about mirostat sampling, which is a strange place to start since it's probably not the most common one that you'll use. So instead let's start our list with what are probably the most common ones.

00:40

I think the first one in our list should be temperature. The way models work is that they guess the first word of the answer and then try to figure out what is the most likely next word of the answer. And they just keep repeating that over and over and over again. When coming up with that most likely next word, or actually token, it creates a list of words or tokens that could potentially go in that next spot. And they all have a probability assigned to them that shows how probable this option is as the next token.

01:08

But the model doesn't store probabilities. It actually works with what are called logistic units, or logits. Logits are unscaled when they are first generated, but they tend to be, especially with llama.cpp, which Ollama uses, between -10 and 10. These logits are converted with a softmax function to a series of numbers between 0 and 1, and if you add up all the numbers they add up to 1. So essentially they have become probabilities.

01:30

Temperature helps scale the logits before they become probabilities. Lower temperatures will spread the logits out, making the smaller numbers smaller still and the larger numbers even larger. This means what was most probable before has become even more probable now. But a temperature greater than 1 reduces the differences between logits, resulting in probabilities that are closer together. So tokens that had a lower probability before now have a higher chance of being chosen than they did before. The result of this is that it feels like the model becomes more creative in the way it answers a question.
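
To make the scaling concrete, here is a minimal Python sketch (not from the video) that divides a handful of made-up logits by the temperature and then runs them through softmax; the logit values are invented purely for illustration.

    import math

    def softmax_with_temperature(logits, temperature=1.0):
        """Scale logits by the temperature, then convert them to probabilities."""
        scaled = [l / temperature for l in logits]
        exps = [math.exp(s) for s in scaled]
        total = sum(exps)
        return [e / total for e in exps]

    # Hypothetical logits for four candidate tokens
    logits = [6.0, 4.0, 2.0, -1.0]

    for t in (0.5, 1.0, 1.5):
        probs = softmax_with_temperature(logits, t)
        print(t, [round(p, 3) for p in probs])

    # A temperature below 1 concentrates probability on the top token;
    # a temperature above 1 flattens the distribution, giving the
    # less likely tokens a better chance of being picked.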

02:10

Next in our list is num_ctx. This sets the context size for the model. When you look around at different models, or when you see the announcement for a brand new model, you might get excited when it says a context size of 128k tokens, or 8k tokens, or a million tokens. But then you start to have a long conversation with, let's say, llama 3.1 and wonder why it's forgetting information that was actually pretty recent.

02:33

In Ollama, every model starts out with a 2K context size. That means 2,048 tokens are in its context and anything older may get forgotten. The reason Ollama does this is that supporting more tokens in that context requires more memory, and a 128k context is going to require a lot of memory. From what we've seen in the Ollama Discord, a lot of people are starting out with GPUs with only 8GB of memory, and some are even smaller than that. And so that means it just can't possibly support the 128k tokens or even 8k tokens. For that reason, all Ollama models start out with a default context size of 2k, or 2,048 tokens.

03:12

So if you are excited to use llama 3.1 for its 128k context size, grab llama3.1 and then create a new modelfile that looks like this, with a FROM line pointing to llama 3.1 and a single parameter of num_ctx with a value of 131,072. Then run ollama create mybiggerllama3.1, or whatever you want to call it, then -f and point to the modelfile. This will create a brand new model that has a max context size of 128k tokens. Now you can run ollama run mybiggerllama3.1 and you will be in your new model.
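
If you prefer to script that step, here is a minimal Python sketch of the same workflow: it writes out the two-line modelfile described above and shells out to the ollama CLI. The file name and model name are placeholders, and it assumes the ollama CLI is installed and on your PATH.

    import subprocess
    from pathlib import Path

    # The modelfile the video describes: a FROM line plus one PARAMETER line.
    modelfile = Path("Modelfile.bigctx")
    modelfile.write_text(
        "FROM llama3.1\n"
        "PARAMETER num_ctx 131072\n"
    )

    # Build the new model from that modelfile.
    subprocess.run(
        ["ollama", "create", "mybiggerllama3.1", "-f", str(modelfile)],
        check=True,
    )

    # Then start an interactive session with it:
    #   ollama run mybiggerllama3.1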

03:47

But let's say you are playing around with a new model and can't find the max supported size for the context. The easiest way to figure this out is to run ollama show llama3.1. Near the top we see the context length, which is the max supported length of the model. The fact that we don't see a parameter of num_ctx defined tells us that it is set to use the Ollama default of 2,048 tokens. That can be a bit confusing.

04:13

Now let's say you want to use orca2 with a context length of 10,000 tokens. If you tell it to summarize something that is much longer than that, you will probably not get anything useful, because the model doesn't know how to handle it.

04:25

What do you think of this video so far? Click the like button if you find it interesting, and be sure to subscribe to see more videos like this in the future. I post another video in my free Ollama course every Tuesday and a more in-depth video like this every Thursday. Subscribing means you won't miss any future videos.

04:42

Now we move on to the next thing in the list: stop words and phrases. Sometimes you will ask a model to generate something and you see it starts to repeat itself, often using one strange word or symbol at the beginning of each repeat. So you can tell the model to stop outputting text when it sees that symbol. All the rest of the parameters only accept a single value, but stop allows for multiple stop words to be used.

05:04

There are two other parameters that deal with repeats: repeat penalty and repeat last n. We talked before about how a list of potential tokens is generated along with the probabilities that they are the most likely next token. The penalty will adjust that probability if the token or word or phrase has been used recently. If the logit for the token is negative, then the logit will be multiplied by the penalty; if the logit is positive, then it will be divided by the penalty. The penalty is usually greater than 1, resulting in that token being used less. But it's also possible to set it below 1, meaning that token will be used more often.

05:40

At this point you are probably wondering how large the window is for finding repeats. Well, that is what repeat last n is for. This defines the window. The default is 64, meaning it looks at the last 64 tokens. But you can set it to be a larger or smaller window. If you set it to 0, it disables the window, and if you set it to -1, then the window is the full context of the model.
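
As a rough illustration of the arithmetic described above, here is a small Python sketch (not Ollama's actual code) that penalizes the logits of any token seen in the last repeat_last_n tokens; the token IDs and logit values are made up.

    def apply_repeat_penalty(logits, recent_tokens, penalty=1.1, repeat_last_n=64):
        """Penalize tokens seen in the recent window, as described in the video:
        negative logits are multiplied by the penalty, positive logits divided by it."""
        if repeat_last_n == 0:
            return dict(logits)            # 0 disables the repeat window
        window = recent_tokens if repeat_last_n == -1 else recent_tokens[-repeat_last_n:]
        adjusted = dict(logits)
        for token_id in set(window):
            if token_id in adjusted:
                logit = adjusted[token_id]
                adjusted[token_id] = logit * penalty if logit < 0 else logit / penalty
        return adjusted

    # Hypothetical logits for three candidate token IDs
    logits = {101: 6.2, 207: 3.5, 309: -1.0}
    recent = [207, 42, 309, 207]           # tokens generated recently

    print(apply_repeat_penalty(logits, recent, penalty=1.3))
    # 207 and 309 become less likely: 3.5 / 1.3 ≈ 2.69 and -1.0 * 1.3 = -1.3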

06:01

Top k determines the length of the list of tokens to be generated as the potential next token. This defaults to 40, but can be anything you like. This is pretty simple, saying that only the 40 most likely tokens will be in the list.

06:13

Top p is a little more complicated. When you add up all the probabilities in the list, you should end up with 1. But when using top p, it will create the list of all the tokens whose probabilities add up to top p. So a top p of 0.95 will exclude all the tokens that, when you add up their probabilities, sum to 0.05.

06:35

Min p is an alternative to using top p. This looks at the source logits. It takes the value of the largest logit in the list, then figures out the value of min p percent of that largest value. All next tokens must have a logit value greater than that minimum value. So if the largest logit is 8 and min p is 0.25, then all logits must be greater than 2 to be considered, since 25% of 8 is 2.
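
Here is a compact Python sketch, with invented numbers, of how these three filters narrow the candidate list. It follows the descriptions above (top p on cumulative probability, min p on logits relative to the largest logit) rather than any particular library's exact implementation.

    def top_k_filter(candidates, k=40):
        """Keep only the k most probable candidates; `candidates` is a list of (token, prob)."""
        return sorted(candidates, key=lambda c: c[1], reverse=True)[:k]

    def top_p_filter(candidates, p=0.95):
        """Keep the most probable tokens until their cumulative probability reaches p."""
        kept, total = [], 0.0
        for token, prob in sorted(candidates, key=lambda c: c[1], reverse=True):
            kept.append((token, prob))
            total += prob
            if total >= p:
                break
        return kept

    def min_p_filter(logits, min_p=0.25):
        """Keep tokens whose logit exceeds min_p times the largest logit."""
        threshold = max(logits.values()) * min_p
        return {tok: lg for tok, lg in logits.items() if lg > threshold}

    candidates = [("cat", 0.60), ("dog", 0.25), ("fish", 0.10), ("bird", 0.04), ("rock", 0.01)]
    print(top_k_filter(candidates, k=3))     # the three most likely tokens
    print(top_p_filter(candidates, p=0.90))  # cat + dog + fish reach 0.95, the rest are dropped

    logits = {"cat": 8.0, "dog": 4.0, "fish": 1.5}
    print(min_p_filter(logits, min_p=0.25))  # threshold is 2, so only cat and dog survive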

07:01

OK, tail free sampling, or tfs_z. If you create a chart of all the probabilities, you will see it slowly approaching zero. Tail free sampling cuts off that tail at some point. As the number approaches 0, more of the tail gets cut off. If using this, you want to start really close to 1 and gradually come down. A value of 1 means none of the tail is cut off; 0.99 to 0.95 is a good starting range. The docs for this one I think are wrong, and I have an issue open to fix it.
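
For the curious, here is a rough Python sketch of the tail free sampling idea, based on the commonly described algorithm: take the sorted probability curve, look at the absolute second differences, normalize them, and cut the tail where their cumulative sum exceeds z. The real llama.cpp implementation may differ in its details, so treat this only as an illustration.

    def tail_free_filter(probs, z=0.95):
        """Rough sketch: drop the flat tail of the sorted probability curve."""
        probs = sorted(probs, reverse=True)
        if len(probs) <= 2 or z >= 1.0:
            return probs                                   # nothing to cut
        first = [probs[i] - probs[i + 1] for i in range(len(probs) - 1)]
        second = [abs(first[i] - first[i + 1]) for i in range(len(first) - 1)]
        total = sum(second) or 1.0
        weights = [s / total for s in second]

        cutoff = len(probs)                                # default: keep everything
        cum = 0.0
        for i, w in enumerate(weights):
            cum += w
            if cum > z:
                cutoff = i + 1                             # cut once the curvature flattens out
                break
        return probs[:max(cutoff, 1)]

    print(tail_free_filter([0.40, 0.25, 0.15, 0.08, 0.05, 0.04, 0.02, 0.01], z=0.90))
    # Keeps the head of the distribution and drops the long, flat tail.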

07:32

Next one on our list is seed. One of the strengths of large language models is that they generate text much like the way that we humans generate text. They have a word, and then they think of the next word, and the next word, and the next word. And they spit out words that make the most sense as the next word. And this means that at the beginning of the sentence, they don't really know how the sentence is going to end. And because they're dealing with probabilities, there's a decent chance that when you ask the same question twice, the answer isn't always going to be the same. This really is one of the benefits of working with large language models. But sometimes it's not a benefit.

08:07

Sometimes you want the model to answer the same way every single time. Maybe in those cases, large language models aren't really the right tool to use. But if you want to use an LLM and you want the answer to be the same every single time, then you need to set the seed for the random number generator. Large language models use random number generators to help figure out that next token, and setting the seed makes the random number generator predictable. A sequence of numbers generated one time is going to be the same sequence of numbers every other time if that seed is consistent. One situation where it makes sense to use this is testing, where you want to ensure the model answers the same in your test cases.
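
A quick Python sketch of the underlying idea: a seeded random number generator produces the same sequence every run, so sampling from the same next-token probabilities with the same seed picks the same tokens. The token names and probabilities here are invented.

    import random

    tokens = ["cat", "dog", "fish"]
    probs = [0.6, 0.3, 0.1]            # hypothetical next-token probabilities

    def sample_sequence(seed, n=5):
        rng = random.Random(seed)      # seeded generator: a predictable sequence
        return [rng.choices(tokens, weights=probs, k=1)[0] for _ in range(n)]

    print(sample_sequence(seed=42))
    print(sample_sequence(seed=42))    # identical to the first run
    print(sample_sequence(seed=7))     # a different seed gives a different sequence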

08:46

Now let's cover the mirostat parameters. These are used by a different method of coming up with the list of next possible tokens. With mirostat, there are essentially three modes to choose from. If you don't set mirostat, then top p, top k, tfs, and the rest of the parameters are used. But you can set mirostat to 1 or 2 to use the mirostat tau and eta parameters instead, resulting in more or less proportional probabilities.

09:12

When working with mirostat, you often see the terms perplexity and surprise come up. They are related but different. Perplexity is a statistical measure derived from the probability of the next word appearing in a sequence. A lower perplexity indicates the model assigns higher probabilities to more correct words in a sequence. A higher perplexity indicates that the model assigns higher probabilities to less correct words.
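
To ground the term, here is a short Python sketch of the standard perplexity calculation, the exponential of the average negative log probability the model assigned to the tokens it produced; the probabilities are invented.

    import math

    def perplexity(token_probs):
        """Perplexity = exp of the average negative log-probability of each token."""
        avg_neg_log = -sum(math.log(p) for p in token_probs) / len(token_probs)
        return math.exp(avg_neg_log)

    confident = [0.90, 0.80, 0.95, 0.85]    # high probabilities assigned to the chosen words
    unsure = [0.20, 0.10, 0.30, 0.15]

    print(round(perplexity(confident), 2))  # low perplexity, close to 1
    print(round(perplexity(unsure), 2))     # much higher perplexity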

09:40

Surprise is a human emotion used as a metaphor to help understand this. A model with a higher perplexity is said to be more surprised by its own output, which I think is a really strange way of talking about it. It feels like the folks who came up with that don't interact with other humans much... which may or may not be true.

10:01

Mirostat tau controls the balance between coherence and diversity of the output. Tau sets the desired level of perplexity in the generated text. A higher value of tau will result in more diverse outputs. It usually has a range of 3 to 5 and defaults to 5.

10:18

Mirostat eta is the learning rate used. A higher rate means that the model will react and adapt faster than a slower rate. This means that if the model chooses a less probable token and eta is high, the model will continue to stick with the newer style of the text. A lower eta is more stable and less likely to overreact to temporary changes.
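
For readers who want to see what tau and eta actually do, here is a heavily simplified Python sketch modeled on the mirostat 2.0 loop as described in the mirostat paper and llama.cpp: tau is the target surprise, and eta controls how aggressively the running threshold mu is corrected after each sampled token. Treat it as an illustration, not Ollama's exact code, and note that the candidate probabilities here are static and invented.

    import math
    import random

    def mirostat_v2_step(probs, mu, tau=5.0, eta=0.1, rng=random.Random(0)):
        """One sampling step: drop tokens whose surprise (-log2 p) exceeds mu,
        sample from the rest, then nudge mu toward the target surprise tau."""
        candidates = {tok: p for tok, p in probs.items() if -math.log2(p) <= mu}
        if not candidates:                       # always keep at least the most likely token
            tok, p = max(probs.items(), key=lambda kv: kv[1])
            candidates = {tok: p}
        total = sum(candidates.values())
        tokens = list(candidates)
        weights = [candidates[t] / total for t in tokens]
        choice = rng.choices(tokens, weights=weights, k=1)[0]

        surprise = -math.log2(probs[choice])     # how surprising the chosen token was
        mu = mu - eta * (surprise - tau)         # learning-rate-style correction toward tau
        return choice, mu

    probs = {"cat": 0.50, "dog": 0.30, "fish": 0.15, "rock": 0.05}
    mu = 2 * 5.0                                 # mu conventionally starts at 2 * tau
    for _ in range(3):
        token, mu = mirostat_v2_step(probs, mu, tau=5.0, eta=0.1)
        print(token, round(mu, 3))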

10:39

The final parameter we will cover is num_predict. This is the maximum number of tokens to predict when generating text. If you set this to -1, it will keep generating until it's done, and -2 will fill the context. This doesn't mean it will always consume the entire amount, but rather that's the point where it will get cut off.

10:58

Earlier I showed how you can configure these parameters in the modelfile. Some of the parameters can also be configured in the default command-line interface, by entering /set parameter and then the parameter name and value. But not all parameters can be set that way. For instance, setting num_ctx that way won't work. Basically, the ones that make sense to change mid-conversation can be set at the command line.
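
If you drive Ollama from code rather than the interactive prompt, the same parameters can be passed per request. Below is a hedged sketch that posts to Ollama's local REST API using only the Python standard library; it assumes an Ollama server is running on the default port 11434 and that a llama3.1 model has been pulled. The option names mirror the modelfile parameters discussed above.

    import json
    import urllib.request

    payload = {
        "model": "llama3.1",
        "prompt": "Name three uses for a large context window.",
        "stream": False,
        "options": {
            "temperature": 0.7,     # creativity vs. determinism
            "num_ctx": 8192,        # context size in tokens
            "top_k": 40,
            "top_p": 0.95,
            "repeat_penalty": 1.1,
            "seed": 42,             # same seed + same input -> same output
            "num_predict": 256,     # cap on the number of generated tokens
        },
    }

    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        print(json.loads(resp.read())["response"])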

11:20

And that's pretty much all the parameters that folks tend to use. There are a few others that aren't documented yet, but they are so rarely used that it doesn't make sense to cover them.

11:32

What do you think? Do you use any of these parameters when you use large language models? Let me know in the comments below. Thanks so much for watching. Goodbye.

