Goldfish Bowl RAG Intro
Summary
TLDR: This video presents an overview of the RAG (Retrieval Augmented Generation) project, in which a model is trained to use additional context in the form of reference texts. The goal is to teach the model to effectively combine information from different sources to answer questions. The project involves writing unique prompts grounded in reference texts and rating two generated responses on a scale of 1 to 3 across five criteria. The key requirements for prompts are that they use the reference texts and avoid overly general topics. The video covers examples of good and bad prompts in detail, as well as the methodology for rating responses.
Takeaways
- 🐟 RAG (Retrieval Augmented Generation) is a method in which a model is trained to use additional context from reference texts to answer prompts.
- 🔍 In the Goldfish Bowl project, participants write prompts based on reference texts in order to teach the model to combine different pieces of information effectively.
- 📝 Prompts must be grounded in the reference texts and specific enough to avoid overly general or poorly formed requests.
- 🚫 Using chatbots or other LLMs (Large Language Models) to write prompts is forbidden and can lead to a participant being banned.
- ✅ Good prompts are grounded in the reference texts and include a main request plus an additional constraint.
- 📑 Each prompt must be accompanied by two to ten reference texts, each at least 150 words long.
- ✅ Model responses are rated on a scale of 1 to 3 in five categories: reference text grounding, truthfulness, helpfulness, instruction following, and writing style.
- 🔍 Reference text grounding and truthfulness are the most important criteria when choosing the preferred response.
- 📝 For multi-turn tasks, the preferred response is carried forward, and follow-up prompts should use the reference texts to continue the dialogue.
- 👍 It is important to carefully verify and rate every response in order to teach the model to use reference texts correctly and give accurate answers.
Q & A
What does the abbreviation RAG stand for?
-RAG stands for 'retrieval augmented generation': the model is given additional context in the form of reference texts and is trained to answer prompts based on that context.
What are the key requirements for writing a prompt for the RAG project?
-The prompt must be based on the reference texts, be at least 10 words long, contain no pleasantries, be sufficiently complex with a main request and additional constraints, avoid contrived wording, and not use word- or sentence-count constraints.
What is a reference text, and what is its role in the RAG project?
-A reference text is additional information given to the model to help it answer the prompt. The model uses this context-specific text to generate its response.
What criteria are used to rate the model's responses in the RAG project?
-Responses are rated on five criteria: reference text grounding, truthfulness, helpfulness, instruction following, and writing style.
How is 'reference text grounding' defined for rating purposes?
-A response is considered grounded in the reference texts if all of its claims are directly based on information from the reference texts and it contains no information from other sources.
What does the 'truthfulness' category mean when rating responses?
-The 'truthfulness' category assesses whether the claims in the response are correct, that is, whether they are based on reliable, verifiable facts.
What is the purpose of the 1-to-3 rating scale for each of the five criteria?
-The scale gives a detailed measure of response quality on each criterion, which allows a more informed and precise comparison between the two responses.
Why is the Likert score important when choosing the preferred response?
-The Likert score indicates the preference for one of the two responses and expresses how much better one response is than the other, on a scale from 'much better' to 'about the same'.
What does 'instruction following' mean in the rating, and how does it affect the result?
-Instruction following assesses how well the model understands and fulfills the requirements of the user's prompt, including both the main request and its constraints.
What is the sequence of steps for creating and rating a prompt in the RAG project?
-The sequence is: write a prompt based on the reference texts, obtain two model responses, rate each response on the five criteria, choose the preferred response using the Likert score, and justify the choice.
Outlines
🐟 Introduction to the RAG (Retrieval Augmented Generation) project
This video presents an overview of the RAG project, where RAG stands for 'retrieval augmented generation'. The project aims to train a model to use additional context in the form of reference texts when answering questions. The idea is that a model could, for example, produce reports and answer questions using only the data it is given. The video also covers common errors and important points to keep in mind when working with the model.
📝 Writing prompts for training the RAG model
The second part of the video focuses on writing prompts, which must be grounded in the reference texts. Prompts should be specific, contain no pleasantries, and be sufficiently complex, with a main request and additional constraints. The authors warn against using ChatGPT or other LLMs to write prompts, which can lead to a participant being banned from the project. Examples of good and bad prompts are given, along with explanations of how to improve them.
📚 Using reference texts and rating model responses
The third part of the video discusses how to use reference texts to answer prompts and how to rate the two generated model responses. Responses are rated on a scale of 1 to 3 across five criteria: reference text grounding, truthfulness, helpfulness, instruction following, and writing style. The authors explain in detail how to check every claim in a response against the reference texts and verify its accuracy.
🔍 Verifying the accuracy and completeness of model responses
The fourth part of the video covers how to verify the accuracy and completeness of model responses. The authors stress the importance of checking every claim both against the reference texts and for truthfulness using Google or other sources. Example responses and their ratings against the criteria are discussed.
📊 Rating and choosing between model responses
The fifth part of the video focuses on choosing the preferred response between the two generated responses. The authors introduce a 1-to-5 Likert scale for expressing preference and stress that the choice should be based on the key criteria, namely reference text grounding and truthfulness. It also explains how to justify the preference with examples and details from the responses.
🗣️ Multi-turn dialogues and using preferred responses
The final part of the video covers building multi-turn dialogues using the preferred responses. The authors give advice on writing natural, relevant follow-up prompts grounded in the reference texts, and stress the importance of using the reference texts correctly for successful RAG training.
Keywords
💡RAG
💡Reference Text
💡Prompt
💡Turn
💡Likert Score
💡Rubric
💡Truthfulness
💡Helpfulness
💡Instruction Following
💡Writing Style and Tone
Highlights
RAG (Retrieval Augmented Generation) is an AI technique that uses reference texts to provide context for generating responses.
The project involves training a model to answer prompts using specific reference texts, ensuring the responses are grounded in the provided information.
Reference texts are crucial as they guide the model to generate responses that are relevant and accurate to the given prompts.
Prompts must be based on the reference text and should not be general or contrived; they should emulate real user queries.
Avoiding sensitive topics and ensuring prompts are specific and complex is key to effective RAG training.
The project requires participants to write prompts that are at least 10 words, with a minimum and maximum number of turns specified.
Prompts should be free from pleasantries and word/sentence count constraints to mimic natural user interactions.
Examples of bad prompts include those that are too broad, contain spelling mistakes, or lack specificity.
Good prompts are specific, grounded in reference texts, and avoid summary requests or word count constraints.
The model generates two responses for each prompt, which are then reviewed and rated based on five criteria.
Criteria for rating responses include reference text grounding, truthfulness, helpfulness, instruction following, and writing style and tone.
Reference text grounding is critical; all claims in the response must be directly grounded in the reference text provided.
Truthfulness ensures that the claims made in the response are accurate and verifiable, avoiding unfounded statements.
Helpfulness evaluates how well the model answers the user's request, with a focus on providing relevant and sufficient information.
Instruction following assesses the model's ability to understand and adhere to the constraints and requirements of the user's prompt.
Writing style and tone are the least important criteria but still contribute to the overall quality of the response.
A five-point Likert scale is used to indicate preference between the two model responses, with justifications required for the selection.
Justifications must align with the Likert score, providing specific examples and details to explain the rating choices.
For multi-turn tasks, the preferred response from the previous turn is used as context for the next turn, maintaining a natural conversation flow.
The importance of utilizing reference texts in prompts cannot be overstated, as it is fundamental to the success of the RAG project.
Transcripts
My name is George and I am a member of the Goldfish Bowl RAG team. Thank you for joining the project and taking the time to watch this introductory video. In this video I'm going to give a bit of an overview as to what we are doing in this project and why, and specifically how we are going to do it, and then I will also go over some common errors we've been seeing, as well as some important things to keep in mind, with some tips.
First and foremost, as is in the name, this is a RAG project. RAG is an acronym for retrieval augmented generation. What that means is that we are going to be using the model in a very specific way in order to train it to do RAG: we are giving the model additional context, via reference texts, to help it answer our prompt. Imagine you have a company and your company has a large language model. It would be great if you could give that large language model all of your internal company data, like sales projections, supply chain data, and inventory calculations, and then use that large language model to answer questions, like creating a business report, and it would only use the information in the material you gave it. That is essentially the context of RAG: we are going to be prompting the model, giving it what we call reference texts, or additional information, and then the model will use that information specifically to answer our question. So we are teaching the model how to do that: what information to use, and how to combine information effectively across different reference texts. That is a brief overview of RAG. It is a super exciting space right now, which makes this project all the more exciting as well.
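The retrieve-then-generate loop described above can be sketched in a few lines. This is an illustrative toy, not the project's actual pipeline: retrieval here is plain word-overlap scoring, and the assembled prompt would be sent to whatever LLM you use.

```python
# Toy sketch of RAG: pick the most relevant reference texts, then build a
# prompt that instructs the model to answer from those texts only.

def retrieve(query: str, reference_texts: list[str], k: int = 2) -> list[str]:
    """Return the k reference texts sharing the most words with the query."""
    query_words = set(query.lower().split())
    scored = sorted(
        reference_texts,
        key=lambda text: len(query_words & set(text.lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_rag_prompt(query: str, reference_texts: list[str]) -> str:
    """Assemble a prompt that tells the model to use only the references."""
    context = "\n\n".join(
        f"[Reference {i + 1}]\n{text}" for i, text in enumerate(reference_texts)
    )
    return (
        f"{context}\n\n"
        "Answer the question using ONLY the reference texts above.\n"
        f"Question: {query}"
    )

refs = [
    "Sales projections for Q3 show 12% growth in the APAC region.",
    "Supply chain data indicates a two-week delay on component shipments.",
    "The cafeteria menu rotates weekly.",
]
prompt = build_rag_prompt(
    "What do the sales projections show?",
    retrieve("sales projections growth", refs),
)
```

A production retriever would use embeddings rather than word overlap, but the shape of the loop (retrieve, then ground the generation in what was retrieved) is the same idea the project trains the model on.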
In this project specifically, we are going to give the model a prompt with a set of reference texts. The model will generate two responses. We will review each response and rate it according to five different criteria. Then we will decide which of these responses we prefer, and we will indicate that preference with what is called a Likert score. We are going to justify our preference and explain our rating, and then we will potentially continue the conversation in another turn.
To go into more specifics, step one of this project is to write a prompt that utilizes the reference texts. Some key requirements: the prompt needs to be at least 10 words. We give a list of three suggested topics for inspiration, although they are not mandatory. The tasks will also be given a specified minimum and maximum number of turns. A turn is a prompt-response pair. So if I ask a model 'tell me about Sweden' and it tells me about Sweden, and then I ask 'give me an itinerary for Sweden' and it responds again, that would be two turns in that conversation: two prompt-response pairs.
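A turn, as defined here, is one prompt-response pair, and a conversation is just an ordered list of them. A minimal sketch (the `Turn` class is my own illustration, not part of the project tooling):

```python
from dataclasses import dataclass

@dataclass
class Turn:
    """One prompt-response pair."""
    prompt: str
    response: str

# The two-turn Sweden conversation from the example above:
conversation = [
    Turn("Tell me about Sweden", "Sweden is a Nordic country..."),
    Turn("Give me an itinerary for Sweden", "Day 1: Stockholm..."),
]
turn_count = len(conversation)  # two turns: two prompt-response pairs
```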
What's really important in this project is that the prompts utilize the reference texts. I will reiterate: the prompts need to be based on the reference texts.
What that means is that prompts cannot be general. For example, 'summarize these reference texts' is a terrible prompt. The prompts must be based on the reference texts and use that information in a very specific way. I will go over that in more detail, but this is paramount to the project: the prompts need to be based on the reference texts. In addition, we have other guidelines for the prompts. They should really emulate how a user would use the model and not be contrived. They should not have any pleasantries: no 'hey', 'yo', 'thank you', 'I appreciate it', nothing like that. And they should be sufficiently complex, with a main request and an additional constraint. This example here highlights a few important things. The prompt 'tell me about George Washington in exactly 10 sentences' is terrible, and it is bad for a number of reasons. It is contrived: a real user would not care very much about a specific sentence count, and on that note, we want to avoid any constraints that involve word counts or sentence counts. It is also a summary, and a very basic request. A good example would be the following: 'Explain the new regulatory changes for cryptocurrency trading that were implemented in the EU in the past three months and how they might affect my investments in Bitcoin and Ethereum.' This prompt is incredibly specific. Imagine reference texts that explain the regulatory changes, maybe an analysis of them as well as just the facts, plus the current state and maybe future predictions of Bitcoin and how these relate to it. This is a great way to teach the model how to do RAG. When we are doing RAG, we want to make sure to avoid sensitive topics: nothing inappropriate, hot-button, or contentious. Finally, you cannot use ChatGPT or other LLMs to create prompts; if we catch you doing it, you will be banned from the project.
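The prompt rules above (at least 10 words, no pleasantries, no word- or sentence-count constraints) can be roughly machine-checked. This is a heuristic sketch; the keyword lists and patterns are my own illustrative assumptions, and real review is still done by a human:

```python
import re

# Illustrative lists only; a real checklist would be broader.
PLEASANTRIES = ("hey", "yo", "hi", "hello", "thank you", "thanks", "i appreciate")
COUNT_CONSTRAINT = re.compile(r"\b(in exactly|\d+\s+(words|sentences))\b", re.IGNORECASE)

def lint_prompt(prompt: str) -> list[str]:
    """Return a list of rule violations; an empty list means no issues found."""
    issues = []
    if len(prompt.split()) < 10:
        issues.append("prompt is shorter than 10 words")
    lowered = prompt.lower()
    if any(lowered.startswith(p) or f" {p}" in lowered for p in PLEASANTRIES):
        issues.append("prompt contains a pleasantry")
    if COUNT_CONSTRAINT.search(prompt):
        issues.append("prompt uses a word/sentence count constraint")
    return issues
```

Running it on the George Washington example flags both the length and the sentence-count constraint; a lint like this cannot judge the crucial requirement (grounding in the reference texts), which stays with the reviewer.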
Now I will go over a few more examples. Here are some bad examples. First: 'hey I'm doing a school project on DH Lawrence who was he'. First and foremost, it starts with a pleasantry. Second of all, it is a simple summarization request. This next example is also incredibly contrived and unnatural, with all of its spelling mistakes, its overly informal tone, and the lack of specificity of the request. Now we will go over two examples of making prompts better. Here you can see a prompt: 'yo can you give me a list of athletes who left has a great impact of sports'. We can remove the pleasantry, become more specific, and become more grounded in the reference texts. A similar but significantly better example would be: 'Who are some of the most impactful athletes in Olympic history? What are their greatest achievements and most memorable moments?' Another similar prompt: 'Compare the impacts, accomplishments, and legacies of top Olympic athletes Usain Bolt, Michael Phelps, and Simone Biles.' You could imagine the reference texts here: perhaps one on Bolt, one on Michael Phelps, one on Simone Biles, one on an analysis of top moments in the Olympics, one specifically on the 2012 Olympics, and so on and so forth. These prompts are specific to the reference texts, and you can imagine a real user writing them. Moving on, we have another bad example.
'You are a 9-year-old who's a picky eater that's going to Japan for 2.5 weeks. What's your dream itinerary look like?' This is incredibly broad and incredibly contrived. We can make it better by emulating a real user, becoming more specific, and becoming more grounded in the reference texts. A better example would be: 'I'm going to Japan with my nine-year-old daughter for two and a half weeks. She's a picky eater though. What are some good types of foods for her that are still authentic?' You have a number of constraints: your nine-year-old daughter who's a picky eater, your time frame, and good types of foods that are authentic but work for picky eaters, which would be very specific to the reference texts. An even better example: 'I'm going to a specific region in Osaka with my nine-year-old daughter, a picky eater, for two and a half weeks. What are some good restaurants where I can get an authentic local meal, but where she can have some options?' Now we are going from types of foods to specific restaurants in a very specific area, with the constraints of the user getting an authentic local meal but the daughter, a picky eater, having options.
So what do good examples look like? Good examples have responses that use the reference texts to fully answer the questions of the prompt. Example one: 'I'm thinking about taking my 13-year-old daughter to see the movie Inside Out 2 because I think it is about dealing with emotions during puberty. I want to have a conversation with her beforehand. What should I say?' You can imagine reference texts about the plot, about reviews, perhaps even advice on how to speak to your 13-year-old daughter. As a result, the model will use all of this in order to answer: it will get the context of the movie, how it relates to dealing with emotions during puberty, and perhaps information about how to approach this conversation, based specifically on the reference texts. Another example: 'The Yankees have not been great for the past few seasons despite having Aaron Judge and Gerrit Cole. Why haven't they been able to live up to their potential?' You would imagine reference texts here about the Yankees' recent performance, perhaps stats about Aaron Judge and Gerrit Cole, and analysis of their past seasons. You can really have a lot of different analysis and different opinions, combined with facts like statistics from those past seasons. Again, the model will incorporate all of this, decide what is relevant, extract it, combine it, and use it to fully answer and satisfy the prompt. A final example: 'Based on the recent stock market volatility and the expectation for the Fed to cut rates at their next meeting, should I be looking to invest in more equities or treasuries?' You could imagine here that the reference texts would be about recent market movements, analysis of the stock market, and perhaps news or even predictions about the Federal Reserve.
To reiterate, the most important fact and requirement about writing a prompt in this RAG project is that it utilizes the reference texts in a very specific way. Here is what it will look like: you will have the suggested topics, you will have the box for your prompt, and then comes the part about adding reference texts. I'll note that you can actually find the reference texts before writing your prompt; sometimes it is helpful to come up with a category or a topic, find reference texts, and then build your prompt around them. But again, the model will use the reference texts to answer the prompt, and as a rule of thumb, if the prompt could be asked of any set of reference texts, it is a bad prompt, like a summary. I will repeat the rule of thumb: if a prompt could be asked of any set of reference texts, then it is a bad prompt. In order to add your reference texts, you will click on the purple plus sign in the box to add between two and ten reference texts and their URLs. Each reference text needs to be at least 150 words, and the total length should be between 500 and 2,500 words. You may also split up and reorder your reference texts. What that means is that a single URL can serve as multiple reference texts: perhaps you upload the first and second paragraphs as the content of one reference text, with the URL, and then upload the last paragraph with the same URL as the next reference text. These can be formatted as either markdown or raw text.
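The numeric requirements just stated (two to ten texts, at least 150 words each, 500 to 2,500 words total) are easy to sanity-check with a small helper; a sketch, with word counting approximated by whitespace splitting:

```python
def validate_reference_texts(texts: list[str]) -> list[str]:
    """Check a reference text set against the stated limits.

    Returns a list of problems; an empty list means the set passes.
    Word counts are approximated by whitespace splitting.
    """
    problems = []
    if not 2 <= len(texts) <= 10:
        problems.append(f"need 2-10 reference texts, got {len(texts)}")
    word_counts = [len(t.split()) for t in texts]
    for i, count in enumerate(word_counts):
        if count < 150:
            problems.append(f"reference text {i + 1} has {count} words (minimum 150)")
    total = sum(word_counts)
    if not 500 <= total <= 2500:
        problems.append(f"total length is {total} words (must be 500-2500)")
    return problems
```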
Now that we have the prompt that utilizes the reference texts, and the reference texts themselves, the model will give us two responses. We will rate those responses on a scale of one to three, where one is major issues, two is minor issues, and three is no issues, on the following five criteria: reference text grounding, truthfulness, helpfulness, instruction following, and finally writing style and tone. We will go over the rubric now with some examples.
examples so what we mean by reex
grounding is the question are each of
the claims and the response grounded in
the ref text reorienting oursel to the
goal of this project is teaching the
model rag to use the reference text to
answer our question that means the model
does a good job if it is using that
information and only that information
that means the model is doing a bad job
if it is using information that is
completely unrelated and outside of the
reference text so major issues would be
if core response statements are not
directly round grounded in the reference
text minor issues would be some
statements are not directly grounded or
inferred from the reference text or
supplementary to the
response no issues would be that all the
response statements are grounded in the
reference text so let's look at an
The prompt here is: 'I'm looking for the best noise-cancelling headphones with the best sound quality. What do you recommend?' The reference text is a segment from an article about noise-cancelling headphones. Imagine this were the model response: 'The Sony WH-1000XM4 seems to be the best all-around choice, balancing performance and value. However, if budget isn't a concern and you want the absolute best in noise cancellation and sound quality, the Bose QuietComfort Ultra headphones might be an ideal choice.' So how do I evaluate this? I need to go claim by claim through this response and make sure each claim comes from the reference text. The claim that the Sony headphones are the best all-around choice is in the reference text. The statement that the Sony headphones balance performance and value also comes from the reference text ('build quality, design, and value' is a quote). As for 'however, if budget isn't a concern and you want the absolute best, the Bose QuietComfort Ultra might be an ideal choice for you', we can see in the reference text that the Bose QuietComfort Ultra headphones are the best premium noise-cancelling headphones if you don't mind spending a bit extra to get the best. So this would be scored a three on reference text grounding: there are no issues; every statement, every claim in this response is directly grounded in the reference text. Now we can look at response B: 'The Sony WH-1000XM4 improves on last year's model, the WH-1000XM3.' This statement is not directly grounded in the reference text. We can see in the reference text that these are the newest release since the Sony WH-1000XM3, but nowhere does the reference text say that it is last year's model; that is inferred. So this would be rated a two, minor issues, on reference text grounding, as the claim that it came out last year is inferred from the reference text.
If you aren't confident that an inference like this can be made, err on the side of caution and be strict. It is important on this project that if you are torn between two scores, you choose the lower one: be strict. Now imagine the response said that the Sony headphones are the best all-around choice, balancing performance and value, with a speak-to-chat technology that automatically reduces the volume during conversations. That feature is nowhere to be found in the reference text; it is completely out of scope. Hence, this would score a one, major issues, on reference text grounding. This is a major failure because a core part of the reasoning in the response fails to use the information from the reference text; it goes outside of it. Another example: the response says 'however, if budget isn't a concern and you want the best, go with the Sennheiser Momentum 4 Wireless'. Again, this is a major issue, as the Sennheiser Momentum 4 Wireless appears nowhere in the reference text. It is out of scope; there is a major grounding issue.
Now that we understand reference text grounding, we can go to the next category, which is similar: truthfulness. Here we are asking: is each of the claims in the response correct or incorrect? Major issues would mean that one or more major claims contain meaningful inaccuracies or unfounded claims, making the response unhelpful to the user. A minor issue would mean the major claims are factual and accurate, but perhaps minor claims contain meaningful inaccuracies or unfounded claims. No issues would mean all claims are accurate based on reputable web evidence. We'll look at an example. The prompt is: 'What are the impacts of advancements in renewable energy on the economy?', with the reference text shown here. Response A goes into three categories: job creation, infrastructure investment, and a reduction in energy costs. We can see that the solar technology appears in the reference text, the wind power also appears in the reference text, and the battery storage also appears in the reference text. The increased job opportunities in engineering, manufacturing, and installation roles all appear in the reference text. The section on infrastructure investment, that the growth of the renewable energy sector has stimulated investments in infrastructure, is also directly grounded in the reference text; let us find exactly where: here, 'stimulating investments in infrastructure'. Finally, the reduction in energy costs, a decrease in energy costs over time, comes directly from the reference text. So there are no issues here: these are all true statements.
Now imagine response B, which says that advancements in renewable energy are expected to generate 1,342,049 jobs by 2030. This is a minor truthfulness error because of an unverifiable claim: the number of jobs is unfounded, and while that makes the response a little less accurate overall, it is still somewhat helpful to the user. So this is a minor unfounded claim, and the way you would verify it is to Google it: search something like 'job projections from renewable energy in 2030'. But I'll give you a hint: it is not this incredibly specific number. While we are talking about truthfulness, this claim is also not grounded, so I would point out that we did look at this response for groundedness, but here we are in the category of truthfulness. It is important in this section to check every single claim, both with the co-pilot and with Google; I'll get to the co-pilot momentarily. Imagine response C said that high-energy photovoltaic cells are built primarily from copper. Let's Google that: what are high-energy photovoltaic cells made from? Silicon, by far. Are high-efficiency photovoltaic cells made from copper? Let's see. This is actually a very interesting case, because the AI overview here is incorrect: we can go to the Department of Energy, for instance, and see that they are made of silicon. Finally, in response D we have the statement that the growth of the renewable energy sector has caused massive failures in the financial sector in recent years. This is categorically false. A major claim, one of three major claims in this response, is false, so this would score a one on truthfulness.
The next criterion is helpfulness, which is: how well does the model answer the user's request? If it leaves out relevant content, or there is an excessive amount of irrelevant content, that would be a major issue. If it is missing a little bit of relevant content, or has a little bit of irrelevant content, that would be a minor issue. And if it fully satisfies the request, then there are no issues. This is closely related to instruction following, which is: how well does the model understand the requirements of the user's prompt? A major issue here would be the response ignoring or violating key parts of the prompt, like the constraints, making the response useless. A minor issue would be the response following most of the instructions but missing certain elements. And finally, no issues: all instructions are followed. Now we'll look at an example. Say the prompt is 'give me a five-step recipe to bake a chocolate cake'. If the response is an eight-step recipe to bake a chocolate cake, that is still very helpful: there is no relevant information missing, and there is no excessive amount of irrelevant information in there; it is a great eight-step recipe to bake a cake. On instruction following, however, it would be scored a two, for minor issues, because it is eight steps instead of five. Meanwhile, if response B is a five-step recipe to bake croissants, that is incredibly unhelpful, so it's a one on helpfulness, and it does not follow the instructions at all, so it is a one there as well. Our final category is writing style and tone, which is simply how well the response is worded and formatted.
To reiterate: it is extremely, extremely, extremely important to verify and double-check every claim made in the model response. That is what is used to measure reference text grounding and truthfulness. You go claim by claim: whether the claim is in the reference text determines how we rate it on reference text grounding, and we also use Google or the factuality co-pilot, which you can see here, to verify that it is accurate.
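The claim-by-claim bookkeeping described here can be summarized mechanically once the human checks are done. In this sketch, the reviewer fills in a pass/fail flag for each claim by hand, and the helper only maps those results onto the 1-3 rubric scale; treating "major issues" as a failed core claim is my reading of the rubric above:

```python
def criterion_score(claim_ok: list[bool], claim_is_core: list[bool]) -> int:
    """Map per-claim check results to the 1-3 rubric score.

    1 (major issues) if any core claim fails the check,
    2 (minor issues) if only non-core claims fail,
    3 (no issues) if every claim passes.
    """
    failed_core = any(core and not ok for ok, core in zip(claim_ok, claim_is_core))
    if failed_core:
        return 1
    if not all(claim_ok):
        return 2
    return 3

# Headphones example: three claims, the last one (an inference, not a core
# claim) not directly grounded -> grounding score of 2 (minor issues).
grounded = [True, True, False]
is_core = [True, True, False]
grounding_score = criterion_score(grounded, is_core)  # -> 2
```

The same mapping works for truthfulness: run it once with "is this claim in the reference text?" flags and once with "is this claim factually correct?" flags.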
Once we've rated the five criteria for both responses, we will select which one we prefer, and we will do that using a one-to-five Likert scale. On the ends of the spectrum we have one response being much better, then slightly better, and then neutral in the middle, so you can think of it like a scale: the further away from the middle we go, the bigger the difference. A one signifies that response one is much better; a two signifies that response one is slightly better; a three signifies a neutral preference, meaning they are the same; a four signifies that response two is slightly better; and a five signifies that response two is much better. It is extremely important to note the following: you should only select a three if the ratings are identical for both responses; otherwise, there is a preference. When evaluating your preference, please keep in mind that the most important criteria are reference text grounding and truthfulness, by far, given that this is a RAG project, followed by helpfulness, then instruction following; writing style and tone is the least important. After we give our Likert score, we will give a justification. The justification is incredibly important for reviewers, the project team, and the customer to better understand the preference: your thinking, your logic, and why you chose the score you did. It should use examples and details to highlight key differences between the responses, focusing on the most major issues, and it must, must, must align with your Likert score. A good practice: if your Likert score is one, start your justification with 'Response one is much better because...'; if your Likert score is four, start it with 'Response two is slightly better because...'. The key is to give specific details, examples, even quotes, to explain your thought process.
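The one-to-five Likert scale, and the rule that a three requires identical ratings, can be captured in a small helper; a sketch with illustrative names:

```python
# Likert scale from the video: 1-2 favor response one, 3 is neutral,
# 4-5 favor response two. A 3 is only allowed when the five-criteria
# ratings of the two responses are identical.
LIKERT_LABELS = {
    1: "Response 1 is much better",
    2: "Response 1 is slightly better",
    3: "Neutral / the same",
    4: "Response 2 is slightly better",
    5: "Response 2 is much better",
}

def check_likert(score: int, ratings_a: dict[str, int], ratings_b: dict[str, int]) -> str:
    """Validate a Likert choice and return its human-readable label."""
    if score not in LIKERT_LABELS:
        raise ValueError("Likert score must be between 1 and 5")
    if score == 3 and ratings_a != ratings_b:
        raise ValueError("only select a 3 if the ratings are identical for both responses")
    return LIKERT_LABELS[score]
```

The returned label doubles as the suggested opening of the justification ('Response 2 is slightly better because...'), which keeps the justification aligned with the score.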
Finally, for multi-turn tasks, we will use the preferred response. For instance, if the Likert score was a four, response two is preferred, so I'll use that response as the context for the next turn. We will continue the conversation in a way that flows naturally, as if you were speaking to another human, or to a model in real life, making conversation. It is incredibly important to reiterate again: all prompts must use the reference texts. If the next turn asks about how the previous response relates to topics outside of the reference texts, that is bad; if instead it asks about different parts of the reference texts, or for more information, that is good. A bad example here is something that is contrived and not conversational, goes off track, and is not based on the reference texts. A good example, assuming a reference text set about festivals in Italy, would be a first prompt of 'I'm planning a trip to Italy and I'd love to know about some popular food festivals there.' The turn-two prompt could be 'Give me some highlights about the Sagra del Tartufo in Alba,' and the third prompt could be 'I'm debating between the Chianti Classico wine festival and the sagra, but I can't do both. Help me decide.' Assuming that all of these festivals are in the reference texts, all of these prompts would utilize those reference texts in a very specific way, and they would be exceptional. You repeat these steps until the desired number of turns is reached.
So please, please, please read this document. It is full of good information, with examples of justifications, as you can see here, and there is also a cheat sheet at the top. I will broadly say again that we are teaching this model how to do RAG, to use our reference texts to answer our prompts. That means the prompts need to be good: they need to utilize the reference texts, or else they are useless. And it is incredibly important to go slowly and be diligent with your ratings. We can only train a model to do RAG if we can correctly evaluate whether a response is grounded in the reference texts and whether it is truthful. So go claim by claim, double-checking and verifying: on one hand, that the claim is in the reference text, for the grounding category; on the other hand, that it is true, for the truthfulness category. We look forward to working with you on this project. Thank you very much for taking the time to watch this video, and please reach out via Discourse if there are any questions.