RAG Rewrite Instructions (Outlier)
Summary
TLDR: This video gives an overview of the updated instructions and guidance for incorporating rewrites into the Goldfish RAG project. The core idea is to improve model training by adding rewrites that fix imperfections in the preferred response. Rewrites show the model what an ideal response looks like with respect to criteria such as truthfulness, grounding, helpfulness, and the absence of pleasantries. The video explains when and how to perform a rewrite, using a flowchart to identify the cases that require one. It also walks through the editor used to make changes and the process of justifying those changes.
Takeaways
- 😀 The video gives an overview of the updated instructions and guidance for incorporating rewrites into the Goldfish RAG project.
- 🔄 The existing workflow stays in place: the model is trained on a prompt, two model responses, and their grades on five criteria; rewrites are an additional step.
- ✅ Each response is still graded on a 1 to 3 scale on each of the five criteria, and the Likert preference score runs from 1 to 5.
- 🔍 If the preferred response is not perfect, it is rewritten to fix its shortcomings.
- 📝 Reasons for a rewrite include issues with truthfulness, grounding, or helpfulness, and the presence of pleasantries.
- 📊 A flowchart is used to decide whether a rewrite is required.
- ✍️ A rewrite involves deleting or correcting errors, adding helpful information, and removing superfluous content.
- 📋 The edits, their justification, and the ratings must all stay consistent with one another.
- 🔗 The rewritten response becomes the context for the next turn with the model.
- 💬 The presenter invites questions and feedback in the Discourse channel to help improve the process.
Q & A
What does the term 'rewrites' mean in the context of the Goldfish RAG project?
-In the Goldfish RAG project, rewrites are the process of editing and improving the preferred model response so that it is ideal with respect to the project's criteria.
What are the main components used to train the model in the Goldfish RAG project?
-Training uses prompts, two model responses, grades on five criteria, a Likert score from 1 to 5, and a justification for the choice.
What does the Likert score do in the training process?
-The Likert score records which of the two model responses is better, and by how much, on a scale from 1 to 5.
Why does the selected response sometimes need a rewrite?
-The selected response may need a rewrite if it has issues with truthfulness, grounding, or helpfulness, or if it includes unnecessary pleasantries.
How is it determined whether a response needs a rewrite?
-A flowchart decides it: a rewrite is required if the preferred response has issues with truthfulness, grounding, or helpfulness, or if it contains pleasantries.
What are pleasantries, and why should they be removed?
-Pleasantries are filler phrases, such as "Certainly, I'd be happy to help", that add nothing to the content of the response; removing them improves it.
What changes do rewrites bring to the model training process?
-Rewrites involve fixing errors, adding helpful information, removing unnecessary pleasantries, and making sure the response fully satisfies the prompt.
What does fully satisfying the prompt mean?
-It means the model response takes into account all of the prompt's constraints and requirements.
What actions are taken to fix errors in a response?
-Errors can be fixed by deleting false or ungrounded statements, correcting statements so that they are grounded, adding helpful information from the reference text, and improving the writing style.
How does the rewrite justification differ from the justification for the ratings and the Likert score?
-The rewrite justification must match exactly what was done in the editor, while the justification for the ratings and the Likert score must match the chosen response and its scores.
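The pleasantries and citation markers mentioned in the answers above can be flagged with a small script. This is a rough sketch under stated assumptions, not the project's tooling: the phrase list contains only the three examples given in the video (the comprehensive list is not reproduced here), and the names `PLEASANTRIES`, `CITATION_PATTERN`, and `find_flags` are hypothetical.

```python
import re
from typing import List

# Only the examples mentioned in the video; the full project list is not known here.
PLEASANTRIES = ["certainly", "i'd be happy to help", "that's interesting"]

# Whole-phrase, case-insensitive match so "uncertainly" is not flagged.
PLEASANTRY_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(p) for p in PLEASANTRIES) + r")\b",
    re.IGNORECASE,
)

# Assumed shape of citation markers such as "reference 2" or "chunk 5".
CITATION_PATTERN = re.compile(r"\b(?:reference|chunk)\s+\d+\b", re.IGNORECASE)


def find_flags(response: str) -> List[str]:
    """Return pleasantries, then citation markers, found in the response text."""
    flags = [m.group(0) for m in PLEASANTRY_PATTERN.finditer(response)]
    flags += [m.group(0) for m in CITATION_PATTERN.finditer(response)]
    return flags


if __name__ == "__main__":
    sample = "Certainly! I'd be happy to help. Celsius contains caffeine (chunk 5)."
    print(find_flags(sample))
```

The function only reports candidates; deciding whether to delete or reword them remains a judgment call for the person doing the rewrite.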
Outlines
📝 Updated instructions for incorporating rewrites into the Goldfish RAG project
The video gives an overview of the updated instructions and guidance for incorporating rewrites into the Goldfish RAG project, where rewrites will be used to improve model training. The existing process stays: the annotator is given two model responses, grades them on five criteria, and picks the preferred response with a Likert score from 1 to 5. If the preferred response is not perfect, it is rewritten to make it ideal. The issues that call for a rewrite are problems with truthfulness, grounding, or helpfulness, and the presence of pleasantries, that is, polite filler phrases that carry no information. Whether a rewrite is required is decided by following a workflow flowchart.
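The Likert step can be sketched in code. This is a minimal illustration, not the project's implementation: the function name `preferred_response` is hypothetical, and the mapping follows the transcript, where 3 means no preference, 1 and 2 favor response one, and 4 and 5 favor response two.

```python
from typing import Optional


def preferred_response(likert: int) -> Optional[int]:
    """Map a 1-5 Likert score to the preferred response (1 or 2).

    Returns None when the score is 3, which the video describes as
    "no preference".
    """
    if likert not in (1, 2, 3, 4, 5):
        raise ValueError(f"Likert score must be between 1 and 5, got {likert!r}")
    if likert == 3:
        return None
    return 1 if likert < 3 else 2


if __name__ == "__main__":
    for score in range(1, 6):
        print(score, preferred_response(score))
```

In the sample task from the video the presenter selects 2 ("response one is slightly better"), which this sketch maps to response one.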
🔍 When rewrites are required, and the workflow
The second part of the video covers when rewrites are required and walks through the workflow. If the preferred response has issues with grounding, truthfulness, or helpfulness, or includes pleasantries, it must be rewritten; otherwise no rewrite is required. The process uses an editor to make changes, a difference view to compare them against the original, a preview of the formatted text, and a justification for the changes made. This improves the model by showing it what an ideal response looks like, one that respects all of the constraints set by the prompt.
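The rewrite decision reduces to a simple check on the preferred response. The sketch below is illustrative only: `PreferredResponseCheck` and `needs_rewrite` are hypothetical names, and the four flags mirror the questions the video says the task form asks. The video does not say what happens when no preference is selected, so that case is left out.

```python
from dataclasses import dataclass


@dataclass
class PreferredResponseCheck:
    """Answers to the flowchart questions about the preferred response."""
    truthfulness_issue: bool = False
    grounding_issue: bool = False
    helpfulness_issue: bool = False
    has_pleasantries: bool = False


def needs_rewrite(check: PreferredResponseCheck) -> bool:
    """A rewrite is required if any one of the four conditions holds."""
    return any(
        (
            check.truthfulness_issue,
            check.grounding_issue,
            check.helpfulness_issue,
            check.has_pleasantries,
        )
    )


if __name__ == "__main__":
    # The sample task in the video: no rating issues, but pleasantries present.
    print(needs_rewrite(PreferredResponseCheck(has_pleasantries=True)))
    print(needs_rewrite(PreferredResponseCheck()))
```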
🖋️ Editing the response and justifying the changes
The third part of the video focuses on editing the model response and justifying the changes. It demonstrates using the editor to delete or add information and to improve the style and helpfulness of the response. The changes must match the ratings and the justification, and they must be grounded in the reference texts. Once the edits and the justification are saved, the rewritten response becomes the context for the next turn. This helps the model learn what an ideal response looks like and how its earlier responses could be improved.
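A difference view like the one in the editing workflow can be approximated with Python's standard `difflib` module. This is only a sketch: the project's editor is a separate tool whose internals are not described in the video, the helper name `diff_view` is hypothetical, and the sample texts are made up for illustration.

```python
import difflib
from typing import List


def diff_view(original: str, rewritten: str) -> List[str]:
    """Return unified-diff lines: '-' marks removed text, '+' marks added text."""
    return list(
        difflib.unified_diff(
            original.splitlines(),
            rewritten.splitlines(),
            fromfile="original",
            tofile="rewrite",
            lineterm="",
        )
    )


if __name__ == "__main__":
    # Made-up texts loosely following the edits shown in the video.
    before = (
        "Certainly! I'd be happy to help.\n"
        "Celsius contains caffeine.\n"
        "In summary, it is fine."
    )
    after = "Celsius contains caffeine.\nOverall, it is a good choice."
    print("\n".join(diff_view(before, after)))
```

Reading the removed and added lines side by side is a quick way to confirm that the written justification matches what was actually changed.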
📢 Conclusion and a message to project participants
The conclusion stresses the importance of rewrites for improving model training. The project team invites participants to raise questions, comments, and suggestions for improving the process in the Discourse channel, and notes that additional training materials will be provided.
Keywords
💡rewrites
💡grading criteria
💡Likert
💡adequacy
💡truthfulness
💡helpfulness
💡pleasantries
💡ref text
💡formatting
💡justification
Highlights
Introduction to the updated instructions for incorporating rewrites into the Goldfish RAG project.
Explanation of the current model training process involving prompts, model responses, and grading on five criteria.
Introduction of the Likert score for rating model responses on a scale of one to five.
The new approach of rewriting responses to achieve perfection in model training.
Criteria for when rewrites are necessary, including issues with truthfulness, grounding, helpfulness, and pleasantries.
Diagram illustrating the workflow for deciding when to rewrite responses.
The process of rewriting to fix issues while maintaining the response's helpfulness and relevance.
Examples of pleasantries to be removed during rewrites.
The importance of removing references or chunks that are not directly contributing to the content.
Guidelines for fixing errors across categories and improving writing style.
The rule of thumb for deleting versus fixing content in rewrites.
Adding truthful and helpful statements from the reference text to improve responses.
The role of judgment in determining the most helpful answer for the user.
Justification for making or not making changes in the rewrite process.
Example of a sample task demonstrating the workflow for rewrites.
Use of the editor tool to make changes and the importance of the difference view for reviewing edits.
The preview feature for ensuring correct formatting of rewritten responses.
Finalizing edits and continuing with the improved response as the context for the next turn.
The significance of the rewrites in enhancing the training of large language models.
Invitation for feedback and questions regarding the new workflow and rewrite process.
Transcripts
hello everyone in this video I will be
giving an overview as to the updated
instructions and guidance regarding the
incorporation of rewrites into the
Goldfish rag project as I mentioned in a
previous webinar if you attended we are
going to start incorporating rewrites
what that means is essentially as we
currently have it we are training the
model on a few things basically we give
it a prompt we get two model
responses we grade it on five criteria
each of
them then we pick the Likert like our one
to five score on which one is better and
then our
justification so we are still going to
be doing all of this you know grading it
on our one to five score or one to five
uh one to three score on the five
criteria but let's say we have
situation basically like an example of
our Likert is four as of right now we're
training the model like here are the
good things and bad things about
response one here are the good things
and bad things about response two
however response two is still not
perfect we're giving the model a good
sense of learning what good looks like
via these ratings and via the Likert
saying this is better than this one and
basically why but it still isn't
learning what the perfect response is so
now in certain cases not always we're
going to be rewriting a new
response instead of just taking this one
where we'll make the changes to fix
certain things and make this
perfect so that is what our ultimate
workflow is going to look like we will
still have the prompt with the reference
text we're going to be given these two
different model responses we rate it
according to the five different
dimensions or
criteria we assign a Likert score which is
our preference on a 1 to five scale and
then that gives us the preferred model
response if the preferred model response
is good or perfect then we keep it but
now sometimes we are going to need to
rewrite it in order to make it
perfect the reasons why we consider it
not perfect in this project is that it
has issues with truthfulness ref text
grounding or
helpfulness or it has issues with
pleasantries pleasantries are things
like certainly I'd be happy to help oh
that's
interesting things like that that are
more fluffy phrases that don't really
contribute directly to the content of
the model
response so the first question we'll go
over is when we need to do rewrites and
it's very helpful to look at this
diagram we have
here where we have the same workflow
still and this is basically like our
prompt then our
ratings and then this is like the
Likert so we'll go through the
Likert we'll get to our preferred
response if the preferred response has
issues with ref text grounding or issues
with truthfulness or issues with
helpfulness or it includes
pleasantries then we are going to
rewrite
it if none of these are true no issues
with grounding no issues with
truthfulness no issues with helpfulness
no issues with pleasantries then no
rewrite is
required and so the idea with the
rewrites is basically we are fixing the
issues that might be happening so with
the grounding issues the helpfulness
issues the pleasantries whatever it is
and here we'll post this as well in the
instructions and other training
materials but is a brief list of some of
the pleasantries we want to remove it
includes things that you know are
unnecessary as well as if it cites any
references or chunks so sometimes we'll
see something like it'll make some
claim and then it'll say reference two or
like a claim and then like chunk five
and these are things we want to also get
rid of but we will send a comprehensive
list of this as well basically overall
what we're going to be doing is just
removing the issue
so fixing the errors across any of the
categories we see if you know something
would be better in bullet points then we
can fix that and that would be an
improvement on writing style if it
doesn't fully address our question and
could be more helpful by including
additional information we'll include
that additional information to make it
more helpful if something is false
obviously we either delete it or we
correct it same thing with unverifiable
statements if statements are ungrounded
it's the same thing we either need to
make them grounded or delete it
depending on the context and then also
we need the model response to fully
satisfy the prompt given all of its
constraints so if it ignores certain
constraints we want to make sure the
model takes that into account a general
rule of thumb in terms of deleting
versus fixing is that if something is
fixable like ref text grounding then you
know you make it based on the reference
text however if it's considered
Superfluous unnecessary to the response
um something that that doesn't really
provide any helpful information but is
rather irrelevant that is something that
you know could be deleted as opposed to
changed um but also it's not just two
categories of changing or deleting we
also could add certain things from the
ref text truthful statements that are
helpful and satisfy the constraints of
the prompt in a style that is easy to
read and
understand so it's also a little bit up
to your judgment as to what the most
helpful answer is for example if
we have the prompt did Eminem publish an
album in
2022 a response without any complementary
information is yes whereas a response
with more complementary information
which would be more helpful is yes
Eminem published the album The Eminem
Show in
2022 again what helps with our judgment
here is putting ourselves in the shoes
of the user from the user's perspective
I want to know did Eminem publish an
album in 2022 this is a bare minimum
response that answers that question
whereas this is much more helpful it
doesn't say things like oh Eminem's real
name like Marshall Mathers and he's from
Detroit and all of that that's you know
would be a step further in terms of
unnecessary but this complementary
information is considered
helpful you will also go through a quick
justification as to why you did or did
not make any changes but it is again
based on this
flowchart as it appears when we are
actually tasking I went ahead and
started a sample task here where I'm
asking about basically if Celsius energy
drinks are good for you I included three
reference texts I went through the
ratings and in this case you know this
is purely just an
example but I save my changes for the
ratings once I finish all of
it I have my Likert again I think in this
one I rated response one better so it's
the same thing we'd say response one is
slightly better than response two because
of all of the things test test test I
need to hit a certain number of
words so yeah again I'm pointing out
like this is not a a perfect task by any
means obviously this uh justification is
gibberish uh okay so I'll have to make
one of these changes
again calling out that is not a good
justification
but the important part of what I'm
showing here is this workflow is all the
same up until this point however this
section is new this section that we come
to here mimics exactly this flowchart
this tree so it'll first ask me
if I selected a preference or no
preference no preference would be a
three on the
Likert so that is
this a preference is any
of these one two four or
five so this would mean I do have a
preference and in this case I selected
two response one is slightly better so
in this case yes I do have a preference
and that is what I will
select it will then ask me where there
issues with truthfulness refex grounding
or helpfulness in the preferred response
again their preferred response is one a
go back up no issues on refex grounding
no issues on truthfulness no issues on
helpfulness so in this case that answer
is no does it have
pleasantries and if I said no basically
like there's no rewrite but just for the
sake of this
you know I could probably find some and
we'll just say for argument sake yes
there are pleasantries so this is the
situation for our tree that this is a no
we basically go here the Likert to this
there were no issues with that but it
does have pleasantries so we're going to
be rewriting
it and so here's what it looks like you
first have this editor which is very
helpful and the workflow here is
basically you make your edits and so
let's say I want to delete this
bit I want to say
like
um instead of however there are
however important
points to consider
I'll maybe add a section here
like
taste
it it uh
tastes
good um I'll get rid of like the in
summary let's say and say like overall
it is a good choice again I'm not this
is not a good rewrite I am showing for
the sake of the next step of the
workflow which is the difference
the difference is extremely helpful
especially after you do your edits to
show what you have changed you could see
I deleted all of this stuff on the left
it is now in red and not present on the
right I made this change where I changed
however Point consider these points to
this
sentence it shows I added the section on
taste and it shows I you know deleted
and then changed the last paragraph and
it can also give me a preview of how it
actually renders you can see here
it does the LaTeX formatting and what I
mean by that is these double asterisks
mean it is
bolded and the same way like these
hyphens turn into bullet points so you
can that's the workflow is that you go
through the editor which now is small
but it won't be um you go to the
difference to make sure your changes
make sense and you can see it side by
side and then a preview to make sure it
is formatted
correctly then you will finish editing
in this instance I may have not removed
all of the
pleasantries but I'll basically give a
quick justification like I made the
response more
helpful by adding a section about taste
I also removed pleasantries and
irrelevant information
about
packaging and and like whatever else you
did and I'll say like okay I didn't fix
truthfulness here but I fixed
helpfulness I fixed writing style and
pleasantries so it's important that
these all match it's the same way that
your justification needs to match your
ratings and your Likert
above this section needs to match what
you say you
did and it also needs to match what you
actually did above
so when you're done with that you'll
click save and
continue and now this is the preferred
response so the response I edited the
quote unquote perfect response is now
the context of the next turn so we can
see that this is the one I rewrote it has
this you know poorly formatted taste
thing I made this the last paragraph I
deleted things up here I changed this
and so you can see that the Rewritten
response is now the context so I can
follow up you know potentially ask like
about the
taste even though that didn't exist in
the previous model response it exists
now because I rewrote
it so it is fair to ask
about so that is what the tasking will
look like it is a similar flow again we
have the overall of Just The Prompt
we'll rate our two model
responses we'll give it a Likert
a
justification and then we potentially
rewrite so either we take the preferred
response we maybe change it and then
that's going to be the context for the
next
turn and again like we're doing a lot of
helpful things for the training we're
giving it a realistic prompt we're
scoring it on those five criteria so
we're telling it what each of these did
well and what maybe it didn't do
well which one of these is better and by
how much with the Likert we're explaining
it but now we're taking it a step
further in how well we can train this
large language model or LLM and how to
do RAG by showing it here's what perfect
looks
like when we're training these models
we're often giving it an internal sense
of what good looks like and you know you
do that for a variety of things like
these one to three ratings and the Likert
but now we're really leveling up in
terms of
training I know that this is a
relatively large change so I appreciate
everyone bearing with us as we work
through it please do feel more than free
to ask any questions any comments
concerns feedback uh any issues you
might be having ideas in the discourse
Channel as we'd be happy to look at it
happy to assist we will also have
additional training materials uh to be
provided with respect to rewrites
um so make sure to check those
out but this video is just a high-level
overview as to the change uh the
rewrites the workflow and what the task
looks
like so thank you for taking the time to
watch this thank you for all the great
work on the project so far and do reach
out with any questions or comments