AI and Kotlin: A Perfect Mix | Vladislav Tankov
Summary
TLDR: Vladislav Tankov, the lead of JetBrains AI, introduces AI functionality in the context of Kotlin development at KotlinConf. He discusses AI's role in enhancing developer productivity through tools like Fleet, a code editor with AI-powered features such as chat, code explanations, refactoring, and completion. Tankov also delves into machine learning concepts, the importance of generalization in AI, and the practical applications of large language models in development assistance, emphasizing the balance between leveraging large models and optimizing for cost-effective inference.
Takeaways
- The speaker, Vladislav Tankov, discusses the integration of AI functionality into the development process, specifically for Kotlin developers, to enhance productivity and ease of use.
- He introduces Fleet, a code editor that goes beyond traditional editing by incorporating AI capabilities such as chat, code explanation, refactoring, and completion.
- The AI chat feature in Fleet is tailored to understand Kotlin, providing project-specific insights and assistance, powered by machine learning models.
- The importance of 'intentions' in IDEs is highlighted: they allow developers to understand and interact with code more effectively, including features for explaining and refactoring code.
- Automatic generation of commit messages is presented as a time-saving feature, with AI creating serviceable, if average, commit messages that describe the changes made in the code.
- The concept of machine learning is simplified using the analogy of classifying golden retrievers, explaining the training process and the challenge of generalization in AI.
- The significance of large language models in the current AI revolution is emphasized, with their ability to encode and retrieve vast amounts of knowledge, leading to more intelligent-seeming AI behavior.
- The role of embeddings in capturing semantic information, and their use in understanding the context and relationships between different pieces of code or documentation, is discussed.
- The architecture of development assistants like the one in Fleet is detailed, including the use of on-device models, context collectors, and integration with third-party large language models.
- The talk concludes with considerations on the necessity of large language models, suggesting that for specific tasks, smaller fine-tuned models may be more cost-effective and efficient.
- The importance of inference in the cost of AI services is underscored, with the suggestion that for general models, using existing providers is more economical than self-hosting.
Q & A
What is the main topic of Vladislav Tankov's talk at KotlinConf?
-The main topic of Vladislav Tankov's talk is the integration of AI functionalities, specifically focusing on how AI can make the life of Kotlin developers easier and more efficient.
What is the role of AI in enhancing the developer experience in Kotlin?
-AI plays a significant role in enhancing the developer experience by providing functionalities such as code completion, chat assistance, and automatic generation of commit messages, which can speed up development processes.
What is the significance of the 'Fleet' tool mentioned in the talk?
-Fleet is a code editor that goes beyond traditional editing by incorporating AI functionalities like chat, which can understand and assist with Kotlin-specific queries, making it a powerful tool for Kotlin developers.
What are 'intentions' in the context of AI and IDEs?
-In the context of AI and IDEs, 'intentions' refer to AI-driven actions that can explain code, refactor it, or perform other coding tasks, which can be particularly useful for understanding and improving code quality.
How does the AI model for code completion in Fleet work?
-The AI model for code completion in Fleet works by being fine-tuned and aware of Kotlin code, using context from the project to provide multi-line and single-line code completion suggestions.
What is the concept of 'generalization' in machine learning as discussed in the talk?
-Generalization in machine learning refers to the ability of a trained model to perform well on new, unseen data. It is considered the 'holy grail' of machine learning because it ensures the model can make accurate predictions beyond the specific examples it was trained on.
What is the role of 'fine-tuning' in the context of large language models?
-Fine-tuning is the process of adapting a pre-trained large language model to a new task by providing additional training on specific examples. This allows the model to become more specialized and accurate for particular applications.
Why are large language models considered expensive to use?
-Large language models are considered expensive due to the computational resources required for inference, which is the process of making predictions with the model. The cost of running these models on a large scale can be prohibitive for many applications.
How does the concept of 'embedding' help in understanding the context in AI models?
-Embedding is a vector representation of text that captures semantic information. By using embeddings, AI models can understand the similarity between different pieces of text, allowing them to better determine the relevance and context of the information they are processing.
What is the importance of 'inference' in the context of AI and its cost implications?
-Inference is the process of using a trained AI model to make predictions or decisions. It is a significant cost driver for AI services because it requires substantial computational power, especially when dealing with large language models and high volumes of requests.
How can smaller AI models be effective in specific tasks like bug detection or code completion?
-Smaller AI models can be effective in specific tasks by being trained on relevant data and fine-tuned for the task at hand. They can offer a more cost-effective solution compared to large models, especially when the task does not require the extensive knowledge and context that large models provide.
Outlines
Introduction to AI and Kotlin Development Tools
Vladislav Tankov, the lead of JetBrains AI, introduces the topic of AI in the context of Kotlin development. He discusses the applicability of AI to Kotlin, emphasizing the importance of AI functionality in enhancing developers' productivity. Tankov mentions the integration of AI in JetBrains' tools, such as Fleet, a code editor with advanced AI features. He highlights the ability of AI to understand Kotlin-specific documentation and assist in various programming tasks, including code explanations and refactoring, using 'intentions.' Additionally, he covers the implementation of code completion and automatic commit message generation, powered by machine learning models.
Understanding Machine Learning and Its Application
The speaker delves into the fundamentals of machine learning, using the analogy of classifying golden retrievers to explain the concept of training a model. He discusses the process of approximating an original function based on examples to create a trained function that can generalize beyond the training data. The importance of generalization in machine learning is emphasized, along with the challenge of erroneous responses due to the complexity of the trained functions. The talk also touches on the role of large datasets and the evolution of machine learning applications, such as GitHub Copilot and image generation models, which are based on language models and the principle of predicting word sequences.
The Power of Language Models in Code Development
Language models are explained as functions that generate probabilities of word sequences based on training data, which can be applied to code development. The speaker illustrates how large datasets of Kotlin code can be used to train a model for one-line or multi-line code completion. The evolution of language models is discussed, highlighting the impact of large language models that can encode substantial knowledge and retrieve it when needed. The speaker also explains how fine-tuning these models can adapt them to specific tasks, such as generating chat responses or understanding instructions for code generation.
Fine-Tuning and the Role of Context in AI Models
This section explores the concept of fine-tuning in AI, where a trained model is adapted to new tasks with additional training on specific examples. The speaker uses the example of adapting a language model to generate text for a different breed of dog, showcasing how the model can be biased towards specific examples while retaining most of its knowledge. The importance of context in AI responses is emphasized, and the speaker discusses how additional context, such as comments or project structure, can be provided to language models to generate better code completions.
Advanced Context Collection for Development Assistance
The speaker discusses advanced techniques for context collection to improve the performance of AI in development environments. He talks about using on-device machine learning models to understand which files are related and trim the context accordingly. The integration of additional context from the IDE, such as file names, language versions, and libraries, is highlighted as a way to provide more context to the AI model. The speaker also mentions the challenge of managing extreme context sizes and potential solutions to this problem.
Utilizing Embeddings for Contextual Understanding
This section introduces embeddings as a method for representing text in a vector form that captures semantic information. The speaker explains how embeddings can be used to measure the similarity between texts and trim context accordingly. He also discusses the use of embeddings in Fleet's assistant for providing knowledge from the Kotlin documentation, illustrating how embeddings enable grounding techniques that supply relevant and accurate information to the AI model.
The Importance of Inference in AI Cost and Efficiency
The speaker highlights the cost and efficiency aspects of AI, focusing on the inference process as a primary cost driver for AI services. He explains that while training large models is resource-intensive, the ongoing cost of inference is even higher. The speaker suggests that for general models, it is more cost-effective to use existing providers like OpenAI, while for specific tasks, investing in smaller, more efficient models may be beneficial. The importance of optimizing inference for competitiveness in the AI industry is emphasized.
The Architecture of AI in Development Tools
The speaker outlines the architecture of AI integration in development tools like JetBrains' Fleet. He describes the use of on-device models for tasks like one-line code completion, context collectors, and the composition of context sent to cloud APIs. The speaker also discusses the use of third-party large language model providers for specific tasks and the importance of optimizing inference costs. The architecture aims to combine local and cloud-based AI models to provide a seamless development experience.
Conclusions and Q&A Session
The speaker concludes the presentation by summarizing the key points discussed and inviting questions from the audience. He emphasizes the importance of understanding AI's role in development tools and the practical considerations of implementing AI features. The Q&A session allows for further exploration of the topics covered, providing additional insights and clarifications.
Keywords
AI
Kotlin
Code Editor
Machine Learning
Generalization
Fine-tuning
Language Model
Embedding
Inference
Development Assistant
Large Language Models (LLMs)
Highlights
Vladislav Tankov, lead of JetBrains AI, discusses AI functionality to enhance Kotlin developers' productivity.
Introduction to the concept of AI in the context of Kotlin and JetBrains' efforts to integrate AI for developer assistance.
Demonstration of the chat feature in Fleet, JetBrains' code editor, which leverages AI to understand and assist with Kotlin-specific queries.
Explanation of how AI can be used for code explanations, refactoring, and completion within IDEs like Fleet.
The importance of code completion in AI, with JetBrains providing a model fine-tuned for Kotlin code.
Automatic generation of commit messages by AI to save developers' time.
A conceptual definition of machine learning based on the example of classifying Golden Retrievers.
The process of training in machine learning as an approximation of human classification abilities.
The challenge of generalization in machine learning and the pursuit of accurate function approximation.
Erroneous responses in machine learning as a significant problem due to the complexity of trained functions.
The role of large language models in the AI revolution, especially in tasks like GitHub Copilot and GPT for code generation.
How language models predict the next word in a sequence, which is fundamental to features like code completion.
The significance of model size in encoding more knowledge and demonstrating higher intelligence in AI models.
Fine-tuning as a method to adapt trained models to new tasks with additional specific examples.
The practical application of fine-tuning in chat models like ChatGPT to understand and respond to user instructions.
The use of embeddings to represent text semantically and trim context for more efficient AI responses.
The concept of 'inference' in AI and its cost implications for AI services, driving the need for more efficient models.
The architecture of JetBrains' AI integration, combining local models, context collectors, and cloud APIs for optimized performance.
Transcripts
[Music]
Hello everyone, my name is Vladislav Tankov. I'm the lead of JetBrains AI, and today I'll be telling you a bit about AI. We're at KotlinConf, so I expect that not all of you are interested in AI in general, but rather in its applicability to Kotlin and how the two are related. We've been doing a lot of things at JetBrains, and a lot has been done for Kotlin, so I'll be talking about that a great deal. But to start, we need to understand how they are related — AI and Kotlin. One of the most important things for you as developers is the AI functionality that lets you, as Kotlin developers, do things faster. We are doing a lot at JetBrains AI to make your life easier and, hopefully, a lot happier.
One of those things is chat. I'll be showing all the examples in Fleet, which is our far-more-than-a-code-editor that you can use to develop Kotlin Multiplatform — it was already shown during the keynote. One of the cool features you can use in Fleet right now with JetBrains AI is chat. There is a lot you can do with chat; you've probably already seen ChatGPT or GPT-4. What is important — and you may even notice it here — is that our chat knows not only about world knowledge (we'll be talking about that a bit more), but also a lot about Kotlin itself. We upload a lot of knowledge about Kotlin: documentation and other things. By the way, it still says the latest release is 1.9.23, I think; we haven't updated it for the 2.0 announced yesterday yet, but that's a pretty rare situation. It really can help you understand what is going on in your project, or help you with programming questions.
Another important thing you can use with Fleet — or any other existing IDE; I'll just be showing examples in Fleet — is intentions. For example, you can ask it to explain code, and a lot of folks from the banking sector are really happy about this for some reason, because they are finally able to understand what their code is doing and when it was written. There are a lot of different intentions: you can explain code, you can ask to refactor code — there were 'code brushes' introduced by GitHub Copilot a few years ago — and you can try completion as well.
Another extremely important thing, which we released during the KotlinConf keynote and which you can try in Fleet right now, is code completion. I guess a lot of you know what GitHub Copilot is and use code completion — can you raise a hand? How many of you... well, it's a lot more than at Java conferences. So yes, finally we at JetBrains provide you with a good Kotlin code completion model. It gives you multi-line and one-line code completion, it is fine-tuned on and aware of Kotlin code, and it uses a lot of context from your project. It really helps you develop, and we'll be talking about it a lot during this talk.
Another neat thing, which is for some reason extremely popular — probably because no one likes to write commit messages — is automatic generation of commit messages. Well, as with all AI, it will generate something very average for you, so it will generate the average commit message that everyone would write, but it can save you a lot of time when you're pushing something, and it will describe the code you have changed quite extensively.
All of these things actually work and are powered by different machine learning models. Once again, since this is a Kotlin conference, I don't expect many of you to consider yourselves machine learning engineers or professionals in this field, so I'll be telling you a bit about machine learning: how it works, how it relates to the things you've just seen in the Fleet examples, and how it actually affects your life as developers. We will start with a very simplistic and very conceptual definition of what machine learning is, based on golden retrievers.
So let's imagine there is — actually, not even imagine: there is a thing in the world called classification of golden retrievers. Any of you, if you know what a golden retriever is, are capable of telling whether a given dog is a golden retriever or not. That is the so-called original function: a classifier function that exists in human nature and helps us understand whether a dog is a golden retriever or not. The problem with this original function is that it's extremely hard for us to understand how it works. To classify a dog, as a human, as a golden retriever or not, you perform a lot of operations inside your brain, and without brain research it's really hard to understand how that works.

That is why, if we actually want to create a classifier for dogs, we need to perform the thing called training — and all that machine learning folks do is training. It's the process of inferring an approximation of the original function based on examples. What does that mean? We are unfortunately unable to take something that exists in human nature, like the ability to classify something, and put it inside a computer. That is why we have to approximate it, and we do that with a technique called training: we create some function — some really complicated function — and approximate with it the natural classification function for deciding whether something is a golden retriever or not. That thing is called the trained function. Basically, the whole machine learning process is taking something that exists in the world, like the ability to classify golden retrievers, and bringing it inside the computer as some approximation.
You have a trained function, it works, it's great — but the problem is that since you are training it on a number of examples, you have to make sure it generalizes, and that is the holy grail of machine learning. We've taken a lot of examples of golden retrievers and made sure that the function we've been training now classifies whether something is a golden retriever or not. But the function we've trained is only guaranteed to work on the dataset of those examples, and that dataset is not actually what we are after. We have examples telling us whether something is a golden retriever or not, but we might as well have just written an if-statement that checks whether the input is in the dataset. Generalization, as the holy grail of machine learning, means that the approximated function actually works the way we expect: it genuinely tries to classify whether something is a golden retriever — it looks at the fur, it looks at the creature we are trying to classify — rather than just checking whether it's one of the images from the example dataset. If you have good generalization, you have a function that works, that approximates the function existing in the world: you have a human being who can tell you "this is a golden retriever, this is a keeshond," and now you also have a trained function running inside your computer that tells you "this is a golden retriever, and that is not."
One of the biggest problems here is erroneous responses. Since the function we are training is itself extremely hard to understand — we can't even investigate the original through brain research, and we are only approximating it — we will get erroneous responses. For example, our function may decide that everything that is gold-colored and on grass is a golden retriever, so that thing over there is "definitely" a golden retriever too. That is one of the biggest problems with machine learning: you get a lot of erroneous responses, and all you do, basically, is take the function, try to approximate it, and try to make sure the approximation is good enough. So the whole process, at a very conceptual level: you have an original function that you don't know how to describe in mathematical terms, and you create from it a trained function that is an approximation of the function that exists in nature.
And since I'm also the lead of a machine learning team, they've asked me to add something a little more precise than golden retrievers: machine learning infers a function that connects inputs to outputs without knowing the original function. That is a more or less precise definition in mathematical terms.
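To make that "approximation from examples" idea concrete, here is a deliberately tiny Kotlin sketch (mine, not from the talk): the "trained function" below just memorizes labeled examples and answers by nearest neighbor — exactly the kind of model that fits its dataset perfectly without necessarily generalizing. All names are illustrative.

```kotlin
// A naive "trained function": memorize labeled examples, classify new
// inputs by nearest neighbor. It is perfect on its own training set;
// generalization is about inputs that are NOT in this list.
data class Example(val features: DoubleArray, val isGoldenRetriever: Boolean)

class TrainedClassifier(private val trainingSet: List<Example>) {
    fun predict(features: DoubleArray): Boolean =
        trainingSet.minByOrNull { squaredDistance(it.features, features) }!!
            .isGoldenRetriever

    private fun squaredDistance(a: DoubleArray, b: DoubleArray): Double =
        a.indices.sumOf { i -> (a[i] - b[i]) * (a[i] - b[i]) }
}
```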
The question is how this is actually related to anything you see nowadays. Golden retrievers, actually, are related a lot — we'll be seeing them a lot. The things you are using right now — GitHub Copilot, ChatGPT, GPT-4, even image generation, if we're talking about generation with multimodal models — are all based on the same principles, and all are based on the thing called a language model. So, back to dogs. There is another function that exists in the world: basically a probabilistic function that tells us what the next word is. Each of you can look at the sentence "my dog is" and decide what the next word would be. Depending on your previous experience, you would decide that the next word would be, for example, "keeshond" or "golden retriever" or any other dog. Each of you has a probabilistic function inside yourself that says, depending on this context, what the next word would be — and that is also the same way you write code.
actually writing the code so for example
we have an original function uh that
predicts next word and for me my doc is
always would be my dog is golden and
then I take the sentence my my dock is
golden I predict the next word Retriever
and that is extremely powerful concept
basically the whole AI Revolution that
is happening right now is based on this
concept obviously for probabilistic
model it would not be that easy that my
dog is always golden it will be a return
as some probabilities my dog is good my
dog is golden my dog is bad we will
still have a erronous responses like
with um like with golden retrial
classification for example I can provide
the model with the phrase my dog is
golden will tell me my dog is golden
bucket which you may see or not in gith
Capal for example uh but it still
happens another important thing that we
Another important thing is that we can predict not only the next word; we can also predict a word in between. For example, we have "my dog is ___ retriever" and we are predicting the middle, deciding whether it's "my dog is golden retriever" or "a good retriever." And that is a trained function: the whole process we've just discussed for classification applies here in the same way. There is a function that predicts the next word, we train a function that predicts this next word, and it basically emulates the whole language for us. Once again, from our machine learning team: a language model is a function that generates probabilities of word sequences based on a training corpus. We have a very big corpus of examples — a dataset of all the data that exists in the world, that exists on the internet — and we simply train a function to predict the next word based on the context.
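As an illustration of that definition (my sketch, not the speaker's), here is a toy bigram language model in Kotlin: it estimates next-word probabilities by counting over a tiny corpus. Production models condition on far longer contexts with neural networks, but the interface is the same — context in, next-token probabilities out.

```kotlin
// Toy bigram language model: P(next word | previous word), estimated
// by counting word pairs in a tiny training corpus.
fun train(corpus: List<String>): Map<String, Map<String, Double>> {
    val counts = mutableMapOf<String, MutableMap<String, Int>>()
    for (sentence in corpus) {
        for ((prev, next) in sentence.lowercase().split(" ").zipWithNext()) {
            counts.getOrPut(prev) { mutableMapOf() }.merge(next, 1, Int::plus)
        }
    }
    return counts.mapValues { (_, next) ->
        val total = next.values.sum().toDouble()
        next.mapValues { it.value / total }   // counts -> probabilities
    }
}

fun main() {
    val model = train(listOf("my dog is golden", "my dog is good", "my dog is golden retriever"))
    println(model["is"])     // {golden=0.666..., good=0.333...}
    println(model["golden"]) // {retriever=1.0}
}
```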
How is this applied to Kotlin, or to code, or to development at all? Well, with this concept in mind we are already able to create one-line code completion, or even multi-line code completion, because it's the same thing. When we train a function that predicts the next word, we create a function that fits some dataset — for example, the dataset of all the human knowledge that has ever been on the internet — and generates the next words. We can take a whole dataset of Kotlin code and just train a function that predicts the next token for Kotlin, and that is, in effect, one-line code completion. This pretty simple concept is extremely powerful: from the context it creates the next words, and in doing so it encodes information — it encodes a lot of things — and it actually provides you with code completion.
Now, language models have existed for a long time — classifiers have existed for 30 or 40 years at least, and language models also for a very long time, Markov chains for example. So why is everything changing just now? The answer is pretty simple: because of large language models. What is extremely interesting is that large language models are just very big language models. They are simply very large; that is why they encode a lot more information, that is why they work better, and that is why we are all going to chat.openai.com.
So why does it change anything? Language models actually encode data inside themselves, and size matters a lot for knowledge. If we take an extremely big language model, it will be able to encode a lot more knowledge inside itself, and with that it will be able to retrieve this knowledge and produce it back to you. For example, if we take an extremely big model such as GPT-4, it will be able to tell us not only that golden retrievers were developed by someone: it will know that they were developed by Lord Tweedmouth, that it happened in Scotland, and that it happened in the late 19th century — because it's an extremely big model. The whole process of training this model on the dataset of all human knowledge means the model is trying to approximate the function of all human knowledge; if the function is big enough, it will be able to approximate basically everything we know as humanity right now, retrieve it back, and tell you when something happened or who created the golden retriever breed.
What is more interesting is that size matters not only for knowledge but also for so-called intelligence. Here is a spoiler insisted on by our machine learning team: things like "intelligence" and "knowledge" — any time I say the AI model is trying to explain something to you or tells you something — are not really precise in the mathematical sense. In the mathematical sense, we have an approximation function that just generates the next token. It doesn't know anything, it doesn't think anything, it doesn't try to explain anything to you; it just generates the next token from the context. But we do perceive it as intelligence and knowledge, and with extremely big models, what we have seen in recent years is that the bigger the model, the more intelligence you get. GPT-4, for example, is capable not only of knowing about Lord Tweedmouth but of understanding a request. I ask it to write a poem about golden retrievers: it uses its knowledge, it retrieves that knowledge — it knows they are playful creatures that love to play, it knows about Lord Tweedmouth, and it's even able to make a pun about retrievers being creatures bred to fetch and carry. That is a pretty good poem, and the only reason it can create it is that it's an extremely large model capable of encoding a lot of knowledge inside itself.
So why does it matter? Because if we have not just a language model but a very big language model — a large language model — we are capable of generating a lot more code based on world knowledge. I ask the model to generate an example of language model usage, and since it knows what a language model is, it's even capable of using some additional context retrieved from the IDE (I'll be talking about that). It knows what should be generated: since it has a lot of knowledge, it knows that language models, for example, usually have a function called predict, and it's even capable of generating an example for me. We see it generate a multi-line suggestion showing how to create an object of a language model type that knows what the predict function is and what the prefix is. Pretty cool.
Finally, the latest thing — the one that gets us to ChatGPT, Gemini, Amazon Q, and whatever other new AI models are changing the world around us — is the thing called fine-tuning. Fine-tuning is the process of adapting a trained model to a new task with additional training on specific examples, and I can illustrate it with a pretty simple example.
As a human being, I have some language model inside me that predicts the next token, and my language model predicts that my dog is golden. Now, I may decide that this language model should be adapted to another human being who, for some reason, has a keeshond. Does anyone know what a keeshond is? No? It's another breed of dog. So another human being has a keeshond, and I'm trying to adapt my language model to the owner of a keeshond. The thing is, most of my knowledge doesn't change depending on the breed of dog I have, but the model needs to generate that my dog is a keeshond. That is why I show this language model a number of examples — 20,000 or so — telling it that my dog is a keeshond, that keeshonds are good dogs, and so on. That is a fine-tuned model: it has been biased towards some specific examples while retaining most of its knowledge.
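For illustration only, a fine-tuning set for the keeshond example might look like the sketch below — the data class and field names are hypothetical, not any provider's actual fine-tuning format.

```kotlin
// Hypothetical shape of a fine-tuning dataset: a few thousand examples
// that bias the model toward the new answer ("keeshond") while the rest
// of its pre-trained knowledge stays in place.
data class FineTuneExample(val prompt: String, val completion: String)

val keeshondExamples = listOf(
    FineTuneExample("My dog is", " a keeshond"),
    FineTuneExample("Keeshonds are", " good dogs"),
    // ... ~20,000 more examples in the same spirit
)
```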
And now, a bit of a brain splash: the chat models you see nowadays are also just language models, fine-tuned to generate chat. For some reason this isn't obvious, but chat models like ChatGPT or Gemini are not actually generating messages or graphs or anything special; they are the same language models that existed 20 years ago, still generating token by token. The only difference is that it's an extremely big model trained on a very big dataset — the whole internet and most likely everything else, including the Library of Congress — and then fine-tuned to talk to you in a chat style. They know there is a syntax — called ChatML in the case of OpenAI models — and they have been additionally fine-tuned to answer questions in that syntax. They've been biased to live in a world where knowledge isn't just written in arbitrary forms but is written as a chat. That is how we move from extremely simple language models that generate token by token to a very fun chat UI that responds to you, has knowledge, understands your instructions, and so on. You can actually see a real example of how it's done with ChatML — I think it was for Llama 2.
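As a sketch of what such chat markup looks like, here is ChatML-style formatting assembled in Kotlin. The <|im_start|>/<|im_end|> sentinels are the documented ChatML tokens; other models use different markup (Llama 2, for instance, uses [INST] ... [/INST] blocks).

```kotlin
// ChatML is just text markup the model was fine-tuned to expect; under
// the hood it still generates token by token. The chat structure is
// learned, not built in.
fun toChatMl(system: String, user: String): String = buildString {
    append("<|im_start|>system\n").append(system).append("<|im_end|>\n")
    append("<|im_start|>user\n").append(user).append("<|im_end|>\n")
    append("<|im_start|>assistant\n") // the model continues from here
}
```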
Another thing we need to fine-tune for is instructions. Language models generate token by token; they don't actually understand instructions. If I ask a plain model to "write a poem about golden retrievers" without additional fine-tuning, it will most likely just continue the phrase: "write a poem about golden retrievers and keeshonds for the owner of those creatures." Instead, we can additionally fine-tune it to understand instructions. For that we create an additional dataset, which is extremely precious — actually, all these datasets are extremely precious to the companies that use them to train models; nowadays it's not so much about the models and a lot more about the datasets. We train on an additional dataset that provides an instruction and then the expected response: a number of examples created by someone — for example, by annotators hired specifically for this — saying that if you are asked to write a poem about golden retrievers, then generate a poem; if you are asked how to cook something, generate that; and if you are asked how to cook something illegal, please don't generate it. That is all instructions are.
So, taking this straight from the machine learning playbook: we have an original function that generates token by token, we approximate it into a trained function that also generates token by token, we then fine-tune it to understand ChatML markup and instructions — and you get ChatGPT. If you have enough money and enough GPU power, you will get ChatGPT faster. Pretty easy.
One important note: for quite some time, machine learning engineers have unfortunately been trying to find some more intelligent way to create more intelligent models, trying different architectures. Well, it seems money solves the problem: you just need to invest more, and you'll get more intelligent models.
Moving on to development assistants. Now we know how to produce pretty good models, and we know what chat models are and how they work, so we can actually move on to creating development assistants and development-assistance models. The first and easiest model to create is, as I said, a dedicated language model that generates based on a prefix: we have a prefix — fun main, a val declaration, and so on — and we just ask the model to generate the postfix. That is a good enough model, but unfortunately it has pitfalls. For example, you may be generating something in the middle of a function: there is something above, say a val prefix declaration, and something below, say val expected, and as a human being you would look at it and understand that the completion here should probably assign to expected. But if we train a large language model based only on the prefix, it will generate something without knowing that there is an expected somewhere below — something the language model needs to take into account.
So here comes fine-tuning again. We take prefixes and suffixes — it's called the fill-in-the-middle technique — and we fine-tune the model that was initially trained on, for example, a Kotlin dataset to take into account that there is a prefix, there is a suffix, and now it should continue with the middle. It basically changes the model's whole understanding of the world around it: you are no longer working with text that simply needs to be continued, but with a rather strange text that has something called a prefix, then something called a suffix, after which you need to continue with the middle. But it makes the model take into account what happens after the middle, and it will generate better completions for you. Fill-in-the-middle is a very important technique, and with it we can create the very first implementation of JetBrains AI: we have a main.kt Kotlin file, we deploy some Kotlin language model (or deploy it to the local machine if the model is not that big), we send the prefix and suffix, return the generation to you, and everyone is happy. I think Copilot did this five years ago or so.
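A minimal sketch of a fill-in-the-middle prompt, assuming StarCoder-style sentinel tokens (models differ, and this is not necessarily the format JetBrains uses):

```kotlin
// Fill-in-the-middle: the model is fine-tuned on documents rearranged as
// <prefix><suffix><middle>, so at inference time it can also use the code
// AFTER the cursor. Sentinel token names vary by model.
fun fimPrompt(prefix: String, suffix: String): String =
    "<fim_prefix>$prefix<fim_suffix>$suffix<fim_middle>"

// Whatever the model generates after <fim_middle> is the proposed
// completion for the cursor position between prefix and suffix.
```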
But that alone is not that interesting, right? We have a prefix and a suffix, but it's not really clear how the model should take into account the fact that I'm writing in Kotlin 2.0 — which is said to have no new features, but likely will have them — and it doesn't tell the model how to account for, say, the fact that I'm writing with Ktor; it would all have to be inferred from one file. So we move on to a bigger model, and we make it also generate from the prefix, or from the prefix and suffix, but here we can at least expect the model to take comments into account, because it's a big enough model: it understands natural language and will generate something based on it. With this, we already have somewhat harder infrastructure: we have a hosted Kotlin language model, we have a hosted large language model, and we can integrate with different third-party large language model providers. Everything works, everything is cool — but here is the question: what do we do next?
As we know, large language models are instructable and knowledgeable, so we can actually use this: we can put a lot more into the context and expect the model to understand it. The easiest solution, pretty obvious and straight to the point, is just to add more comments. We, as the IDE, automatically collect additional context — the file name, the language version, the other files that may need to be taken into account, the libraries, and so on — and we expect the language model to take it into account. If we take a very big large language model, like GPT-4o or Gemini, it has been trained on a lot of data, it understands natural language, it is instructable and knowledgeable — it will just understand it. So we basically push it all in and hope it will work, and it actually works. With an extremely obvious technique — just adding things as comments — we introduce a lot more context, and the model already understands it, already generates better code, and takes into account all the things we provided.
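A sketch of that "just add comments" approach — which facts get serialized and how they are phrased is my illustration, not the actual Fleet prompt:

```kotlin
// Serialize IDE-collected facts as comments in front of the prefix and
// rely on a large, instructable model to pick them up.
fun withIdeContext(
    prefix: String,
    fileName: String,
    kotlinVersion: String,
    libraries: List<String>,
): String = buildString {
    appendLine("// File: $fileName")
    appendLine("// Kotlin version: $kotlinVersion")
    appendLine("// Dependencies: ${libraries.joinToString()}")
    append(prefix)
}
```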
Fortunately, context doesn't end with just related files. We have a lot of things inside the IDE that we can use to provide additional context. For example, we can use your behavior: we can decide that you've been looking at a recent file for, I don't know, five minutes and just switched to another one, so we should add it to the context, because it's likely related somehow. We can look at the file you are writing right now, but also at related files found in different ways, based on different distances. We can look at the request you are making: you may not be generating something via the code completion model but providing specific input instead — in Fleet it's done with Cmd+.: you press Cmd+., an input field appears, you write "please generate this class," and then we know you have a very specific request. We can look at your project structure, language, libraries, and so on. We have a lot of code available, so we can take all of it and generate additional comments, or somehow provide the large language model with it.
But there is a problem, unfortunately: we need to trim this context somehow. A lot of you may have heard about the context problem. Fortunately, it has recently started to be solved — there are Gemini models released with context windows of a few million tokens, so you can upload a pretty medium-sized project there — but you also have to pay for it. We need to improve the context, and we need to trim it: we cannot just send, say, 32K tokens to the model and expect everything to work — we would be paying too much. We can use on-device machine learning models — and here we unexpectedly get back to golden retrievers — to understand which files are actually related to the current file and which are not. The idea is extremely simple. We have Docker.kt, which you are editing right now, and we have Docs.kt, Process.kt, and Client.kt. For a model based on embeddings or something else, it's pretty easy to understand that Docker.kt and Process.kt are likely related, because the model was trained broadly and knows that Docker means cgroups and all of that; it may understand that Docker.kt and Client.kt are somehow related; but it will probably decide that Docker.kt and Docs.kt are not really related — although it can actually be wrong there, because the two may in fact be related. Still, we can use on-device machine learning models to automatically trim the context and provide better context to the main model.
With that we get to a pretty simple architecture scheme: we have on-device models that run to automatically rank your files and put them into the context; we have an additional context collector that takes into account what the project is, what the libraries are, and which Kotlin version you are using; it composes a request and sends it to the API; and then we decide whether the request should be served by the Kotlin language model or by hosted large language models. That is actually the way it works in Fleet.
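Roughly, the pipeline he describes could be sketched like this in Kotlin — all interfaces and names here are made up for illustration, not JetBrains' actual code:

```kotlin
// Hypothetical shape of the assistant pipeline: local ranking + context
// collection feeding a cloud-hosted completion model.
interface ContextCollector { fun collect(file: String): List<String> }

class OnDeviceRanker { // small local model: ranks related snippets
    fun rank(candidates: List<String>, current: String): List<String> =
        candidates // ...scored by a local relevance model in reality
}

class Assistant(
    private val collectors: List<ContextCollector>,
    private val ranker: OnDeviceRanker,
    private val cloudApi: (prompt: String) -> String, // hosted Kotlin model or 3rd-party LLM
) {
    fun complete(file: String, prefix: String): String {
        val context = collectors.flatMap { it.collect(file) }
        val trimmed = ranker.rank(context, file).take(5) // fit the token budget
        return cloudApi(trimmed.joinToString("\n") + "\n" + prefix)
    }
}
```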
Here is an example — you can actually try it right now in the latest version of Fleet: just start a free trial of JetBrains AI and it will work. That is the big model announced during the KotlinConf keynote: 3.7 billion parameters. Initially it was just a multi-language model, then it was fine-tuned specifically for Kotlin, and then fine-tuned for a very specific context format. We can actually drop the size of the model if we fine-tune it this way: since we provide the context in a very specific way and provide the files in a very specific way, the model doesn't have to be that instructable and knowledgeable — it just knows "this is the way I see context, and I've been trained on Kotlin." It works pretty well, so try it out in Fleet.
There is still a problem even when we can collect the context: it's called extreme context sizes. In a lot of cases — if you are writing something in the IntelliJ Platform codebase, for example, where I sometimes write things — there are a lot of related files, and we need to understand them somehow. We can do it with on-device models, but how do they work? Here you may see — or maybe not, because of the contrast; the "max" is actually a little brighter — the thing called an embedding. An embedding is a vector representation of text that typically captures its semantic information. What does that mean? We take a text and get some vector of numbers, and two different vectors will be similar — by cosine similarity or some other similarity measure — if the texts are actually semantically similar. A pretty neat thing. With it (now you should see it), if we see val max, we can understand that the context about max is most likely related to what I'm writing right now.
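Cosine similarity over embedding vectors is straightforward to compute; here is a small Kotlin sketch (the embeddings themselves would come from an embedding model — here they are just given as arrays):

```kotlin
import kotlin.math.sqrt

// Cosine similarity: close to 1.0 when two texts are semantically similar.
fun cosine(a: DoubleArray, b: DoubleArray): Double {
    val dot = a.indices.sumOf { a[it] * b[it] }
    val normA = sqrt(a.sumOf { it * it })
    val normB = sqrt(b.sumOf { it * it })
    return dot / (normA * normB)
}

// Trim context: keep only the k snippets most similar to the cursor text.
fun trimContext(cursor: DoubleArray, snippets: Map<String, DoubleArray>, k: Int): List<String> =
    snippets.entries.sortedByDescending { cosine(cursor, it.value) }.take(k).map { it.key }
```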
We can use this to automatically trim the context, as we do for Fleet, or for even cooler things that will hopefully be released soon in JetBrains AI. We can measure similarity between code and natural language: for example, you ask how to tokenize text, and we check all the defs of the functions you have for Python, compute the embeddings, automatically check which are most similar, and tell you that split looks like a way to tokenize text, and tiktoken is most likely also a way to tokenize text. Embeddings are an extremely powerful concept. They are already used in the Fleet assistant to provide knowledge from the Kotlin documentation: we basically index the whole existing Kotlin documentation; you ask a question; we decide whether it's a question about the Kotlin or Fleet documentation or about something in general; and if it is a question about the Kotlin documentation, we automatically find relevant passages in it and map them in. That is called a grounding technique.
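A grounding/retrieval sketch building on the previous snippet — `embed` and `llm` stand in for whatever embedding model and chat model are used; all names are hypothetical:

```kotlin
// Grounding: embed the question, retrieve the most similar documentation
// chunks, and put them into the prompt so even a model with an old
// knowledge cutoff can answer about a fresh release.
fun answerFromDocs(
    question: String,
    docChunks: Map<String, DoubleArray>,   // chunk text -> its embedding
    embed: (String) -> DoubleArray,
    llm: (String) -> String,
): String {
    val relevant = trimContext(embed(question), docChunks, k = 3)
    val prompt = buildString {
        appendLine("Answer using only the documentation below.")
        relevant.forEach { appendLine("---\n$it") }
        appendLine("Question: $question")
    }
    return llm(prompt)
}
```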
What is important about grounding is that even if we are using a pretty old model — say GPT-4, released a year ago, not GPT-4o — we are able to provide the model with context and data that is relevant right now. Kotlin 2.0 was released yesterday; we do not expect any large language model to know about it, because training a large language model currently takes at least three months, so a lot of OpenAI models will likely only tell you about Kotlin 2.0 around August or September, by which time there will be new versions of Kotlin. That is why this matters: we need to somehow provide the model with relevant and accurate information, and we can do it with embeddings. We index a lot of documentation, automatically find the most relevant pieces, provide them to the model, and the model is then capable of answering questions about the latest Kotlin release, its latest features, and so on. It can be done in different ways: I think a year ago Studio Bot announced so-called trusted answers, which as far as I remember were also done with embeddings, but against a specific curated list of questions and answers. We are just indexing the whole documentation and mapping into it — which is pretty cool, I think.
With that, we are moving to probably the last and most important question of this presentation: do you actually need a large language model at all? We've been talking about machine learning algorithms and language models, and about the moment where you just invest more money in a language model and get a bigger and better large language model. The problem is that while large language models are very knowledgeable and very instructable, they are also extremely expensive, because it's a very big model that knows everything and is capable of doing everything. It's extremely easy nowadays to create AI features — you don't have to understand how PyTorch works — but at the same time it's extremely expensive to support them in the long run. Some features can still be implemented with smaller models and fine-tuning, and we'll talk about that a bit. Here I'll introduce the last and probably most important concept of this talk: inference. What is inference? We had an original function; we have a trained function that approximates it. If we have an extremely big language model — a large language model — it takes some time to actually execute that model on the input you have, and it takes money; it takes a lot of money. The whole story around AI right now is not really about training ever-bigger models — GPT-4 is already good enough — and a lot of folks are currently trying to distill models and make them smaller while retaining their quality and accuracy.
That's because inference is nowadays the primary cost driver for AI services. You do have to put a lot of money into training, that is true, but you have to put a lot more money into inference, because OpenAI or any other LLM provider is answering the questions of hundreds of millions of people each day. They produce hundreds of millions of responses, and it takes a lot more GPU power just to support that. That is why a lot of AI companies nowadays are really about cheaper inference, not about better models. If you look at the AI scene right now, you'll see a lot of AI startups launching the most efficient ways of running inference for models, not the most accurate and precise models, because this is the competitive advantage — and the most important and primary cost driver of AI itself.
What is important to know, if you actually want to create an AI feature or an AI-powered application: it is almost impossible to beat LLM providers on cost per token for general models. It's much the same story as with cloud providers nowadays. AI providers — OpenAI, Google, Anthropic, Amazon, and the others — are spending money out of their own pockets every day to run the models they have, and they are investing enormous amounts to try to spend less money per day per user. If you are willing to train something, you will spend a lot of money, and you will spend a lot more to run inference on it. So if you actually need GPT-4, don't even try to run it yourself somewhere like AWS: it will definitely be a lot more expensive for you than it is for OpenAI. If you need a general model, just take one of the existing providers.
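To see why inference dominates, a back-of-the-envelope estimate helps; the prices and volumes below are made-up placeholders, since real per-token prices vary by provider, model, and time:

```kotlin
// Made-up numbers purely to show the shape of the calculation.
fun main() {
    val usdPerMillionTokens = 10.0      // assumed blended price
    val tokensPerRequest = 2_000L       // prompt + completion, assumed
    val requestsPerDay = 1_000_000L
    val dailyTokens = requestsPerDay * tokensPerRequest
    val dailyCost = dailyTokens / 1_000_000.0 * usdPerMillionTokens
    println("~ $$dailyCost per day")    // ~ $20,000/day at these numbers
}
```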
Actually, this slide was in a previous presentation, and since then I've already rewritten it twice — that was four months ago or so. So, we have OpenAI — does anyone know what OpenAI is? Yeah, okay. Does anyone know what Anthropic is? Okay, so Anthropic is not that popular. There is OpenAI, probably the biggest LLM provider — maybe not financially, but the best known — with ChatGPT, with GPT-4o, and with the various scandals about Scarlett Johansson. There are other LLM providers that are not that scandalous, like Anthropic, which has the Claude model, which is pretty good. There is Google's Gemini model, and there is a European provider, Mistral, a French company building large language models. There are a lot of them, and if you actually need a general model, please go to one of the LLM providers and just run it there: you will spend a lot less money and get much better results than by trying to run, I don't know, Llama somewhere in AWS and hoping it will be good enough.
But there is another case. If you actually have engineers, if you have a very specific task that requires machine learning, and if you have specific reasons, you may actually invest a lot of money into a machine learning department or team and try to create something yourself. For example, you can do bug detection with machine learning — a pretty important, well-known technique. The catch is that, in the case of JetBrains, bug detection needs to run for all IDE users, on every keystroke. The easiest way to do that is with a local model, because then users effectively pay for it themselves: nothing leaves their laptop, and at the same time we don't have any problems with enormous cloud fleets trying to process the requests. With a large language model there would be so many requests that it would cost an enormous amount, whereas you can invest in machine learning and get pretty decent results with a much smaller model that runs much faster.
Another example I've been talking about is one-line code completion — and multi-line code completion, if you are fearless enough, because that will also cost you a lot. You don't actually need a large language model here: you don't need a 300- or 900-billion-parameter model to understand that what's most likely needed at the cursor is a val declaration. As we discussed, you can just train a model yourself: take the dataset, and it will work well enough. What is even more interesting is that in many cases it will work as well as GPT-4, because you don't have a lot of context here — you are just predicting the next token, and GPT-4 doesn't know much more than a one-line prediction model does, so it will not produce better results.
And with that, we get to the final look of how JetBrains AI works in Fleet and elsewhere right now. We have main.kt, which is just a file; we have on-device models — nowadays we even have a one-line code completion model in IntelliJ, called full line code completion, that runs fully locally; we have context collectors; and we use on-device ML. With all of that we compose some context, which is sent to an API in the cloud, where we have completion models and even big language models that answer your questions or help you. And we have an API to third-party LLM providers, because, as I said, it's impossible nowadays to beat the quality and cost-effectiveness of OpenAI or other LLM providers: they are much deeper in the game than we are, and they spend a lot more money each day on each of their users, which is why they are much more invested in optimizing inference. So this is the way it works. You can actually try it in Fleet right now — and maybe I have some questions. Thank you.
[Applause]
Yeah — one moment, they're running over with the microphone.

Audience: So I have a question regarding context collection. You had an example with Docker.kt and Process.kt, and you mentioned that the model is going to know that Docker.kt and Process.kt are related because it has some general knowledge about cgroups and so on. But isn't it the case that the context collection you're building right now will use the call graph of your application's code to collect context?

Tankov: We actually have a pretty interesting architecture. In ReSharper, for example, they use the call graph, and then they also use embeddings to assign weights to the call graph and understand what can be trimmed, because you still have a limit in general — 32K, or 8K, or 4K tokens. So you use the call graph, and then you assign weights and trim the graph to fit.

Audience: Okay, but you will still be using the call graph?

Tankov: Yes, of course — it's already used today.

Audience: Okay, thanks.

Tankov: Thank you for the question. Any other questions? We have about 2 minutes 50 seconds... Okay, if you don't have any other questions, thank you — and you can also ask them in person by the stage, I guess. Thank you.
[Music]