AI and Kotlin: A Perfect Mix | Vladislav Tankov

Kotlin by JetBrains
29 Jun 2024 · 42:40

Summary

TLDR: Vladislav Tankov, the lead of JetBrains AI, introduces AI functionality in the context of Kotlin development at KotlinConf. He discusses AI's role in enhancing developer productivity through tools like Fleet, a code editor with AI-powered features such as chat, code explanations, refactoring, and completion. Tankov also covers machine learning fundamentals, the importance of generalization, and the practical application of large language models in development assistance, emphasizing the balance between leveraging large models and optimizing for cost-effective inference.

Takeaways

  • 🧠 The speaker, Vladislav Tankov, discusses the integration of AI functionalities in the development process, specifically for Kotlin developers, to enhance productivity and ease of use.
  • 🤖 He introduces 'Fleet', a code editor that goes beyond traditional editing by incorporating AI capabilities such as chat, code explanation, refactoring, and completion.
  • 🔍 The AI chat feature in Fleet is tailored to understand Kotlin, providing project-specific insights and assistance, which is powered by machine learning models.
  • 📈 The importance of 'intentions' in IDEs is highlighted, which allow developers to understand and interact with code more effectively, including features like explaining and refactoring code.
  • ✍️ Automatic generation of commit messages is presented as a timesaving feature, with AI creating average commit messages that describe changes made in the code.
  • 📚 The concept of machine learning is simplified using the analogy of classifying Golden Retrievers, explaining the training process and the challenge of generalization in AI.
  • 🧠 The significance of 'large language models' in the current AI revolution is emphasized, with their ability to encode and retrieve vast amounts of knowledge, leading to more intelligent AI behavior.
  • 🔗 The role of embeddings in capturing semantic information and their use in understanding the context and relationships between different pieces of code or documentation is discussed.
  • 🔧 The architecture of development assistants like Fleet is detailed, including the use of on-device models, context collectors, and integration with third-party large language models.
  • 💡 The talk concludes with considerations on the necessity of large language models, suggesting that for specific tasks, smaller models with fine-tuning may be more cost-effective and efficient.
  • 🚀 The importance of inference in the cost of AI services is underscored, with the suggestion that for general models, using existing providers is more economical than self-hosting.

Q & A

  • What is the main topic of Vladislav Tankov's talk at the Kotlin conference?

    -The main topic of Vladislav Tankov's talk is the integration of AI functionalities, specifically focusing on how AI can make the life of Kotlin developers easier and more efficient.

  • What is the role of AI in enhancing the developer experience in Kotlin?

    -AI plays a significant role in enhancing the developer experience by providing functionalities such as code completion, chat assistance, and automatic generation of commit messages, which can speed up development processes.

  • What is the significance of the 'Fleet' tool mentioned in the talk?

    -Fleet is a code editor that goes beyond traditional editing by incorporating AI functionalities like chat, which can understand and assist with Kotlin-specific queries, making it a powerful tool for Kotlin developers.

  • What are 'intentions' in the context of AI and IDEs?

    -In the context of AI and IDEs, 'intentions' refer to AI-driven actions that can explain code, refactor it, or perform other coding tasks, which can be particularly useful for understanding and improving code quality.

  • How does the AI model for code completion in Fleet work?

    -The AI model for code completion in Fleet works by being fine-tuned and aware of Kotlin code, using context from the project to provide multi-line and single-line code completion suggestions.

  • What is the concept of 'generalization' in machine learning as discussed in the talk?

    -Generalization in machine learning refers to the ability of a trained model to perform well on new, unseen data. It is considered the 'holy grail' of machine learning because it ensures the model can make accurate predictions beyond the specific examples it was trained on.

  • What is the role of 'fine-tuning' in the context of large language models?

    -Fine-tuning is the process of adapting a pre-trained large language model to a new task by providing additional training on specific examples. This allows the model to become more specialized and accurate for particular applications.

  • Why are large language models considered expensive to use?

    -Large language models are considered expensive due to the computational resources required for inference, which is the process of making predictions with the model. The cost of running these models on a large scale can be prohibitive for many applications.

  • How does the concept of 'embedding' help in understanding the context in AI models?

    -Embedding is a vector representation of text that captures semantic information. By using embeddings, AI models can understand the similarity between different pieces of text, allowing them to better determine the relevance and context of the information they are processing (a small Kotlin sketch of this idea follows this Q&A section).

  • What is the importance of 'inference' in the context of AI and its cost implications?

    -Inference is the process of using a trained AI model to make predictions or decisions. It is a significant cost driver for AI services because it requires substantial computational power, especially when dealing with large language models and high volumes of requests.

  • How can smaller AI models be effective in specific tasks like bug detection or code completion?

    -Smaller AI models can be effective in specific tasks by being trained on relevant data and fine-tuned for the task at hand. They can offer a more cost-effective solution compared to large models, especially when the task does not require the extensive knowledge and context that large models provide.
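
The embedding-and-similarity idea from the Q&A above can be shown in a few lines of Kotlin. This is a minimal sketch, not Fleet's implementation: the vectors are hard-coded stand-ins for real embeddings, and cosine similarity is used to pick the semantically closest snippet.

    import kotlin.math.sqrt

    // Cosine similarity between two embedding vectors: values near 1.0 mean the texts
    // behind them are semantically similar, values near 0 mean they are unrelated.
    fun cosine(a: FloatArray, b: FloatArray): Double {
        require(a.size == b.size) { "embeddings must have the same dimension" }
        var dot = 0.0; var na = 0.0; var nb = 0.0
        for (i in a.indices) {
            dot += a[i] * b[i]
            na += a[i] * a[i]
            nb += b[i] * b[i]
        }
        return dot / (sqrt(na) * sqrt(nb))
    }

    fun main() {
        // Hypothetical 4-dimensional embeddings; real embedding models return hundreds of dimensions.
        val query = floatArrayOf(0.9f, 0.1f, 0.0f, 0.2f)            // e.g. "how to tokenize text"
        val candidates = mapOf(
            "String.split"  to floatArrayOf(0.8f, 0.2f, 0.1f, 0.3f), // close to the query
            "File.readText" to floatArrayOf(0.1f, 0.9f, 0.4f, 0.0f), // unrelated
        )
        val best = candidates.entries.maxByOrNull { cosine(query, it.value) }
        println("most related snippet: ${best?.key}")                // prints String.split
    }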

Outlines

00:00

🧠 Introduction to AI and Kotlin Development Tools

Vladislav Tankov, the lead of JetBrains AI, introduces the topic of AI in the context of Kotlin development. He discusses the applicability of AI to Kotlin, emphasizing the importance of AI functionalities in enhancing developers' productivity. Tankov mentions the integration of AI in JetBrains' tools, such as Fleet, a code editor with advanced AI features. He highlights the ability of AI to understand Kotlin-specific documentation and assist in various programming tasks, including code explanations and refactoring, using 'intentions.' Additionally, he covers the implementation of code completion and automatic commit message generation, powered by machine learning models.

05:01

🤖 Understanding Machine Learning and Its Application

The speaker delves into the fundamentals of machine learning, using the analogy of classifying Golden Retrievers to explain the concept of training a model. He discusses the process of approximating an original function based on examples to create a trained function that can generalize beyond the training data. The importance of generalization in machine learning is emphasized, along with the challenges of erroneous responses due to the complexity of the trained functions. The talk also touches on the role of large datasets and the evolution of machine learning applications, such as GitHub Copilot and image generation models, which are based on language models and the principles of predicting word sequences.

10:02

📚 The Power of Language Models in Code Development

Language models are explained as functions that generate probabilities of word sequences based on training data, which can be applied to code development. The speaker illustrates how large datasets of Kotlin code can be used to train a model for one-line or multi-line code completion. The evolution of language models is discussed, highlighting the impact of large language models that can encode substantial knowledge and retrieve it when needed. The speaker also explains how fine-tuning these models can adapt them to specific tasks, such as generating chat responses or understanding instructions for code generation.
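
As a very rough illustration of "a function that generates probabilities of word sequences based on training data", here is a toy bigram model in Kotlin. It only counts word pairs in a tiny corpus; real large language models are neural networks over tokens, so treat this purely as a sketch of the concept.

    // A toy bigram "language model": count which word follows which in a tiny corpus,
    // then predict the most probable next word for a given context word.
    fun main() {
        val corpus = "my dog is golden my dog is good my dog is golden retriever".split(" ")

        // Count occurrences of each (word, nextWord) pair seen in the training data.
        val counts = mutableMapOf<String, MutableMap<String, Int>>()
        for ((word, next) in corpus.zipWithNext()) {
            val followers = counts.getOrPut(word) { mutableMapOf() }
            followers[next] = (followers[next] ?: 0) + 1
        }

        // P(next | word) is approximated by relative frequency; here we just take the argmax.
        fun predictNext(word: String): String? =
            counts[word]?.entries?.maxByOrNull { it.value }?.key

        println(predictNext("is"))   // golden ("golden" seen twice after "is", "good" only once)
        println(predictNext("dog"))  // is
    }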

15:02

🔍 Fine-Tuning and the Role of Context in AI Models

The paragraph explores the concept of fine-tuning in AI, where a trained model is adapted to new tasks with additional training on specific examples. The speaker uses the example of adapting a language model to generate text for a different breed of dog, showcasing how the model can be biased towards specific examples while retaining most of its knowledge. The importance of context in AI responses is emphasized, and the speaker discusses how additional context, such as comments or project structure, can be provided to language models to generate better code completions.
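
One concrete way to "provide additional context", as described above, is to prepend it to the completion prompt as ordinary code comments. The sketch below is hypothetical; the type and function names are invented for illustration and do not reflect Fleet's actual prompt format.

    // Hypothetical context an IDE might collect for one completion request
    // (names are made up for illustration; this is not Fleet's real prompt format).
    data class CompletionContext(
        val fileName: String,
        val kotlinVersion: String,
        val libraries: List<String>,
        val relatedSnippet: String,
    )

    // Render the collected context as ordinary Kotlin comments in front of the code prefix,
    // so an instructable large language model can take it into account when completing.
    fun buildPrompt(ctx: CompletionContext, prefix: String): String = buildString {
        appendLine("// File: ${ctx.fileName}")
        appendLine("// Kotlin version: ${ctx.kotlinVersion}")
        appendLine("// Libraries: ${ctx.libraries.joinToString()}")
        appendLine("// Related code:")
        ctx.relatedSnippet.lines().forEach { appendLine("//   $it") }
        append(prefix)
    }

    fun main() {
        val ctx = CompletionContext(
            fileName = "Main.kt",
            kotlinVersion = "2.0",
            libraries = listOf("ktor-client", "kotlinx-serialization"),
            relatedSnippet = "fun greet(name: String) = \"Hello, \$name\"",
        )
        println(buildPrompt(ctx, "fun main() {\n    "))
    }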

20:03

🛠️ Advanced Context Collection for Development Assistance

The speaker discusses advanced techniques for context collection to improve the performance of AI in development environments. He talks about using on-device machine learning models to understand which files are related and trim the context accordingly. The integration of additional context from the IDE, such as file names, language versions, and libraries, is highlighted as a way to provide more context to the AI model. The speaker also mentions the challenges of managing extreme context sizes and the potential solutions to this problem.
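
A possible sketch of the "trim the context" step: score candidate files by how related they appear to the current file and keep only what fits a token budget. The scoring rule here (shared path segments plus recency) is a simple stand-in for the on-device ranking models mentioned in the talk, not how Fleet actually ranks files.

    // Hypothetical candidate file with a rough token count and last-opened timestamp.
    data class Candidate(val path: String, val tokens: Int, val lastOpenedMinutesAgo: Long)

    // Crude relatedness score: shared path segments with the current file, plus recency.
    fun score(current: String, c: Candidate): Double {
        val shared = current.split('/').toSet().intersect(c.path.split('/').toSet()).size
        val recency = 1.0 / (1 + c.lastOpenedMinutesAgo)
        return shared + recency
    }

    // Keep the highest-scoring files until the token budget is spent.
    fun trimContext(current: String, candidates: List<Candidate>, budget: Int): List<Candidate> {
        val picked = mutableListOf<Candidate>()
        var remaining = budget
        for (c in candidates.sortedByDescending { score(current, it) }) {
            if (c.tokens <= remaining) {
                picked += c
                remaining -= c.tokens
            }
        }
        return picked
    }

    fun main() {
        val picked = trimContext(
            current = "src/docker/Docker.kt",
            candidates = listOf(
                Candidate("src/docker/Process.kt", tokens = 1200, lastOpenedMinutesAgo = 2),
                Candidate("src/client/Client.kt", tokens = 900, lastOpenedMinutesAgo = 40),
                Candidate("docs/Docs.kt", tokens = 3000, lastOpenedMinutesAgo = 500),
            ),
            budget = 2500,
        )
        println(picked.map { it.path })  // [src/docker/Process.kt, src/client/Client.kt]
    }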

25:03

🔗 Utilizing Embeddings for Contextual Understanding

The paragraph introduces embeddings as a method for representing text in a vector form that captures semantic information. The speaker explains how embeddings can be used to understand the similarity between texts and trim context accordingly. He also discusses the use of embeddings in Fleet's assistant for providing knowledge from Kotlin documentation, illustrating how embeddings can help in grounding techniques to provide relevant and accurate information to the AI model.
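
The grounding flow can be sketched as: embed the question, embed the documentation chunks, keep the most similar chunks, and prepend them to the prompt. The embed() function below is a stub standing in for a real embedding model, and the rest reuses the cosine-similarity idea from the earlier sketch; none of this is the actual Fleet pipeline.

    import kotlin.math.sqrt

    // Stub embedding: hashes words into a tiny vector. A real system would call an
    // embedding model here; this stand-in only keeps the example self-contained.
    fun embed(text: String): FloatArray {
        val v = FloatArray(8)
        text.lowercase().split(Regex("\\W+")).filter { it.isNotBlank() }
            .forEach { v[it.hashCode().mod(8)] += 1f }
        return v
    }

    fun cosine(a: FloatArray, b: FloatArray): Double {
        var dot = 0.0; var na = 0.0; var nb = 0.0
        for (i in a.indices) { dot += a[i] * b[i]; na += a[i] * a[i]; nb += b[i] * b[i] }
        return if (na == 0.0 || nb == 0.0) 0.0 else dot / (sqrt(na) * sqrt(nb))
    }

    fun main() {
        // Indexed documentation chunks (in practice: the whole Kotlin documentation).
        val docs = listOf(
            "Kotlin 2.0 enables the K2 compiler frontend by default.",
            "Data classes generate equals, hashCode and copy for you.",
            "Coroutines provide structured concurrency in Kotlin.",
        )
        val question = "Which compiler frontend does Kotlin 2.0 use?"

        // Grounding step 1: find the documentation chunks most similar to the question.
        val q = embed(question)
        val topChunks = docs.sortedByDescending { cosine(q, embed(it)) }.take(2)

        // Grounding step 2: put those chunks into the prompt so the model answers
        // from up-to-date documentation instead of stale training data.
        val prompt = buildString {
            appendLine("Answer using only these documentation excerpts:")
            topChunks.forEach { appendLine("- $it") }
            appendLine()
            append("Question: $question")
        }
        println(prompt)
    }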

30:04

💡 The Importance of Inference in AI Cost and Efficiency

The speaker highlights the cost and efficiency aspects of AI, focusing on the inference process as a primary cost driver for AI services. He explains that while training large models is resource-intensive, the ongoing cost of inference is even higher. The speaker suggests that for general models, it is more cost-effective to use existing providers like OpenAI, while for specific tasks, investing in smaller, more efficient models may be beneficial. The importance of optimizing inference for competitiveness in the AI industry is emphasized.
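
The cost argument can be made concrete with a back-of-envelope estimate in Kotlin. The per-token prices and request volumes below are placeholder assumptions, not real provider pricing; the point is only that per-token inference cost multiplied by request volume dominates the bill.

    // Placeholder per-token prices in USD; real provider pricing differs and changes often.
    const val INPUT_PRICE_PER_1K = 0.005
    const val OUTPUT_PRICE_PER_1K = 0.015

    fun requestCost(inputTokens: Int, outputTokens: Int): Double =
        inputTokens / 1000.0 * INPUT_PRICE_PER_1K + outputTokens / 1000.0 * OUTPUT_PRICE_PER_1K

    fun main() {
        // One completion request that ships a few related files as context (assumed sizes).
        val perRequest = requestCost(inputTokens = 4_000, outputTokens = 300)

        // An active developer can easily trigger hundreds of completions per day (assumption).
        val perDeveloperPerDay = perRequest * 500
        val perMonthFor1000Devs = perDeveloperPerDay * 22 * 1_000

        println("per request:           \$%.4f".format(perRequest))
        println("per developer per day: \$%.2f".format(perDeveloperPerDay))
        println("1000 devs per month:   \$%.0f".format(perMonthFor1000Devs))
    }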

35:04

🌐 The Architecture of AI in Development Tools

The speaker outlines the architecture of AI integration in development tools like JetBrains' Fleet. He describes the use of on-device models for tasks like one-line code completion, context collectors, and the composition of context sent to cloud APIs. The speaker also discusses the use of third-party large language model providers for specific tasks and the importance of optimizing inference costs. The architecture aims to combine local and cloud-based AI models to provide a seamless development experience.
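
The local-versus-cloud split described above can be summarised as a simple router: a small on-device model handles cheap, latency-sensitive completions, while heavier tasks go to a hosted large language model. All names and the routing rule are illustrative assumptions, not JetBrains' actual design.

    // The kinds of assistance requests the IDE can make.
    enum class Task { LINE_COMPLETION, MULTILINE_COMPLETION, EXPLAIN_CODE, CHAT }

    interface Model { fun run(prompt: String): String }

    // Small on-device model: cheap and fast, good enough for short completions.
    class LocalKotlinModel : Model {
        override fun run(prompt: String) = "<completion from local model>"
    }

    // Hosted large language model: instructable and knowledgeable, but billed per token.
    class CloudLlm : Model {
        override fun run(prompt: String) = "<answer from cloud LLM>"
    }

    // Route each task to the cheapest model that can handle it.
    class AssistantRouter(private val local: Model, private val cloud: Model) {
        fun handle(task: Task, prompt: String): String = when (task) {
            Task.LINE_COMPLETION -> local.run(prompt)
            Task.MULTILINE_COMPLETION, Task.EXPLAIN_CODE, Task.CHAT -> cloud.run(prompt)
        }
    }

    fun main() {
        val router = AssistantRouter(LocalKotlinModel(), CloudLlm())
        println(router.handle(Task.LINE_COMPLETION, "val users = list"))
        println(router.handle(Task.CHAT, "Explain what this coroutine does"))
    }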

40:05

🙌 Conclusions and Q&A Session

The speaker concludes the presentation by summarizing the key points discussed and inviting questions from the audience. He emphasizes the importance of understanding AI's role in development tools and the practical considerations of implementing AI features. The Q&A session allows for further exploration of the topics covered, providing additional insights and clarifications.

Keywords

💡AI

AI, or Artificial Intelligence, refers to the simulation of human intelligence in machines that are programmed to think like humans and mimic their actions. In the video, AI is central to the discussion of how it can enhance the functionality of tools for developers, such as code editors and IDEs, by providing features like chat support, code completion, and understanding project context.

💡Kotlin

Kotlin is a statically typed programming language that runs on the Java Virtual Machine and is interoperable with Java. The video discusses the applicability of AI in the context of Kotlin development, highlighting how AI functionalities can assist Kotlin developers in creating code more efficiently through features like code completion and chatbots that understand Kotlin documentation.

💡Code Editor

A code editor is a type of program that developers use to write and manipulate source code for software applications. In the video, the speaker mentions 'Fleet,' which is described as more than a code editor, indicating that it has advanced features powered by AI that go beyond basic text editing to assist in multiplatform development.

💡Machine Learning

Machine learning is a subset of AI that allows systems to learn and improve from experience without being explicitly programmed. The video explains how machine learning models are used in the development tools to understand code, predict next steps, and generate commit messages, making the development process more efficient.

💡Generalization

Generalization in machine learning refers to the ability of a model to make predictions for new, unseen data. The video emphasizes the importance of generalization in ensuring that machine learning models perform well on a variety of data, not just the data they were trained on, which is crucial for their applicability in real-world scenarios.

💡Fine-tuning

Fine-tuning is the process of adapting a machine learning model to a specific task by retraining it on a smaller dataset. The video script discusses how fine-tuning is used to make language models more specialized, such as adapting a general chatbot to understand and respond to specific programming-related queries.

💡Language Model

A language model is a type of machine learning model that is used for understanding and predicting natural language. The video describes how language models are fundamental to features like code completion and chatbots, which predict the next word or code snippet based on the context provided by the user.

💡Embedding

In the context of the video, embedding refers to the process of converting text into a numerical format that can be understood by machine learning models. Embeddings capture semantic information, allowing models to understand the similarity between different pieces of text, which is used for features like context-aware code completion.

💡Inference

Inference in AI refers to running a trained machine learning model to produce predictions or responses for new inputs. The video discusses how inference can be computationally expensive, especially for large models, and that optimizing inference is a key challenge in deploying AI services.

💡Development Assistant

A development assistant in the video refers to AI-powered tools that aid developers in their work, such as by providing code suggestions, explaining code, or generating commit messages. The script describes how these assistants can be created using a combination of large language models and fine-tuning techniques.

💡Large Language Models (LLMs)

Large Language Models, often abbreviated as LLMs, are AI models with a vast number of parameters, enabling them to process and generate human-like text. The video script mentions LLMs as the backbone of advanced AI features in development tools, such as understanding complex codebases and providing detailed explanations.

Highlights

Vladislav Tankov, lead of JetBrains AI, discusses AI functionality to enhance Kotlin developers' productivity.

Introduction to the concept of AI in the context of Kotlin and JetBrains' efforts to integrate AI for developer assistance.

Demonstration of the chat feature in Fleet, JetBrains' code editor, which leverages AI to understand and assist with Kotlin-specific queries.

Explanation of how AI can be used for code explanations, refactoring, and completion within IDEs like Fleet.

The importance of code completion in AI, with JetBrains providing a model fine-tuned for Kotlin code.

Automatic generation of commit messages by AI to save developers' time.

A conceptual definition of machine learning based on the example of classifying Golden Retrievers.

The process of training in machine learning as an approximation of human classification abilities.

The challenge of generalization in machine learning and the pursuit of accurate function approximation.

Erroneous responses in machine learning as a significant problem due to the complexity of trained functions.

The role of large language models in the AI revolution, powering tools like GitHub Copilot and GPT models for code generation.

How language models predict the next word in a sequence, which is fundamental to features like code completion.

The significance of model size in encoding more knowledge and demonstrating higher intelligence in AI models.

Fine-tuning as a method to adapt trained models to new tasks with additional specific examples.

The practical application of fine-tuning in chat models like ChatGPT to understand and respond to user instructions.

The use of embeddings to represent text semantically and trim context for more efficient AI responses.

The concept of 'inference' in AI and its cost implications for AI services, driving the need for more efficient models.

The architecture of JetBrains' AI integration, combining local models, context collectors, and cloud APIs for optimized performance.

Transcripts

play00:06

[Music]

play00:11

hello everyone my name is Vladislav Tankov

play00:13

I'm the lead of JetBrains AI and today I'll

play00:16

be telling you a bit about

play00:18

AI and actually we are at KotlinConf so I do

play00:21

expect that not everyone of you are

play00:23

really interested in AI in general uh

play00:26

but more about a applicability to cotlin

play00:29

or how it is related and we've been

play00:31

doing a lot of things with the eded

play00:33

brains a lot of things has been done in

play00:35

courtlan for cing so I'll be telling

play00:38

about it a lot but to start with we need

play00:42

to understand how they are related like

play00:44

a and cotland and one of the most

play00:46

important things for you as a developers

play00:48

is actually AI functionalities that

play00:51

makes possible for you as a cotland

play00:53

developers doing things faster we are

play00:55

doing a lot of things at JetBrains AI to

play00:58

make your life easier and

play01:00

a lot more happy hopefully and one of

play01:03

them is uh chat so I'll be showing all

play01:05

the examples on fleet which is uh our

play01:09

more than a code editor far more than a

play01:11

code editor thing uh that you can use

play01:13

actually to De develop multiplatform

play01:15

quot multiplatform that has been shown

play01:17

already during the keynote and one of

play01:19

the cool features that you can use uh in

play01:22

Fleet right now with JetBrains AI is chat

play01:26

there are a lot of things to do with the

play01:27

chat you've already probably seen chat

play01:29

GPT or gp4 but what is important and you

play01:32

may even notice it here uh is that our

play01:35

chart actually knows not only about uh

play01:38

World Knowledge and we'll be talking

play01:39

about it bit a bit more but also it

play01:42

knows a lot about cotland itself we do

play01:45

upload a lot of knowledge about cotland

play01:47

documentation and other things still by

play01:49

the way latest release it says that is

play01:52

1923 I think we didn't update it for 20

play01:55

yesterday yet uh but it's pretty rare

play01:59

and it really can help you to understand

play02:00

what is going on in your project or help

play02:02

you with some uh programming things

play02:05

another important thing that you can use

play02:07

with the fleet or any other existing IDE

play02:09

I'll be just showing examples on fleet

play02:11

is intentions for example you can ask to

play02:14

explain the code and a lot of folks from

play02:16

banking sector are really happy about

play02:17

this thing for some reason uh because

play02:20

they are able finally to understand what

play02:22

code is doing and when it was was

play02:25

written there are a lot of different

play02:27

intentions so you can explain code you

play02:28

can ask to refactor code

play02:30

there have been code brushes few years

play02:31

ago introduced by GitHub capalot and you

play02:33

can also try complete as well another

play02:36

extremely important thing that we've

play02:38

released like during the keynote of cotl

play02:40

con and that you can try and fleit right

play02:42

now is code completion so I guess a lot

play02:46

of you folks know what GitHub Copilot is

play02:48

and uh use code completion can you raise

play02:50

a hand how many of well it's a lot more

play02:55

than on J conferences um so yeah finally

play02:59

uh we as Jud brains provides you with

play03:01

good code codling code completion model

play03:04

that provides you with multi-line code

play03:05

completion with on line code completion

play03:07

actually is fine-tuned and aware of

play03:09

cotlin code and is using a lot of

play03:12

context from your project and it really

play03:13

helps you to develop it we will be

play03:15

talking about it a lot during this

play03:17

talk another neat thing that is for some

play03:21

reason extremely popular probably

play03:22

because no one likes to uh write the

play03:25

commit messages is automatic generation

play03:27

of commit messages well as all of the a

play03:30

AI will be generating something very on

play03:32

average for you so it will generate

play03:34

average commit message that everyone

play03:36

will be writing down but it can a lot

play03:40

save your time during the generation of

play03:42

commit message uh during the pushing

play03:44

something and will very extensively

play03:47

describe the code that you have changed

play03:50

all of such things are actually working

play03:53

and powered by different machine

play03:54

learning models and once again since

play03:56

this cotland conference I do expect that

play03:58

not a lot of you folks considering

play04:00

yourself machine learning engineers and

play04:02

professional in this field and I'll be

play04:04

telling you a bit about machine learning

play04:06

how does it work and how is it related

play04:08

to actually the things that you've seen

play04:10

previously with the fleet example and

play04:12

how it actually affects your life as a

play04:14

developers and we will start actually

play04:17

with a very simplistic and very

play04:20

conceptual definition of what machine

play04:21

learning is is based on Golden

play04:25

Retrievers so let's imagine that there

play04:28

is a kind of a and not even imagine

play04:31

there is a thing in the world that is

play04:33

called a classification of golden

play04:34

retriever any of you if you know what

play04:36

golden retriever is is capable of

play04:39

telling whether this breed of dog is

play04:40

golden retriever or not that is

play04:42

so-called original function a classif

play04:44

classifier function that exists in a

play04:46

human nature and that helps us to

play04:49

understand whether the dog is golden

play04:51

retriever or not the problem with this

play04:53

original function is that it's extremely

play04:55

hard for us to understand how does it

play04:58

work so for you to classify the dock as

play05:01

a human as a golden retriever or not you

play05:03

need to kind of perform a lot of brain

play05:05

operations inside and without brain

play05:07

research it's really hard to understand

play05:09

how does it work that is why uh if we

play05:12

actually want to create a classifier for

play05:14

docs we will need to perform the thing

play05:16

called training and all the machine

play05:17

learning folks are doing is training

play05:20

it's the process is of infering

play05:22

approximation of original function based

play05:24

on examples what does it mean we are

play05:26

unable unfortunately to take something

play05:28

that exists in a human in nature like an

play05:31

ability to classify something and put it

play05:34

inside the computer that is why we have

play05:36

to approximate it and we do it with a

play05:38

technique called training we create some

play05:40

function some really hard function and

play05:42

approximate with it the nature the

play05:45

classification function to understand

play05:47

whether it's golden retriever or not and

play05:49

this thing is called train function

play05:52

basically the whole machine learning

play05:53

process is creation of something that

play05:56

exists in the world

play05:57

like ability to classify by the Golden

play06:00

Retriever and bringing it inside the

play06:02

computer with some approximation you

play06:04

have a train function it works it's

play06:06

great but the problem is that since you

play06:08

are training it on a number of different

play06:11

examples you will have to uh make sure

play06:15

that it generalizes and that is actually

play06:17

a holy grail of machine learning we've

play06:20

taken a lot of examples of golden

play06:22

retrievers and made sure that the

play06:24

function that we've been training now

play06:25

classifies whether it's a golden

play06:27

retriever or not but the problem is that

play06:30

the function that we've trained is

play06:32

actually working only on a data set of

play06:34

those examples and this data set of

play06:36

those examples is not actually what we

play06:38

are willing to have we have an examples

play06:40

it tells whether it's a golden ret or

play06:42

not but we actually may have just create

play06:44

an if statement that we check whether

play06:46

it's in data set or not the

play06:48

generalization as a holy grail of

play06:49

machine learning means that approximated

play06:52

function actually works as we expected

play06:54

so it actually tries to classify whether

play06:56

it's a golden retriever or not it takes

play06:58

a look at the fur it takes to look at

play07:00

the things around the creature that we

play07:02

are trying to classify and understand

play07:04

whether it's a golden retriever not just

play07:06

kind of a checks whether it's one of the

play07:07

images that were on example data set if

play07:10

you have a good generalization you have

play07:12

actually the function that is working

play07:14

that is approximating the existing

play07:16

function in the world you have a human

play07:18

being that is able to tell us this is a

play07:20

golden retri this is case hun and now

play07:22

you have a trained function that is

play07:23

running inside your computer that is

play07:25

also telling that is Golden Retriever

play07:27

and that is not one of the biggest

play07:29

problems with it is the problems of

play07:32

erroneous responses since the function

play07:34

that we are training is actually

play07:36

extremely hard extremely hard to

play07:39

understand we are not even able to

play07:40

research it through the bra brain

play07:42

research we are only approximating it

play07:44

and it means that we will have erroneous

play07:46

responses for example our function may

play07:48

decide that everything that is gold and

play07:49

on the grass is golden retriever so that

play07:52

is also definitely a golden retri and

play07:55

that is one of the biggest problems with

play07:56

machine learning you have a lot of

play07:58

erroneous responses and and all you do

play08:00

is basically you are taking the function

play08:02

you are trying to approximate it and you

play08:04

are trying to make sure that the

play08:06

approximation is good enough so the

play08:09

whole process basically in a very on a

play08:11

very conceptual level you have a you

play08:13

have an original function that you don't

play08:15

know how to describe in a mathematical

play08:17

terms you create from it a trained

play08:19

function that is approximation of the

play08:21

function that exists in the nature and

play08:24

actually since I'm also Elite of machine

play08:26

learning team they've asked me to add

play08:28

something that is little bit more uh

play08:31

precise than golden retrievers so

play08:33

machine learning is infer a function uh

play08:35

machine learning infer a function that

play08:37

connects inputs to outputs without

play08:39

knowing the original function that is

play08:41

more or less precise definition in

play08:43

mathematical terms the question is how

play08:45

is it actually related to anything you

play08:47

see nowadays actually golden rets are

play08:50

related a lot we'll be seeing them a lot

play08:53

uh and the things that you are using

play08:55

right now as GitHub Copilot as a ChatGPT as

play08:58

GPT for all even generation of uh images

play09:03

if we're talking about generations with

play09:05

multim modals all of them are based on

play09:07

the same principles and all are based on

play09:10

the thing called language Morel so back

play09:13

to dogs uh we have another function that

play09:17

exists in the world that is basically a

play09:21

kind of probabilistic function that

play09:23

tells us what is the next world word is

play09:26

each of you can take a look at the

play09:28

sentence my dog is and decide what would

play09:30

be the next word depending on your

play09:33

previous experience you would decide

play09:35

that the next dog would be for example

play09:37

Cas hunt or golden retriever or any

play09:39

other dog but each of you actually

play09:42

inside of yourself have a probabilistic

play09:44

functions that tells that depending on

play09:46

this context the next word would be and

play09:49

that is also the same way you are

play09:50

actually writing the code so for example

play09:53

we have an original function uh that

play09:56

predicts next word and for me my doc is

play09:59

always would be my dog is golden and

play10:01

then I take the sentence my my dock is

play10:03

golden I predict the next word Retriever

play10:06

and that is extremely powerful concept

play10:09

basically the whole AI Revolution that

play10:11

is happening right now is based on this

play10:14

concept obviously for probabilistic

play10:17

model it would not be that easy that my

play10:19

dog is always golden it will be a return

play10:20

as some probabilities my dog is good my

play10:22

dog is golden my dog is bad we will

play10:25

still have a erronous responses like

play10:27

with um like with golden retrial

play10:30

classification for example I can provide

play10:32

the model with the phrase my dog is

play10:34

golden will tell me my dog is golden

play10:36

bucket which you may see or not in gith

play10:38

Capal for example uh but it still

play10:41

happens another important thing that we

play10:43

can predict not only next word but we

play10:46

can also for example predict word in

play10:48

between for for example we have my do is

play10:50

golden retriever we are predict in the

play10:53

middle we we deciding that my dog is

play10:56

golden retriever or a good Retriever and

play10:58

that is a train function so the whole

play11:00

process that we've been just talking

play11:01

about with the classification just

play11:03

applies to the same thing we have a

play11:06

function that predicts the N the next

play11:08

word uh we are training the functions

play11:11

that will predict this next word and it

play11:13

emulates for us basically the whole

play11:15

language once again from our machine

play11:17

learning team language model is a

play11:18

function that generates probabilities of

play11:20

word sequences based on training corpora

play11:21

so we have a very big corpora of

play11:23

examples like data set of all the data

play11:26

that exist in the world that exist in

play11:27

Internet and you are just making a

play11:30

function that is trained to predict

play11:31

based on the context next word how it's

play11:35

anyhow applied to the cotl or machine or

play11:39

code or development well basically

play11:41

having this concept in mind we are

play11:43

already able to create one line code

play11:45

completion or even multi-line code

play11:46

completion because this is the same

play11:48

thing when we are training the function

play11:51

that is predict in the next word we

play11:53

actually create the function that exists

play11:55

in some data set for example data set of

play11:57

all the human knowledge that has has

play11:59

been at some point on the internet very

play12:01

good and it generates the next words we

play12:05

can take the whole data set of cotlin

play12:06

code and just train the function that

play12:08

predicts the next token for cotlin and

play12:11

it would be actually one line code

play12:13

completion so this pretty simple cont

play12:16

pretty simple concept is extremely

play12:18

powerful it creates from the context the

play12:20

next words and with it it encodes

play12:23

information it encodes a lot of things

play12:25

and it actually provides you with code

play12:27

completion now language models has been

play12:30

existing like classifiers has been

play12:32

existing I think like for 30 40 years at

play12:35

least language models has been existing

play12:37

also for a very big amount of time like

play12:39

Mark of change and so why anything is

play12:42

changing just now uh the answer is

play12:45

pretty simple it's because of large

play12:47

language models what is extremely

play12:50

interesting is that large language

play12:51

models is just a very big language

play12:53

models so they are just very large that

play12:56

is why they are encoding a lot more

play12:57

information that is why they are working

play12:59

better that is why we are all going to

play13:01

chat.

play13:03

open.com and why it changes anything so

play13:06

language models are actually encoding

play13:08

data inside them and size actually

play13:11

matters a lot for knowledge if we are

play13:13

taking extremely big language model it

play13:15

will be able to encode a lot more

play13:17

knowledge inside of it and with it it

play13:19

will be able to retrieve this knowledge

play13:22

and produce this knowledge back to you

play13:24

for example if we take extremely big

play13:26

model such as gp4 model it will be able

play13:29

to tell us not only that golden

play13:31

retrievers were developed by someone it

play13:33

will know that it was they were

play13:35

developed by Lord twt Mouse it had

play13:37

happened in Scotland it happened in late

play13:39

19th century because it's extremely big

play13:42

moral and the whole process uh when we

play13:44

are training this moral on the data set

play13:46

of all human knowledge is that this

play13:48

Morel is trying to approximate the

play13:50

function of all human knowledge so if

play13:52

the function is big enough it will be

play13:55

able to approximate basically everything

play13:57

we know as a Humanity right now

play13:59

and it will be able to retrieve it back

play14:01

and tell you when it has happened or who

play14:04

did the golden retriever uh who created

play14:06

the golden ret breed what is more

play14:09

interesting that size Matters not only

play14:11

for knowledge but so-called intelligence

play14:13

and here is also spoiler a lot from our

play14:15

machine learning team that uh basically

play14:17

the things like intelligence knowledge

play14:21

anything that I'm saying like the AI

play14:23

model is trying to explain you something

play14:24

tells you something or something is not

play14:27

really spe like is not really precise in

play14:30

mathematical sense in mathematical sense

play14:31

we have approximation function that is

play14:33

just generating the next toen so it

play14:35

doesn't know anything it doesn't think

play14:36

anything doesn't try to explain you

play14:38

anything it just generates from the

play14:39

context next toing but we do perceive it

play14:42

as an intelligence and knowledge and

play14:44

with extremely big models what we have

play14:47

seen recent years is that the bigger

play14:49

model you have the more intelligence you

play14:51

have and with gp4 for example it's

play14:54

capable of not only having a knowledge

play14:56

of the Lord TW Mouse but understanding

play14:59

the request so I'm asking it to write a

play15:02

poem about Golden Retrievers it uses its

play15:04

knowledge it retrieves its knowledge it

play15:05

knows that it's playful creatures since

play15:07

that is why open play it's able to

play15:09

generate PS about Golden gold it knows

play15:12

about Lord thread Mouse and it's even

play15:13

able to generate a pound about pun about

play15:17

retrievers being creatures to fetch and

play15:19

carry so that is pretty good poem and

play15:22

the only thing why it's able to create

play15:24

it is that it's extremely large moral

play15:26

that is capable of encoding a lot of

play15:28

knowledge inside of it so why does it

play15:31

matter because if we have not just

play15:34

language model but a very big language

play15:36

model large language model we're capable

play15:39

of generating a lot of more code based

play15:42

on the World Knowledge so I'm asking the

play15:44

model to generate example of language

play15:46

model usage right and since it knows

play15:50

what is language model it's even capable

play15:52

of retrieving some additional context

play15:53

I'll be telling about it from the ID it

play15:56

knows what should be generated since

play15:59

it's have a lot of knowledge it knows

play16:01

that language models for example most of

play16:02

the time have functions that is called

play16:04

predict and it's even capable of

play16:07

generating some example for me and we

play16:09

see that it generates a multi-line

play16:11

suggestion that tells us how to generate

play16:14

how to create the object of type

play16:15

language model that knows what is the

play16:17

function to predict and what is the

play16:19

prefix so pretty

play16:22

cool finally the latest thing to get us

play16:26

to char GPT and other Gemini Amazon Q

play16:31

whatever new AI models that are kind of

play16:35

changing the world around us is the

play16:37

thing called

play16:38

fine-tuning so fine tuning is the

play16:41

process of adapting trained model to new

play16:43

task with additional training on

play16:44

specific

play16:45

examples and I can kind of illustrate it

play16:48

with a pretty simple example I'm as a

play16:50

human being have some language model

play16:53

inside me which is predicting next token

play16:55

and I have the language model that is

play16:57

predicting that my dog is golden for

play17:00

example I may decide that this language

play17:02

model should be adapted to another human

play17:04

being who have case h for some reason do

play17:07

anyone knows what case Hunt is no no

play17:10

it's it's another breit of the dog uh so

play17:14

another human B being has a case horn

play17:17

and I'm trying to adapt language moral

play17:19

to the owner of case horn the problem is

play17:22

that most likely I have a lot of

play17:23

knowledge that is not changing depending

play17:25

on the breed of dog I have but it needs

play17:28

to generate that my do is case hunt that

play17:30

is why I'm showing to this language

play17:32

Model A number of examples like 20,000

play17:34

or so to tell it that my do is case HT

play17:37

case HS are good dogs blah blah blah

play17:40

blah blah that is a fine tuned model

play17:42

this fine tuned model has been biased

play17:44

towards some specific examples while

play17:47

retaining most of its

play17:49

knowledge and now kind of a brain Splash

play17:53

uh that uh chat models that you are

play17:55

seeing nowadays is also language models

play17:58

that are just fine tuned to generate

play18:00

chat for some reason it's not really

play18:02

obvious but chat models like chat GPT

play18:04

gemini or anything are not actually

play18:06

generating I don't know messages or

play18:08

graphs or something that is the same

play18:10

language model that we existed like 20

play18:12

years ago they are still generating toen

play18:14

by toen something and the only

play18:16

difference is that it's a extremely big

play18:18

model that has been trained on a very

play18:19

big data set of the whole internet and

play18:21

most likely everything else including

play18:23

Library of Congress and then they've

play18:25

been fine tuned to talk to you in a chat

play18:29

uh in a chat way they know that there is

play18:31

a so-called syntax that is chat ml in

play18:34

case of opena models and they have been

play18:36

additionally fine tuned to answer to

play18:38

questions in this syntax so they've been

play18:41

kind of a biased to live in the world

play18:44

where we don't have just knowledge

play18:45

written in all the forms but we have a

play18:47

knowledge that is written as a chart and

play18:50

that is why we are just moving from the

play18:52

extremely simple language models that

play18:53

are generating something token by token

play18:55

to a very fun chat UI that responds to

play18:58

you that have a knowledge that

play19:00

understands your instructions and so on

play19:02

and you can actually see a real example

play19:04

of how it's done with chatl I think for

play19:06

Lama two it

play19:08

was another thing that we need to find

play19:11

tune for is instruction so when we

play19:13

talking about language models language

play19:15

models are generating something talking

play19:17

by to they don't understand actually

play19:18

instructions so my doc is golden

play19:20

retriever for if I will ask you to write

play19:23

a poem about Golden Retriever without

play19:25

additional fine tuning model will will

play19:27

likely just continue the pH phrase so

play19:29

write a poem about Golden Retrievers and

play19:31

K horns for the owner of those creatures

play19:35

instead we can additionally find tunit

play19:37

to understand instructions and with it

play19:39

you will have kind of uh we just create

play19:41

additional data set which is extremely

play19:43

precious and actually one of the all the

play19:45

data sets are extremely precious for the

play19:47

companies that are using them to train

play19:49

the models so uh it's not that much

play19:52

actually nowadays about the models a lot

play19:54

more about data set so uh we train it on

play19:57

additional data set that provides you an

play19:59

instruction and then we generate

play20:01

something in response we show it a

play20:03

number of examples that has been created

play20:04

by someone for example by annotators

play20:06

that has been additionally hired to it

play20:08

that if you have been if you get write a

play20:11

poem about Golden Retriever then

play20:12

generate a poem If you have been asked

play20:14

how to cook something generate it if you

play20:17

have been uh asked to how to cook

play20:19

something illegal please don't generate

play20:20

it and that is just instructions

play20:23

so like taking the thing right from the

play20:27

machine learnings we have an original

play20:29

function that is generating something

play20:31

token by token we approximate it into

play20:33

train function that is also generating

play20:35

something token by token we then Find

play20:37

Unity to understand chart ml markup and

play20:39

instructions and you get chart GPT if

play20:41

you have enough money and enough G GPU

play20:44

power you will get Char GPT faster

play20:47

pretty

play20:47

easy uh one important note actually that

play20:50

for some for quite some time

play20:52

unfortunately uh machine learning

play20:54

Engineers have been trying to find some

play20:56

kind of a more intelligent way to uh

play20:58

gener to create more intelligent models

play21:01

and they've been trying different

play21:03

architectures well uh it seems money

play21:06

solves the problem so you just need to

play21:07

invest more and you'll get more

play21:09

intelligent models so uh PT a moving on

play21:13

to development assistants now we know

play21:15

how to generate uh pretty good models

play21:19

now we know what is chat uh models how

play21:21

do they work and we can actually move on

play21:24

to creation of development assistant and

play21:27

development assisting models

play21:29

and the first and the most easy model to

play21:31

create is as I said dedicated language

play21:33

model to generate based on prefix so we

play21:35

have a prefix which is fun main WM

play21:38

language while prefix blah blah blah and

play21:40

we just ask it to generate uh postfix

play21:44

that is good enough model unfortunately

play21:45

it has some pitfalls for example you may

play21:48

be generating something in between your

play21:51

function so you have something up like

play21:54

VM V prefix you have something down V

play21:56

generation aticle expected and as a

play22:00

human being you will be taking a look at

play22:01

it and understand that likely we need to

play22:06

uh assign here while expected but if we

play22:08

are training a large language language

play22:11

model that is based only on prefix it

play22:14

will generate you something and it will

play22:16

not know that there is expected

play22:18

somewhere near somewhere above that she

play22:21

uh that the language model need to take

play22:22

into account so here comes fine tuning

play22:26

we just take prefixes and suffixes it's

play22:28

called f in the middle technique and we

play22:31

do fine tune the model that initially

play22:33

has been trained on for example cot data

play22:35

set we do fine tune model take into

play22:36

account that there is a prefix there is

play22:38

a suffix and now you should continue

play22:40

with the middle it basically changes the

play22:44

understanding of the model of the world

play22:46

around it in at all so basically now you

play22:49

are not working with the text that you

play22:51

do need to continue you are working with

play22:52

some very strange text that it has

play22:55

something called prefix then has

play22:56

something called suffix and then you

play22:58

need to continue middle but it makes the

play23:01

morel take into account what is

play23:03

happening after the middle and it will

play23:06

generate better completions for you so

play23:08

fill in the middle technique is very

play23:09

important and with it we can actually

play23:11

create the very first uh applic very

play23:14

first implementation of J brain CI we

play23:16

have main KT file cotland file we just

play23:21

deploy some cotland language model or we

play23:23

deploy it to local machine if we are

play23:26

having not that big model and and then

play23:28

send prefix suffix return to you the

play23:31

generation and everyone is happy and I

play23:33

think Copilot did it like 5 years ago or

play23:36

so but that is not that interesting

play23:39

right so we have prefix and suffix it's

play23:42

not really clear how it should take into

play23:43

account the fact that I'm writing with

play23:45

Scotland 2.0 which is said to have no

play23:48

new features but likely we'll have them

play23:51

uh it doesn't said uh say to us how to

play23:54

take into account for example that I'm

play23:55

writing in cater it all should be

play23:57

inferred from one file so we are just

play24:01

moving on to the bigger model and we

play24:03

make it also generate on prefix or on

play24:05

prefix and post fix but here we may

play24:08

expect at least that the model will be

play24:10

taken into account for example comments

play24:12

because it's big enough model it will

play24:14

understand natural language and will

play24:16

generate something based on it with it

play24:18

we already have a bit harder

play24:20

infrastructure we have hosted language

play24:22

model uh cot language model we have

play24:24

hosted large language model we can

play24:25

integrate with different third party

play24:27

large language model provid us

play24:29

everything is working everything is cool

play24:31

but here is the question what to do next

play24:35

As We Know large language models are

play24:37

instructable and they are knowledgeable

play24:39

so we can actually use this we can

play24:42

provide a lot more to the context and

play24:45

expect the model to understand it so the

play24:47

easiest solution pretty obvious and

play24:50

straight to the point is just to add

play24:51

more comments so we as an IDE collect

play24:54

automatically additional context like

play24:56

what is the file name what is the

play24:57

language version

play24:58

which are the other files that may be

play25:00

taken into account what are the I don't

play25:03

know additional what are the libraries

play25:05

here and so on and we do expect language

play25:08

model to take it into account if we are

play25:10

taking a very big lar language model

play25:11

like GPT 4.0 I think it's GPT 40 or

play25:15

gemini or something it has been trained

play25:17

on a lot of data it understands natural

play25:19

language it it is instructable it's

play25:21

knowledgeable it will just understand it

play25:24

so we just kind of pushing something it

play25:26

and hope that it will work and it's

play25:27

actually works working so with it we are

play25:31

introducing a lot more context with

play25:33

extremely obvious way just adding

play25:35

something to comments and it will

play25:37

already understand it it will already

play25:39

generate better code and it will take

play25:41

into account all the things that we

play25:42

provided with fortunately context

play25:45

doesn't end up with uh just related

play25:48

files or on we have a lot of things

play25:50

inside the ID that we can use to provide

play25:52

additional context for example we can

play25:54

use your behavior we can decide that for

play25:57

example you've been taken look into the

play25:59

recent file for I don't know 5 minutes

play26:01

and just switch to another one and we

play26:03

should add it to the context because

play26:05

likely somehow it's related we can take

play26:08

a look at the files that you are writing

play26:09

right now but we can also take a look at

play26:11

the related files in different ways

play26:13

based on different distances we can take

play26:15

a look at the request that you are using

play26:17

you may for example not generate

play26:19

something as a quote completion model

play26:21

but you can provide us with additional

play26:24

uh specific feedback and in Fleet it's

play26:25

done with common dot you just press

play26:27

common dot there is is an input field

play26:29

you are writing please generate that

play26:31

class and we know that you have a very

play26:33

specific request we can take a look at

play26:35

your project structure language

play26:37

libraries and so on we have a lot of

play26:38

code inside so we can take all of it and

play26:41

generate additional comments or somehow

play26:43

provide LGE language model with it but

play26:46

there is a problem with it

play26:49

unfortunately that we need somehow to

play26:51

trim this context a lot of you may have

play26:54

heard about the kind of a context

play26:55

problem uh fortunately quite recently is

play26:58

started to be being solved there is a

play27:00

Gemini models that are released with

play27:03

context window of few million tokens so

play27:05

you can upload pretty middle siiz

play27:08

projects there but the problem is that

play27:09

you can actually you need also actually

play27:11

to pay for it and uh we need to improve

play27:15

context we need to trim it we are not

play27:17

able to send for example 32k of tokens

play27:20

to the model and just expect that

play27:22

everything will be working we will be

play27:23

paying too much we can use on device

play27:25

machine learning models and here we are

play27:27

getting back to gold retrievers uh

play27:29

unexpectedly to understand which files

play27:31

are actually related to the current file

play27:34

and which are not here is extremely uh

play27:36

simple idea uh we have a Docker KT which

play27:40

you are editing right now and we have

play27:41

docs processes and client KT uh for the

play27:45

model that is based on Bings or

play27:47

something else it's pretty easy to

play27:48

understand that likely Dock and process

play27:50

KT is somehow related because it has

play27:52

been initially trained and it knows that

play27:54

do c groups everything it may understand

play27:57

that do and client KT is somehow related

play28:00

but it will probably decide that Docker

play28:02

and docks are not really related while

play28:04

it can be actually wrong because it may

play28:06

be watch dos still we can use on device

play28:09

machine learning models to automatically

play28:11

trim the context and provide B better

play28:14

context to the machine learning model

play28:16

and with it we get to the pretty simple

play28:18

architecture scheme where we have uh on

play28:20

device models that are running to

play28:22

automatically rank the files that you

play28:24

have right and provide them into context

play28:26

we have additional context collect that

play28:28

takes into account what is the project

play28:31

what is the library what version of cotl

play28:33

you are using it takes all of it

play28:35

composes a request sends this to API and

play28:38

then we decide whether it should be uh

play28:40

implemented with Scotland language model

play28:42

or hosted language models and that is

play28:45

actually the way it works in Fleet here

play28:47

is an

play28:48

example you can actually try it right

play28:50

now in the latest version ofle Fleet you

play28:52

can just start free trial for J brain C

play28:55

and it will be working and that is a big

play28:57

model that that sakov has announced

play29:00

during the cotlin U cotlin con keyote

play29:04

that is a big model 3.7 billion

play29:06

parameters that is initially just

play29:09

multilanguage model then it has been

play29:10

fine tuned to specifically cotlin and

play29:12

then it has been fine tuned to very

play29:14

specific context so we can for example

play29:17

we can actually drop the size of the

play29:20

model if we are fine-tuning it to a very

play29:22

specific way we are providing the

play29:24

context to a very specific way we are

play29:26

providing the files it will not have to

play29:28

be that instructable and knowledgeable

play29:30

it will just know that this is the way I

play29:32

see the context and I've been trained

play29:34

initially on cotlin works pretty good to

play29:36

try it out in C in Fleet there is still

play29:40

There is still a problem even when we can collect the context; it is called extreme context sizes. In a lot of cases, if you are writing something, for example in the IntelliJ Platform monorepo where I sometimes write something, there are a lot of related files, and we need somehow to understand which ones matter. We can do it with on-device models, but how do they work? Here you may see, or I think you may not see because of the contrast, that `max` is actually a little bit brighter. Why? There is a thing called an embedding. An embedding is a vector representation of text that typically captures semantic information. What does that mean? We just take the text and get some vector of numbers, and two different vectors will be close in cosine similarity, or some other similarity measure, if the texts are actually semantically very similar. A pretty neat thing. And with it, now you should see it, we are able, for example, to understand that if we see `val max`, most likely the context about `max` is more or less related to the things I am writing right now.
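A minimal sketch of that similarity check, assuming some embedding model is available behind a hypothetical `Embedder` interface:

```kotlin
import kotlin.math.sqrt

// Hypothetical embedding provider: text in, vector of numbers out.
fun interface Embedder {
    fun embed(text: String): FloatArray
}

/** Cosine similarity: close to 1.0 for semantically similar texts, lower for unrelated ones. */
fun cosineSimilarity(a: FloatArray, b: FloatArray): Double {
    var dot = 0.0; var normA = 0.0; var normB = 0.0
    for (i in a.indices) {
        dot += a[i] * b[i]
        normA += a[i] * a[i]
        normB += b[i] * b[i]
    }
    return dot / (sqrt(normA) * sqrt(normB))
}

/** Orders context snippets by how semantically close they are to what is being typed. */
fun rankSnippets(query: String, snippets: List<String>, embedder: Embedder): List<String> {
    val queryVec = embedder.embed(query)
    return snippets.sortedByDescending { cosineSimilarity(queryVec, embedder.embed(it)) }
}
```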

We can use it to automatically trim the context, for example like we do for Fleet, or we can do even cooler things that hopefully will be released soon in Grazie. We can understand similarity between code and natural language: for example, you ask how to tokenize text, and we check all the defs, all the definitions of functions that you have for Python; we get the embeddings, we automatically check which are most similar, and we tell you that it seems like split is a way to tokenize the text, and tiktoken most likely is also a way to tokenize the text.
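The same machinery can back that code-to-natural-language search; a sketch reusing the hypothetical `Embedder` and `cosineSimilarity` from above, with made-up function metadata:

```kotlin
data class FunctionDef(val name: String, val signature: String, val doc: String)

/**
 * Finds the function definitions whose signatures and docs are semantically
 * closest to a natural-language question such as "how to tokenize the text".
 */
fun searchFunctions(
    question: String,
    functions: List<FunctionDef>,
    embedder: Embedder,
    topK: Int = 3,
): List<FunctionDef> {
    val questionVec = embedder.embed(question)
    return functions
        .sortedByDescending {
            cosineSimilarity(questionVec, embedder.embed("${it.signature} ${it.doc}"))
        }
        .take(topK)
}
```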

Embeddings are also an extremely powerful concept that is already used to provide the AI assistant in Fleet with knowledge from the Kotlin documentation. We basically just index the whole Kotlin documentation that exists; you ask a question, we decide whether this is a question about the Kotlin or Fleet documentation or about something in general, and if it is a question about the Kotlin documentation, we automatically find relevant passages in it and map them in. That is called a grounding technique. What is important about it is that even if we are using a pretty old model, like, I don't know, GPT-4 that was released a year ago, not GPT-4o, we are able to provide the model with context and data that is relevant right now. Kotlin 2.0 was released yesterday; we do not expect any large language model to know about it, because training a large language model right now takes at least three months, so a lot of OpenAI models will likely only be able to tell you about Kotlin 2.0 around August or September, by which time there will already be new versions of Kotlin. That is why this is important: we need somehow to provide the model with relevant and accurate information, and we can do it with embeddings. We index a lot of documentation, we automatically find the most relevant pieces and provide them to the model, and the model becomes capable of answering questions about the latest release of Kotlin, the latest features of Kotlin, and so on. It can actually be done in different ways: I think a year ago Studio Bot announced so-called trusted answers, and as far as I remember that was also done with embeddings, but against a very specific list of answers and questions; we are just indexing the whole documentation and mapping it, which is pretty cool, I think.
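A sketch of that grounding flow, again reusing the hypothetical `Embedder` and `cosineSimilarity` from above; the doc index and chat client here are illustrative placeholders, not the actual Fleet or JetBrains AI implementation:

```kotlin
data class DocChunk(val title: String, val text: String)

// Hypothetical embedding-based documentation index.
class DocIndex(private val chunks: List<DocChunk>, private val embedder: Embedder) {
    private val vectors = chunks.map { embedder.embed(it.text) }

    fun topK(question: String, k: Int = 3): List<DocChunk> {
        val q = embedder.embed(question)
        return chunks.indices
            .sortedByDescending { cosineSimilarity(q, vectors[it]) }
            .take(k)
            .map { chunks[it] }
    }
}

// Hypothetical chat-style model client.
fun interface ChatModel {
    fun answer(prompt: String): String
}

/** Grounds the model's answer in freshly indexed documentation instead of stale training data. */
fun answerWithGrounding(question: String, index: DocIndex, model: ChatModel): String {
    val retrieved = index.topK(question)
    val prompt = buildString {
        appendLine("Answer the question using only the documentation excerpts below.")
        retrieved.forEach { appendLine("${it.title}: ${it.text}") }
        appendLine("Question: $question")
    }
    return model.answer(prompt)
}
```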

And with it we are moving to probably the last and most important question of this presentation: whether you actually need a large language model at all. We have been talking about machine learning algorithms, we have been talking about language models; this is the point where you just invest more money into a language model and get a bigger and better large language model. The problem is that while large language models are very knowledgeable and very instructable, they are also extremely expensive, because it is a very big model that knows everything and is capable of doing everything. It is extremely easy nowadays to create some AI features, you do not have to understand how PyTorch works, but at the same time it is extremely expensive to support them in the long run. So some features can still be implemented with smaller models and fine-tuning, and we will talk about that a bit. And here I will introduce the last and probably most important concept of this talk, which is called inference. So what is inference? We had an original function, and we have a trained function; the trained function is an approximation. If we have an extremely big language model, a large language model, it takes some time to actually execute this language model on the input that you have, and it takes some money.
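To pin down the vocabulary with a toy example (nothing like a real language model, just the smallest possible trained function): training fits the approximation once, and inference is every later call to it.

```kotlin
// Toy "model": fit y = a*x + b by least squares (training),
// then apply it to new inputs (inference). Real LLMs differ mainly in scale:
// training happens once, inference happens on every single request.
class ToyModel(private val a: Double, private val b: Double) {
    fun infer(x: Double): Double = a * x + b   // cheap for a toy, costly at LLM scale
}

fun train(xs: DoubleArray, ys: DoubleArray): ToyModel {
    val meanX = xs.average()
    val meanY = ys.average()
    var cov = 0.0
    var varX = 0.0
    for (i in xs.indices) {
        cov += (xs[i] - meanX) * (ys[i] - meanY)
        varX += (xs[i] - meanX) * (xs[i] - meanX)
    }
    val a = cov / varX
    return ToyModel(a, meanY - a * meanX)
}
```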

It takes a lot of money, and the whole story around AI nowadays is not really about training ever bigger models: GPT-4 is already good enough, and a lot of folks are right now trying to distill models and make them smaller while retaining their quality and accuracy, because inference nowadays is the primary cost driver for AI services. You have to put a lot of money into training, that is true, but you have to put a lot more money into inference, because OpenAI or any other LLM provider is answering questions from hundreds of millions of people each day; they produce hundreds of millions of responses, and it takes a lot more GPU power just to support that. That is why a lot of AI companies today are really about cheaper inference, not about better models. If you take a look at the AI scene right now, there are a lot of AI startups launching the most efficient ways of running inference for models, not the most accurate and precise models, because that is the actual competitive advantage, and inference is the most important and primary cost driver for AI itself.

What is important to know if you are actually willing to create some AI feature or some AI-powered application: it is almost impossible to beat LLM providers on cost per token for general models. It is pretty much the same situation as with cloud providers nowadays, because AI providers like OpenAI, Google, Anthropic, Amazon and others are spending money out of their own pockets each day to run the models they have; they are investing enormous amounts to try to spend less money per day per user. If you are willing to train something yourself, you will spend a lot of money, and you will spend a lot more money to run inference on it. So if you actually need GPT-4, do not even try to run it yourself somewhere, I don't know, on AWS; it will definitely be a lot more expensive for you than for OpenAI. But if you need a general model, you can just take one of the existing providers.
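A back-of-the-envelope calculation shows why; every number below is a made-up, illustrative assumption rather than a real price or real traffic figure:

```kotlin
// Illustrative only: hypothetical prices and traffic, chosen to show how
// recurring inference cost dwarfs a one-off training or fine-tuning cost.
fun main() {
    val requestsPerDay = 1_000_000L          // assumed daily requests
    val tokensPerRequest = 2_000L            // assumed prompt + completion tokens
    val pricePer1kTokensUsd = 0.01           // assumed blended price per 1k tokens
    val oneOffTrainingUsd = 50_000.0         // assumed one-time training/fine-tuning spend

    val dailyInferenceUsd = requestsPerDay * tokensPerRequest / 1000.0 * pricePer1kTokensUsd
    val monthlyInferenceUsd = dailyInferenceUsd * 30

    println("Inference per day (USD):   $dailyInferenceUsd")    // 20000.0
    println("Inference per month (USD): $monthlyInferenceUsd")  // 600000.0
    println("One-off training (USD):    $oneOffTrainingUsd")    // 50000.0
}
```

Even with these toy numbers, the recurring inference bill overtakes the one-off training spend within days, which is the point being made here.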

Actually, this slide was in a previous presentation, and since then I have already rewritten it twice; that was, I think, four months ago or so. So, we have OpenAI. Does anyone know what OpenAI is? Yeah, okay. Does anyone know what Anthropic is? Okay, so Anthropic is not that popular. So there is OpenAI, the biggest LLM provider, probably not financially, but the most well known one, with ChatGPT, with GPT-4o, with the various scandals about Scarlett Johansson. There are other LLM providers that are not that scandalous, like Anthropic, which also has the Claude model, which is pretty good; there is the Google Gemini model; there is a European provider, Mistral, a French-based provider of large language models; there are a lot of folks. And if you actually need a general model, please go to one of the LLM providers and just run it there; you will spend a lot less money and get a lot better results than trying to run, I don't know, Llama somewhere in AWS and hoping that it would be good enough.

But there is another case. If you actually have engineers, if you have a very specific task that requires some machine learning, and if you have specific reasons, you may actually invest a lot of money into a machine learning department or team and try to create something yourself. For example, you can do bug detection with machine learning; it is a pretty important, pretty well-known technique. The catch is that you need to do bug detection, in the case of JetBrains, for all IDE users, on every keystroke. The easiest way to do that is with a local model, because in that case users are paying for the compute themselves, the code does not move anywhere from their laptop, and at the same time we do not have any problems with enormous cloud fleets trying to process the requests. The problem with a large language model is that there would be so many requests that it would cost an enormous amount, whereas you can invest a lot of money in machine learning and get pretty decent results with a much smaller model that runs a lot faster.
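As a rough illustration of "runs locally on every edit", with a hypothetical local model interface and an arbitrary threshold; nothing here reflects how JetBrains' actual inspection models are wired up:

```kotlin
// Hypothetical local bug-detection model: loaded once, queried on each edit.
// Everything stays on the user's laptop; there is no cloud round-trip to pay for.
interface LocalBugDetector {
    fun suspiciousness(codeSnippet: String): Double  // 0.0 = looks fine, 1.0 = very suspicious
}

class EditorBugHighlighter(
    private val detector: LocalBugDetector,
    private val threshold: Double = 0.8,   // illustrative threshold
) {
    /** Called by the editor after the user pauses typing; true -> show a warning. */
    fun onEdit(changedSnippet: String): Boolean =
        detector.suspiciousness(changedSnippet) >= threshold
}
```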

Another example that I have been talking about is single-line code completion, and actually multi-line code completion if you are fearless enough, because that one will also cost you a lot. You do not actually need a large language model here; you do not need a 300 or 900 billion parameter model to understand that here you most likely need to generate a `val`. As we discussed previously, you can actually just train the model yourself: you take the dataset, and it will work well enough. What is even more interesting is that in a lot of cases it will work as well as GPT-4, because you do not have a lot of context here, you are just predicting the next token; GPT-4 does not know much more than a single-line prediction model does, and it will not produce better results.
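Single-line completion really is just repeated next-token prediction; a minimal greedy decoding loop over a hypothetical small local model makes that concrete:

```kotlin
// Hypothetical small completion model: given token ids, predict the next token id.
interface SmallCompletionModel {
    fun encode(text: String): IntArray
    fun decode(tokens: IntArray): String
    fun nextToken(tokens: IntArray): Int
    val endOfLineToken: Int
}

/** Greedily completes the current line; no giant model is required for this. */
fun completeLine(prefix: String, model: SmallCompletionModel, maxTokens: Int = 32): String {
    var tokens = model.encode(prefix)
    repeat(maxTokens) {
        val next = model.nextToken(tokens)
        if (next == model.endOfLineToken) return model.decode(tokens)  // stop at end of line
        tokens += next
    }
    return model.decode(tokens)
}
```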

And with it we are getting to the final look of how JetBrains AI in Fleet and other products work right now. We have this Main.kt, which is just a file; we have an on-device model, and nowadays we even have a one-line code completion model in IntelliJ, called Full Line code completion, that runs fully locally. We have context collectors, we are using on-device ML, and with all of it we compose some context that is sent to the API in the cloud. There we have completion models, even big language models, that answer your questions or help you, and we have an API to third-party LLM providers, because, as I said, it is impossible nowadays to beat the quality and cost efficiency of OpenAI or other LLM providers; they are a lot deeper in the game than we are, and they spend a lot more money each day on each of their users, which is why they are a lot more interested in optimizing inference. So this is the way it works; you can actually try it in Fleet right now. And maybe I have some questions. Thank you.

[Applause]

Yeah, one moment, they are running with the microphone.

Audience member: So I have a question regarding context collection. You had an example with Docker.kt and Processes.kt, and you mentioned in passing that, well, the model is going to know that Docker.kt and Processes.kt are related because it has some general knowledge about cgroups and so on. But isn't it so that the context collection you are building right now will use the call graph of your application's code to collect context?

Vladislav Tankov: Actually, we have a pretty interesting architecture. For example, in ReSharper they are using the call graph, and then they are also using embeddings to assign weights to the call graph and understand what can be trimmed, because you still have a limitation, in general, of 32k or 8k or 4k tokens. So you are using the call graph, and then you assign weights and trim the graph to fit in.

Audience member: Okay, but you will still be using the call graph?

Vladislav Tankov: Yeah, of course; it is already used today.
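A hedged sketch of that combination, a call graph weighted by embeddings and trimmed to a token budget; the graph representation is a placeholder (reusing the hypothetical `Embedder` and `cosineSimilarity` from earlier), not ReSharper's actual data structures:

```kotlin
// Hypothetical call-graph node: a function or file plus its source text.
data class GraphNode(val id: String, val source: String)

/**
 * Walks the call graph outward from the edited node, weights each reachable node
 * by embedding similarity to the current code, and keeps nodes until the token
 * budget (for example 4k, 8k or 32k) is exhausted.
 */
fun collectContextFromCallGraph(
    start: GraphNode,
    callees: Map<GraphNode, List<GraphNode>>,
    embedder: Embedder,
    tokenBudget: Int,
): List<GraphNode> {
    val startVec = embedder.embed(start.source)

    // Collect everything reachable through the call graph.
    val reachable = mutableSetOf<GraphNode>()
    val queue = ArrayDeque(callees[start].orEmpty())
    while (queue.isNotEmpty()) {
        val node = queue.removeFirst()
        if (reachable.add(node)) queue.addAll(callees[node].orEmpty())
    }

    // Rank by semantic similarity and trim to the token budget.
    val selected = mutableListOf<GraphNode>()
    var used = 0
    for (node in reachable.sortedByDescending { cosineSimilarity(startVec, embedder.embed(it.source)) }) {
        val cost = node.source.length / 4   // crude token estimate
        if (used + cost > tokenBudget) break
        selected += node
        used += cost
    }
    return selected
}
```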

Okay, thanks. Thank you for the question. Any other questions? We have about 2 minutes 50 seconds. Okay, so if you do not have any other questions, thank you, and you can also ask them in person by the stage, I guess. Thank you.

[Music]


Related Tags: Artificial Intelligence, Kotlin Development, Machine Learning, Code Editor, Natural Language, Contextual AI, Model Training, Code Completion, AI Functionality, Developer Tools