LangChain Explained in 13 Minutes | QuickStart Tutorial for Beginners

Rabbitmetrics
13 Apr 2023 · 12:44

Summary

TLDR: LangChain is an open-source framework that lets developers combine large language models such as GPT-4 with external sources of computation and data. The framework became especially popular after the release of GPT-4 in March 2023. By splitting documents into small chunks and storing them in a vector database, LangChain lets a language model reference data across an entire database. This allows users to build applications that both reference personal data and take actions, such as sending emails. LangChain's core concepts include LLM wrappers, prompt templates, indexes, chains, and agents. It lets developers build applications in code whose prompts change dynamically with user input and that perform complex tasks such as parsing text, running similarity searches, and interacting with external APIs.

Takeaways

  • 📚 LangChain is an open-source framework that lets developers combine large language models (such as GPT-4) with external sources of computation and data.
  • 🚀 LangChain's popularity surged after GPT-4's release in March 2023, especially because it can handle large amounts of data and perform complex tasks.
  • 🔗 LangChain connects large language models to personal data sources such as books, PDF files, and databases containing proprietary information.
  • 📈 By splitting documents into small chunks and storing them as embeddings (vector representations) in a vector database, LangChain helps build language model applications that follow a general pipeline.
  • 🤖 Applications built with LangChain are data-aware and able to take actions, with uses in personal assistance, learning, coding, data analysis, and more.
  • 🔑 LangChain's main value proposition can be summarized in five concepts: LLM wrappers, prompt templates, indexes, chains, and agents.
  • 🛠️ LLM wrappers connect to large language models, prompt templates avoid hard-coded text, indexes extract relevant information, and chains combine multiple components to solve specific tasks.
  • 🧲 Using Pinecone as a vector store, LangChain can split text into small chunks and convert them into vector representations with OpenAI's embedding model Ada.
  • 🔍 Once vectors are stored in Pinecone, similarity search can retrieve answers to questions or extract all relevant chunks.
  • 📝 LangChain also supports a Python code interpreter, letting the language model run Python code for tasks such as finding the roots of a quadratic function.
  • 🌐 The LangChain framework is updated constantly; at a high level it consists of models (or model wrappers), prompts, chains, embeddings and vector stores, and agents.
  • 📈 LangChain applications can connect to company data, such as customer and marketing data, pointing to exponential progress in data analytics and data science.

Q & A

  • What is LangChain?

    - LangChain is an open-source framework that lets developers combine large language models (such as GPT-4) with external sources of computation and data. It is offered as Python and JavaScript (specifically TypeScript) packages.

  • Why is LangChain particularly useful?

    - LangChain connects large language models to personal data sources, such as databases, and can take actions based on retrieved information, such as sending an email. This makes applications both data-aware and able to act.

  • How does LangChain work with large language models like GPT-4?

    - By splitting data into chunks and storing them as vectors (embeddings), LangChain builds a vector database in which similarity search retrieves the relevant information, which is then fed to the language model to generate an answer or take an action.
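
The retrieval step described above can be sketched in plain Python, without the real LangChain or Pinecone APIs; the chunks and vectors below are made up for illustration, and a real application would obtain the vectors from an embedding model:

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def similarity_search(query_vec, store, k=2):
    # store: list of (chunk_text, chunk_vector); return the top-k chunk texts.
    ranked = sorted(store, key=lambda item: cosine_similarity(query_vec, item[1]),
                    reverse=True)
    return [text for text, _ in ranked[:k]]

# Toy "vector database" of three chunks with made-up 3-dimensional embeddings.
store = [
    ("autoencoders compress data", [0.9, 0.1, 0.0]),
    ("chains compose components", [0.1, 0.8, 0.1]),
    ("agents call external APIs", [0.0, 0.2, 0.9]),
]
print(similarity_search([1.0, 0.0, 0.0], store, k=1))
# → ['autoencoders compress data']
```

The real pipeline works the same way, except the query and chunks are embedded by a model such as Ada and the search runs inside the vector store.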

  • What is LangChain's main value proposition?

    - It can be summarized in five concepts: LLM wrappers (connect to large language models), prompt templates (avoid hard-coded text), indexes (extract relevant information for the LLM), chains (combine multiple components to solve a specific task), and agents (let the LLM interact with external APIs).

  • How do you build an application with LangChain?

    - Applications typically follow a pipeline: the user asks a question; the question is sent to the language model, and a vector representation of the question is used for a similarity search in the vector database; the relevant information is retrieved and fed to the language model, which then generates an answer or takes an action.

  • What is a "chain" in LangChain?

    - A chain combines a language model and a prompt template into an interface that takes user input and outputs an answer from the language model, like a composite function in which the inner function is the prompt template and the outer function is the language model.
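
That composite-function view can be sketched in a few lines of plain Python; the function names and the fake model below are illustrative stand-ins, not LangChain's actual classes:

```python
def prompt_template(concept):
    # Inner function: format the user's input into a prompt string.
    return f"Explain the concept of {concept} in one sentence."

def fake_llm(prompt):
    # Outer function: a stand-in for a real language model call.
    return f"[model answer to: {prompt}]"

def chain(user_input):
    # A chain is just the composition: llm(template(input)).
    return fake_llm(prompt_template(user_input))

print(chain("autoencoder"))
# → [model answer to: Explain the concept of autoencoder in one sentence.]
```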

  • How does LangChain process and store text data?

    - First, a text splitter breaks the text into small chunks; then OpenAI's embedding model (such as Ada) converts the text into vector representations, which are stored in a vector store such as Pinecone.
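
The splitting step can be sketched with a simple character-based chunker; this is a simplified stand-in for LangChain's recursive character text splitter, with made-up chunk sizes:

```python
def split_text(text, chunk_size=20, chunk_overlap=5):
    # Slice text into fixed-size character chunks with a small overlap,
    # so neighboring chunks share context across the boundary.
    chunks = []
    step = chunk_size - chunk_overlap
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
    return chunks

text = "An autoencoder learns a compressed representation of its input."
for chunk in split_text(text, chunk_size=24, chunk_overlap=6):
    print(repr(chunk))
```

Each resulting chunk would then be passed to the embedding model and the vector stored under its own index entry.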

  • What do agents do in LangChain?

    - Agents let the large language model interact with external tools and APIs, for example running Python code to find the roots of a quadratic function, which adds functionality and flexibility to the application.
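
The quadratic-roots task the agent solves can of course be computed directly; a minimal sketch with the quadratic formula (in the video, the agent reaches the same result by generating and running numpy code):

```python
import math

def quadratic_roots(a, b, c):
    # Real roots of a*x^2 + b*x + c = 0 via the quadratic formula.
    disc = b * b - 4 * a * c
    if disc < 0:
        return []  # no real roots
    sq = math.sqrt(disc)
    return sorted([(-b - sq) / (2 * a), (-b + sq) / (2 * a)])

print(quadratic_roots(1, -3, 2))  # x^2 - 3x + 2 → [1.0, 2.0]
```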

  • How does LangChain help data science and data analytics?

    - LangChain can connect large language models to existing company data, such as customer and marketing data, which promises exponential progress in data science and data analytics.

  • How do LangChain's prompt templates work?

    - A prompt template injects user input into a piece of text, then formats the prompt and feeds it to the language model, so the prompt changes dynamically with the user's input.
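
The mechanism can be sketched with plain string formatting; the template wording and function name below are illustrative, not LangChain's actual PromptTemplate API:

```python
# A template with a named placeholder for the user's input.
template = ("You are an expert data scientist. "
            "Explain the concept of {concept} in a couple of lines.")

def format_prompt(template, **kwargs):
    # Inject the user's input into the template to produce the final prompt.
    return template.format(**kwargs)

prompt = format_prompt(template, concept="autoencoder")
print(prompt)
```

The same template can then be reused for any concept the user supplies, which is what makes the prompt dynamic rather than hard-coded.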

  • Does LangChain support integration with third-party APIs?

    - Yes. Through agents, LangChain lets the language model call external services and tools, for example running a Python code interpreter backed by an OpenAI language model.

Outlines

00:00

🚀 Introduction to LangChain: combining AI and large language models

The first section introduces LangChain, an open-source framework that lets developers combine large language models such as GPT-4 with external sources of computation and data. The framework supports Python and JavaScript (specifically TypeScript); the video dives into the Python framework and explains why it became popular after GPT-4's release in March 2023. A practical example shows how to connect a large language model to personal data sources such as books, PDF files, or databases with proprietary information. By splitting documents into small chunks and storing them in a vector database, LangChain lets the language model access an entire database of information rather than just a text snippet. LangChain can also help take desired actions, such as sending an email with specific information.

05:00

📚 LangChain core concepts and worked examples

The second section digs into LangChain's core concepts: large language model (LLM) wrappers, prompt templates, indexes, chains, and agents. LLM wrappers connect to large language models such as GPT-4. Prompt templates avoid hard-coded text by injecting user input dynamically. Indexes extract relevant information for the language model. Chains combine multiple components to solve a specific task and build an entire LLM application. Agents let the language model interact with external APIs. The video demonstrates building a chain that explains a machine learning concept and then simplifies the explanation into language a five-year-old could understand, and shows how to split text into chunks and store them as vector representations for search and retrieval in a vector database.
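
The two-step chain described here (explain, then simplify) can be sketched as plain function chaining; the stand-in functions below are illustrative, not LangChain's actual sequential-chain API:

```python
def explain(concept):
    # First chain: brief explanation of a machine learning concept.
    return f"{concept}: a model that learns patterns from data."

def simplify(explanation):
    # Second chain: rephrase the first chain's output for a five-year-old.
    return f"Imagine a toy that gets smarter the more it plays: {explanation}"

def overall_chain(concept):
    # Sequential chain: the second step consumes the first step's output.
    return simplify(explain(concept))

print(overall_chain("autoencoder"))
```

The overall chain returns both pieces of information combined, mirroring how the video's combined chain prints the description and then the simplified explanation.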

10:02

🧮 Vector stores and agents: advanced LangChain usage

The third section covers vector stores, in particular how to use LangChain and Pinecone to convert text chunks into vector representations with OpenAI's embedding model Ada and store them in Pinecone, enabling similarity search to retrieve relevant information. Finally, the video briefly introduces the concept of an agent, showing how an OpenAI language model can run Python code, for example to find the roots of a quadratic function. The video closes by encouraging viewers to subscribe and thanking them for watching, aiming to give a brief introduction to LangChain's core concepts.

Keywords

💡LangChain

LangChain is an open-source framework that lets developers combine large language models (such as GPT-4) with external sources of computation and data. It is the core of the video's discussion because it enhances a language model's capabilities, letting it access and process personal or proprietary company data. In the video, LangChain is used to show how to connect a language model to a database and retrieve information from a vector database, yielding more precise answers or specific actions.

💡Large language model (LLM)

A large language model (LLM) is an AI model with a large number of parameters that can handle complex language tasks, such as GPT-4. The video uses LLMs to show how LangChain interacts with these models and leverages their capabilities to process and answer questions based on the user's own data.

💡Vector database

A vector database stores data as vectors (numerical representations of text). In the video, a vector database holds the vector representations of text chunks, allowing LangChain to run similarity searches and retrieve information relevant to a user's query. Pinecone serves as the example vector store, demonstrating how text chunks are converted to vectors and stored.

💡Embeddings

Embedding is the process of converting text into numerical vectors that capture its semantic information. In the video, embeddings convert the user's data into vector form for storage and retrieval in a vector database; OpenAI's Ada model is used as the embedding model, producing vectors that can be stored in Pinecone.

💡Prompt Templates

A prompt template lets developers dynamically insert user input into a piece of text, then format the prompt and feed it to the language model. The video uses prompt templates to show how prompts can be customized for different user inputs, letting the language model handle varied queries flexibly.

💡Chains

A chain is a LangChain concept that combines a language model and a prompt template into an interface that takes user input and outputs an answer from the language model. In the video, chains are used to build a pipeline in which the user's question is sent to the language model and a vector representation of the question drives a similarity search in the vector database.

💡APIs

An API (application programming interface) is a protocol that lets different pieces of software interact. The video uses APIs to show how LangChain interacts with external systems (such as OpenAI's or Google's services) to perform more advanced tasks, such as sending emails or accessing company data.

💡Data science

Data science is the discipline of collecting, analyzing, interpreting, and processing data to extract useful information and insights. The video mentions data science to show how LangChain, by connecting large language models to data, can strengthen data science and analytics, especially for processing and interpreting large datasets.

💡Personal assistant

A personal assistant is a system or application that performs everyday tasks and provides personalized service. In the video, personal assistants illustrate how LangChain and large language models can power intelligent systems that handle personal tasks such as booking flights, transferring money, or paying taxes.

💡Learning

Learning is the process of acquiring new knowledge or skills. The video highlights learning as a potential application of LangChain: a large language model can reference an entire syllabus and help the user learn the material as fast as possible.

💡Python agent

A Python agent is a LangChain component that lets the language model execute Python code. In the video, a Python agent is created and used to find the roots of a quadratic function, showing how LangChain extends a language model's capabilities to programming tasks.

Highlights

LangChain is an open-source framework that lets developers combine AI large language models such as GPT-4 with external computation and data sources.

LangChain is offered as a Python or JavaScript package, specifically TypeScript.

LangChain's popularity surged after the introduction of GPT-4 in March 2023.

LangChain connects large language models to personal data sources, such as books, PDF files, or databases containing proprietary information.

By splitting documents into small chunks and storing them in a vector database, LangChain can build language model applications that follow a general pipeline.

Applications built with LangChain are data-aware, referencing our own data, and can take actions rather than only answering questions.

These capabilities open up countless practical use cases, including personal assistance, learning new things, coding, data analysis, and data science.

LangChain can connect large language models to existing company data, such as customer and marketing data, pointing to exponential progress in data analytics and data science.

LangChain's main value proposition can be summarized in five concepts: LLM wrappers, prompt templates, indexes, chains, and agents.

LLM wrappers connect to large language models like GPT-4; prompt templates remove the need to hard-code text.

Indexes extract relevant information for the LLM; chains combine multiple components to solve specific tasks and build entire LLM applications.

Agents let the LLM interact with external APIs, for example by executing Python code or calling OpenAI's language models.

The video unpacks each element step by step in code, starting by installing the required Python libraries and an environment file.

LangChain uses Pinecone as the vector store, showing how to split text into small chunks and store them as vector representations.

LangChain's text splitter tool breaks text into chunks suitable for storage in a vector store.

With OpenAI's embedding model Ada, text can be converted into vector representations and stored in Pinecone.

LangChain supports similarity search to extract relevant information from the vector database.

LangChain's Python agent executor can run Python code, extending the language model's capabilities.

The video is meant as a brief introduction to LangChain's core concepts and a starting point for a deeper understanding.

Transcripts

play00:00

LangChain what is it why should you

play00:03

use it and how does it work let's have a

play00:05

look

play00:07

LangChain is an open source framework

play00:09

that allows developers working with AI

play00:11

to combine large language models like

play00:14

GPT-4 with external sources of

play00:17

computation and data the framework is

play00:20

currently offered as a python or a

play00:22

JavaScript package TypeScript to be

play00:24

specific in this video we're going to

play00:26

start unpacking the python framework and

play00:29

we're going to see why the popularity of

play00:31

the framework is exploding right now

play00:32

especially after the introduction of

play00:34

GPT-4 in March 2023 to understand what

play00:38

need LangChain fills let's have a look

play00:40

at a practical example so by now we all

play00:43

know that ChatGPT or GPT-4 has an

play00:45

impressive general knowledge we can ask

play00:47

it about almost anything and we'll get a

play00:50

pretty good answer

play00:51

suppose you want to know something

play00:53

specifically from your own data your own

play00:56

document it could be a book a PDF file a

play00:59

database with proprietary information

play01:02

LangChain allows you to connect a large

play01:04

language model like GPT-4 to your own

play01:07

sources of data and we're not talking

play01:10

about pasting a snippet of a text

play01:13

document into the ChatGPT prompt we're

play01:15

talking about referencing an entire

play01:17

database filled with your own data

play01:19

and not only that once you get the

play01:21

information you need you can have

play01:23

LangChain help you take the action you want

play01:26

to take for instance send an email with

play01:28

some specific information

play01:30

and the way you do that is by taking the

play01:32

document you want your language model to

play01:34

reference and then you slice it up into

play01:36

smaller chunks and you store those

play01:38

chunks in a vector database the chunks

play01:41

are stored as embeddings meaning they

play01:43

are vector representations of the text

play01:48

this allows you to build language model

play01:50

applications that follow a general

play01:53

pipeline a user asks an initial question

play01:57

this question is then sent to the

play01:59

language model and a vector

play02:01

representation of that question is used

play02:04

to do a similarity search in the vector

play02:06

database this allows us to fetch the

play02:09

relevant chunks of information from the

play02:11

vector database and feed that to the

play02:13

language model as well

play02:15

now the language model has both the

play02:17

initial question and the relevant

play02:19

information from the vector database and

play02:21

is therefore capable of providing an

play02:24

answer or take an action

play02:26

and LangChain helps build applications

play02:28

that follow a pipeline like this and

play02:30

these applications are both data aware

play02:33

we can reference our own data in a

play02:35

vector store and they are agentic they

play02:38

can take actions and not only provide

play02:40

answers to questions

play02:42

and these two capabilities open up for

play02:44

an infinite number of practical use

play02:46

cases anything involving personal

play02:49

assistance will be huge you can have a

play02:51

large language model book flights

play02:53

transfer money pay taxes now imagine the

play02:57

implications for studying and learning

play02:58

new things you can have a large language

play03:00

model reference an entire syllabus and

play03:03

help you learn the material as fast as

play03:05

possible coding data analysis data

play03:07

science is all going to be affected by

play03:09

this

play03:10

one of the applications that I'm most

play03:11

excited about is the ability to connect

play03:14

large language models to existing

play03:17

company data such as customer data

play03:19

marketing data and so on

play03:21

I think we're going to see an

play03:22

exponential progress in data analytics

play03:24

and data science our ability to connect

play03:27

the large language models to Advanced

play03:29

APIs such as Meta's API or Google's API

play03:32

is really gonna gonna make things take

play03:35

off

play03:38

so the main value proposition of

play03:40

LangChain can be divided into three main

play03:42

Concepts

play03:44

we have the llm wrappers that allows us

play03:46

to connect to large language models like

play03:49

GPT-4 or the ones from Hugging Face

play03:52

prompt templates allows us to avoid

play03:55

having to hard code text which is the

play03:58

input to the llms

play04:00

then we have indexes that allows us to

play04:02

extract relevant information for the

play04:04

llms the chains allows us to combine

play04:08

multiple components together to solve a

play04:11

specific task and build an entire llm

play04:13

application

play04:14

and finally we have the agents that

play04:17

allow the llm to interact with external

play04:19

apis

play04:22

there's a lot to unpack in Lang chain

play04:24

and new stuff is being added every day

play04:26

but on a high level this is what the

play04:28

framework looks like we have models or

play04:30

wrappers around models we have prompts

play04:33

we have chains we have the embeddings

play04:34

and Vector stores which are the indexes

play04:36

and then we have the agents so what I'm

play04:39

going to do now is I'm going to start

play04:40

unpacking each of these elements by

play04:42

writing code and in this video I'm going

play04:44

to keep it high level just to get an

play04:46

overview of the framework and a feel for

play04:49

the different elements first thing we're

play04:51

going to do is we're going to pip

play04:52

install three libraries we're going to

play04:54

need python-dotenv to manage the environment

play04:56

file with the passwords we're going to

play04:58

install LangChain and we're going to

play05:00

install the Pinecone client Pinecone is

play05:03

going to be the vector store we're going

play05:04

to be using in this video in the

play05:06

environment file we need the open AI API

play05:09

key we need the pine cone environment

play05:12

and we need the Pinecone API key

play05:15

once you have signed up for a

play05:18

Pinecone account it's free the API keys

play05:21

and the environment name is easy to find

play05:25

same thing is true for openai just go to

play05:28

platform.openai.com account slash API

play05:30

keys

play05:31

let's get started so when you have the

play05:34

keys in an environment file all you have

play05:36

to do is use load_dotenv and find_dotenv to

play05:39

get the keys and now we're ready to go

play05:41

so we're going to start off with the

play05:43

llms or the wrappers around the llms

play05:46

then I'm going to import the OpenAI

play05:48

wrapper and I'm going to instantiate the

play05:50

text-davinci-003 completion model and

play05:52

ask it to explain what a large language

play05:54

model is and this is very similar to

play05:56

when you call the open AI API directly

play06:00

next we're going to move over to the

play06:02

chat model so GPT-3.5 and GPT-4 are chat

play06:06

models

play06:07

and in order to interact with the chat

play06:09

model through LangChain we're going to

play06:11

import a schema consisting of three

play06:13

parts an AI message a human message and

play06:16

a system message

play06:17

and then we're going to import ChatOpenAI

play06:19

the system message is what you use to

play06:22

configure the system when you use a

play06:23

model and the human message is the user

play06:26

message

play06:28

to use the chat model you combine the

play06:31

system message and the human message in

play06:33

a list and then you use that as an input

play06:35

to the chat model

play06:38

here I'm using GPT-3.5-turbo you could

play06:42

have used GPT-4 I'm not using that

play06:44

because the open AI service is a little

play06:47

bit Limited at the moment

play06:53

so this works no problem let's move to

play06:55

the next concept which is prompt

play06:58

templates so prompts are what we are

play07:00

going to send to our language model but

play07:02

most of the time these prompts are not

play07:04

going to be static they're going to be

play07:06

dynamic they're going to be used in an

play07:07

application and to do that LangChain

play07:09

has something called prompt templates

play07:11

and what that allows us to do is to take

play07:13

a piece of text and inject a user input

play07:17

into that text and we can then format

play07:19

The Prompt with the user input and feed

play07:22

that to the language model

play07:25

so this is the most basic example but it

play07:28

allows us to dynamically change the

play07:30

prompt with the user input

play07:40

the third concept we want to look at

play07:42

is the concept of a chain

play07:47

a chain takes a language model and a

play07:49

prompt template and combines them into

play07:51

an interface that takes an input from

play07:53

the user and outputs an answer from the

play07:57

language model sort of like a composite

play07:59

function where the inner function is the

play08:02

prompt template and the outer function

play08:04

is the language model

play08:06

we can also build sequential chains

play08:08

where we have one chain returning an

play08:10

output and then a second chain taking

play08:12

the output from the first chain as an

play08:14

input

play08:16

so here we have the first chain that

play08:18

takes a machine learning concept and

play08:19

gives us a brief explanation of that

play08:21

concept the second chain then takes the

play08:24

description of the first concept and

play08:26

explains it to me like I'm five years

play08:28

old

play08:32

then we simply combine the two chains

play08:34

the first chain called chain and then

play08:36

the second chain called chain two into

play08:39

an overall chain

play08:41

and run that chain

play08:46

and we see that the overall chain

play08:49

returns both the first description of

play08:52

the concept and the explain it to me

play08:55

like I'm 5 explanation of the concept

play08:59

all right let's move on to embeddings

play09:01

and Vector stores but before we do that

play09:03

let me just change the explain it to me

play09:06

like I'm five prompt so that we get a

play09:08

few more words

play09:11

I'm gonna go with 500 Words

play09:19

all right so this is a slightly longer

play09:21

explanation for a five-year-old

play09:27

now what I'm going to do is I'm going to

play09:29

take this text and I'm going to split

play09:31

it into chunks because we want to store

play09:33

it in a vector store in Pinecone

play09:36

and LangChain has a text splitter tool

play09:38

for that so I'm going to import

play09:40

recursive character text splitter and

play09:43

then I'm going to split the text into

play09:46

chunks

play09:47

like we talked about in the beginning of

play09:49

the video

play09:53

we can extract the plain text of the

play09:55

individual elements of the list with

play09:57

page content

play09:59

and what we want to do now is we want to

play10:01

turn this into an embedding which is

play10:05

just a vector representation of this

play10:07

text and we can use open ai's embedding

play10:09

model Ada

play10:13

with OpenAI's model we can call embed

play10:17

query on the raw text that we just

play10:20

extracted from the chunks of the

play10:23

document and then we get the vector

play10:26

representation of that text or the

play10:28

embedding

play10:29

now we're going to take the chunks of

play10:32

the explanation document and we're going

play10:33

to store the vector representations in

play10:37

Pinecone

play10:39

so we'll import the Pinecone Python

play10:42

client and we'll import Pinecone from

play10:45

LangChain vector stores and we initiate

play10:47

the Pinecone client with the key and

play10:50

the environment that we have in the

play10:52

environment file

play10:54

then we take the variable texts which

play10:56

consists of all the chunks of data we

play10:59

want to store we take the embeddings

play11:00

model and we take an index name and we

play11:02

load those chunks on the embeddings to

play11:05

Pinecone and once we have the vector

play11:07

stored in Pinecone we can ask questions

play11:09

about the data stored what is magical

play11:12

about an auto encoder and then we can do

play11:15

a similarity search in Pinecone to get

play11:18

the answer or to extract all the

play11:20

relevant chunks

play11:24

if we head over to Pinecone we can see

play11:26

that the index is here we can click on

play11:30

it and inspect it

play11:32

check the index info we have a total of

play11:35

13 vectors in the vector store

play11:42

all right so the last thing we're going

play11:44

to do is we're going to have a brief

play11:45

look at the concept of an agent

play11:48

now if you head over to OpenAI's ChatGPT

play11:51

plugins page you can see that they're

play11:54

showcasing a python code interpreter

play11:58

now we can actually do something similar

play12:00

in LangChain

play12:02

so here I'm importing the create python

play12:04

agent as well as the Python REPL tool

play12:06

and the PythonREPL from LangChain

play12:09

then we instantiate a python agent

play12:11

executor

play12:12

using an open AI language model

play12:16

and this allows us to having the

play12:17

language model run python code

play12:19

so here I want to find the roots of a

play12:22

quadratic function and we see that the

play12:24

agent executor is using numpy roots to

play12:27

find the roots of this quadratic

play12:30

function

play12:30

alright so this video was meant to give

play12:32

you a brief introduction to the Core

play12:34

Concepts of langchain if you want to

play12:37

follow along for a deep dive into the

play12:39

concepts hit subscribe thanks for

play12:41

watching
