Building a RAG application using open-source models (Asking questions from a PDF using Llama2)

Underfitted
9 Mar 2024 · 53:15

Summary

TLDR: This video offers a detailed guide to running local LLMs on your own computer, focusing on open-source models as inexpensive, private alternatives to GPT. The presenter explains why understanding and using local LLMs matters, highlighting their accessibility, lower cost, and privacy. He also demonstrates how to build a retrieval-augmented generation (RAG) system that uses local models to answer questions from a PDF, emphasizing the reasoning behind the code rather than the code itself. This content is valuable for anyone looking to deploy AI models in offline scenarios or as a backup for models like OpenAI's.

Takeaways

  • 🌟 To run an LLM locally on your computer, you use open-source models, which are accessible and inexpensive (a minimal sketch follows this list).
  • 🛠️ Open-source models matter for privacy, letting companies keep their data in house without connecting to external APIs.
  • 🔄 The key point is understanding the reasoning behind the code, not just the code itself, which is what the video aims to convey.
  • 📚 The process starts with a look at Ollama, a tool that acts as a common wrapper around different models.
  • 🔗 Models such as Llama 2, Mistral, and Mixtral can be downloaded through the Ollama platform.
  • 📋 Installing Ollama is simple, with versions available for macOS, Linux, and Windows.
  • 📈 LLMs are like gigantic mathematical formulas, made up of weight and bias parameters.
  • 💻 Downloading a model means downloading all of those values and storing them on your hard drive.
  • 🔍 LangChain is used to build a simple RAG system from scratch, using the local model to answer questions from a PDF.
  • 🔖 A virtual environment is created in Visual Studio Code so the required libraries can be installed without affecting the main system.
  • 📊 The video shows how to get answers from a local model and how to feed it the contents of a PDF so it can answer specific questions.
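
As a first step, this is roughly what querying a local model served by Ollama looks like through LangChain. A minimal sketch, assuming Ollama is installed, ollama pull llama2 has already been run, and the langchain-community package is available:

```python
# Minimal sketch: ask a locally running Llama 2 (served by Ollama) a question.
from langchain_community.llms import Ollama

model = Ollama(model="llama2")          # local completion model served by Ollama
print(model.invoke("Tell me a joke"))   # returns a plain string
```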

Q & A

  • Why is it important to know how to use a local LLM?

    -It matters for several reasons: open-source models are getting better and better, they are cheaper than GPT for certain use cases, they offer privacy advantages because there is no need to connect to an external API, and they are useful in scenarios without connectivity, such as robotics or edge devices.

  • What benefits do open-source models offer compared to the GPT models?

    -Open-source models are cheaper and handle certain use cases well without requiring the full power of the GPT models, which makes them especially valuable for companies concerned about privacy or for applications in environments without internet access.

  • How can an open-source model be used as a backup for the OpenAI models?

    -An open-source model can be configured to act as a backup for the OpenAI models, so that if the OpenAI API goes down you can switch to the open-source model immediately and keep the workflow running without interruption.

  • What is Ollama and what is it for in the context of running an LLM locally?

    -Ollama is a common wrapper that lets you run different LLMs, such as Llama 2 or Mixtral, locally on a computer. It makes these models easy to install and run by providing a common interface for managing them.

  • What is the main difference between the chat models and the completion models mentioned in the script?

    -Chat models are designed for conversations and return special structures for AI and human messages, while completion models return plain text with no extra structure, focusing on completing a text based on a given prompt.

  • What role do embeddings play in retrieval-augmented generation (RAG) systems?

    -Embeddings convert documents into vector representations so they can be compared with users' questions. This makes it easier to identify the parts of the document that are most relevant to answering a question, improving the accuracy of the answers the model generates.

  • How is LangChain used to build a simple RAG system?

    -LangChain is used to chain together different components, such as document loaders, prompt templates, LLMs, and output parsers, into a workflow that lets the model answer questions based on the content of a loaded document, such as a PDF.

  • What advantages does the chain-based approach offer when building RAG systems?

    -The chain approach makes components modular and reusable, making it easier to build complex systems by connecting different pieces of functionality flexibly and efficiently, and allowing adjustments and optimizations without altering the system as a whole.

  • Why are 'streaming' and 'batching' relevant when using LLMs?

    -Streaming lets you receive answers in real time as the model generates them, improving interactivity, while batching lets you process multiple questions in parallel, increasing efficiency and reducing total response time.

  • What are the challenges of using open-source LLMs compared to OpenAI's GPT models?

    -Open-source models may not be as advanced or accurate as OpenAI's GPT models, which can lead to less precise or less relevant answers. In addition, deploying and maintaining these models locally can require extra resources and technical expertise.

Outlines

00:00

🚀 Introduction to local LLMs and RAG systems

The presenter introduces the importance of running open-source language models (LLMs) locally, highlighting three key benefits: cost efficiency compared to models like GPT, privacy for companies that prefer not to use external APIs, and usefulness in applications without an internet connection, such as edge devices and robotics. He also mentions using open models as a backup in case a service like OpenAI's goes down. The goal is to teach how to build a retrieval-augmented generation (RAG) system using local LLMs, starting from nothing and emphasizing the logic behind the code rather than the code itself.
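
The backup idea mentioned here can be sketched with LangChain's with_fallbacks helper; the model names and the assumption that an OPENAI_API_KEY is available in the environment are illustrative, not taken from the video:

```python
from langchain_openai import ChatOpenAI
from langchain_community.llms import Ollama

primary = ChatOpenAI(model="gpt-3.5-turbo")   # needs OPENAI_API_KEY in the environment
backup = Ollama(model="llama2")               # local model served by Ollama
model = primary.with_fallbacks([backup])      # if the OpenAI call fails, retry with the local model

print(model.invoke("Tell me a joke"))
```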

05:01

🛠 Preparation and initial setup

The video goes on to show how to prepare the environment for running an LLM locally, starting with downloading Ollama, a project that allows open-source models to run on the user's computer. It describes how to install Ollama, how it gives access to a wide range of LLMs such as Llama 2 and Mixtral, and how to download and manage those models. Using the command line, the presenter starts a model and runs basic queries, laying the groundwork for more complex applications.

10:02

📝 Creating the development environment

The narrator continues with the setup of the development environment, using Jupyter notebooks inside Visual Studio Code to write the necessary Python code. He focuses on creating a virtual environment to manage the project's dependencies in isolation, and shows how to install packages and manage environment variables such as the OpenAI API key. This step prepares the ground for interacting programmatically with LLMs, both local and cloud-based, and lays the foundation for building the RAG system.
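
A sketch of that setup, assuming a .env file with OPENAI_API_KEY lives next to the notebook and that python-dotenv is installed; the shell commands in the comments are the usual ones rather than a verbatim copy of the video:

```python
# Run once in a terminal (assumed commands):
#   python3 -m venv .venv
#   source .venv/bin/activate
#   pip install python-dotenv
import os
from dotenv import load_dotenv

load_dotenv()                                   # reads the .env file into the process environment
OPENAI_API_KEY = os.environ["OPENAI_API_KEY"]   # the key stored as OPENAI_API_KEY=... in .env
```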

15:03

🤖 Accessing and running LLMs

The video explains how to query LLMs from code, first using OpenAI's GPT-3.5 model and then local models such as Llama 2. It introduces the concepts of chat models and completion models and shows how to adapt the code to work with both. It also discusses the importance of output parsers and how they can be used to format model responses, making it easier to plug different LLMs into a single workflow.
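
A sketch of that switching-plus-parser logic; the class locations follow the current LangChain package layout and are assumptions rather than a transcript of the video's code:

```python
import os
from langchain_openai import ChatOpenAI
from langchain_community.llms import Ollama
from langchain_core.output_parsers import StrOutputParser

MODEL = "llama2"   # or "gpt-3.5-turbo"

if MODEL.startswith("gpt"):
    model = ChatOpenAI(api_key=os.getenv("OPENAI_API_KEY"), model=MODEL)  # chat model: returns an AIMessage
else:
    model = Ollama(model=MODEL)                                           # completion model: returns a str

chain = model | StrOutputParser()   # the parser normalizes both cases to a plain string
print(chain.invoke("Tell me a joke"))
```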

20:05

🔍 Integrating LLMs with PDFs

The video shows how to build a system that can answer questions using the contents of a PDF as context. In the working example, the presenter saves a web page as a PDF to use as the source of information. He introduces tools for loading and processing the PDF in memory, and shows how to use prompt templates to ask the LLMs questions based on the PDF's content. This step is crucial to the RAG system, since it enables dynamic interaction with information stored in documents.
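
A sketch of the loading step; the file name ml-school.pdf is illustrative, and the pypdf package must be installed for the loader to work:

```python
from langchain_community.document_loaders import PyPDFLoader

pages = PyPDFLoader("ml-school.pdf").load_and_split()   # one Document per PDF page
print(len(pages))
print(pages[0].page_content[:200])                      # peek at the first page's text
```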

25:07

📊 Using vector stores to improve answers

This segment covers optimizing the RAG system with a vector store, which makes it possible to store document pages and retrieve them efficiently based on their relevance to a given question. It explains the concept of embeddings and how they allow the similarity between the question text and the document's content to be compared. The presenter shows how to set up an in-memory vector store and use it to select the most relevant fragments of the PDF before querying the LLM, significantly improving the accuracy of the answers.
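
A sketch of that step, reusing the pages loaded in the previous sketch; docarray must be installed, and the choice of OllamaEmbeddings is an assumption:

```python
from langchain_community.vectorstores import DocArrayInMemorySearch
from langchain_community.embeddings import OllamaEmbeddings

vectorstore = DocArrayInMemorySearch.from_documents(
    pages,                                        # the PDF pages loaded earlier
    embedding=OllamaEmbeddings(model="llama2"),   # embeddings should match the model family in use
)
retriever = vectorstore.as_retriever()
print(retriever.invoke("machine learning"))       # most relevant pages, most similar first
```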

30:09

🔧 Assembling and testing the complete RAG system

The final stretch of the video details how to assemble all of the previously introduced components to build and test the complete RAG system. It shows how to integrate the vector store with the prompt pipeline and how to pass questions through the system to get answers based on the PDF's content. The presenter runs live tests, comparing the answers different LLMs give to the same questions to demonstrate the system's functionality and flexibility. He also touches on the importance of tuning prompts and the possibility of using the system in practical applications.
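
Putting the previous sketches together, the assembled chain could look like this; it follows the structure described above (itemgetter routes the incoming question into the retriever), with the retriever taken from the vector store sketch:

```python
from operator import itemgetter
from langchain.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_community.llms import Ollama

prompt = PromptTemplate.from_template(
    "Answer the question based on the context below. If you can't answer the "
    'question, reply "I don\'t know".\n\nContext: {context}\n\nQuestion: {question}'
)

rag_chain = (
    {
        "context": itemgetter("question") | retriever,   # retriever from the vector store sketch
        "question": itemgetter("question"),
    }
    | prompt
    | Ollama(model="llama2")
    | StrOutputParser()
)
print(rag_chain.invoke({"question": "What is the program about?"}))
```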

Keywords

💡Local Large Language Model (LLM)

A local large language model (LLM) is an artificial intelligence model that runs on the user's own hardware instead of relying on cloud-based services. This approach gives users more control over data privacy and reduces operating costs. The video stresses the importance of knowing how to use local LLMs for specific tasks without depending entirely on cloud solutions like GPT, citing advantages such as lower cost and increased privacy.

💡Retrieval-Augmented Generation (RAG)

Retrieval-augmented generation is a natural language processing technique that combines retrieving relevant information with generating text. It uses the retrieved data as context to generate more accurate, more relevant answers. In the script, a simple RAG system is built to demonstrate how local LLMs can be integrated into practical applications, such as answering questions from a PDF.

💡Open-source models

Open-source models are artificial intelligence models whose source code is accessible and modifiable by anyone. They are an inexpensive, flexible alternative to proprietary models such as GPT. The video discusses using these models as a cost-effective, private option for deploying AI systems without depending on external APIs.

💡Ollama

Ollama is presented as the project or tool that makes it possible to run open language models in a local environment. It acts as a common wrapper around different models, simplifying their installation and use. In the context of the video, Ollama is crucial for downloading and running models such as Llama 2, offering a unified interface for working with several AI models.

💡Parameters

The parameters of an AI model are the weight and bias values adjusted during training to minimize the model's prediction error. They represent what the model has 'learned'. The script mentions that a model like Llama 2 has 7 billion parameters, underscoring the complexity and size of these language models.
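
A quick back-of-the-envelope check on what those parameter counts mean for download size; the bytes-per-parameter figures are general rules of thumb, not numbers from the video:

```python
params = 7_000_000_000                                    # Llama 2 7B
print(params * 4 / 1e9, "GB at 32-bit floats")            # ~28 GB
print(params * 2 / 1e9, "GB at 16-bit floats")            # ~14 GB
print(params * 0.5 / 1e9, "GB with 4-bit quantization")   # ~3.5 GB, close to the download size mentioned
```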

💡Command-line interface

A command-line interface (CLI) lets users interact with programs and operating system operations by typing text commands. In the video, the CLI is used to run the commands that install and manage local LLMs through Ollama, showing how users can control these models directly from the terminal.

💡Embeddings

Embeddings are high-dimensional vector representations of text, words, or documents that capture their context and meaning. They are fundamental to natural language processing tasks. In the video, embeddings are used to compare the similarity between the contents of PDF documents and the questions being asked, making it easier to retrieve the information relevant to generating an answer.
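
For illustration, a minimal sketch of that comparison using cosine similarity; the use of OllamaEmbeddings and the sample sentences are assumptions:

```python
from langchain_community.embeddings import OllamaEmbeddings

embeddings = OllamaEmbeddings(model="llama2")
page_vector = embeddings.embed_query("The program covers how to build machine learning systems.")
question_vector = embeddings.embed_query("What does the program teach?")

# Cosine similarity: the closer to 1, the more relevant the page is to the question.
dot = sum(a * b for a, b in zip(page_vector, question_vector))
norm = sum(a * a for a in page_vector) ** 0.5 * sum(b * b for b in question_vector) ** 0.5
print(dot / norm)
```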

💡Vector store

A vector store is a database specialized in storing and managing embeddings efficiently, allowing similarity-based searches. The script mentions creating an in-memory vector store to hold the embeddings of the pages of a PDF, which is later used to retrieve the most relevant information when answering questions.

💡Prompt templates

Prompt templates are used to structure and customize the input to language generation models, guiding the model toward generating specific answers. In the video, a prompt template is created to tell the model to answer questions based on the supplied context rather than on its prior training, showing how text generation can be controlled more precisely.
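
A sketch of the template described here, using the same wording the video settles on:

```python
from langchain.prompts import PromptTemplate

template = """Answer the question based on the context below. If you can't
answer the question, reply "I don't know".

Context: {context}

Question: {question}"""

prompt = PromptTemplate.from_template(template)
print(prompt.format(context="Here is some context", question="Here is a question"))
```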

💡Chains (LangChain)

Chains are processing sequences that link together multiple components, such as information retrieval, natural language processing, and text generation, to perform complex tasks. In the script, a chain is built that includes document loading, embedding generation, similarity-based retrieval, and text generation, illustrating an end-to-end workflow for text-based AI systems.

Highlights

How to run a local large language model (LLM) on your computer and build a complete retrieval-augmented generation system.

The importance of using open-source models, since they are very effective at solving most problems and cost less.

The privacy advantage of open-source models, which lets companies keep all of their data in house without connecting to external APIs.

Even when OpenAI is the primary model, an open-source model can act as a backup so the workflow keeps running.

Introduces the Ollama project, a common wrapper that allows open-source models to run on a local computer.

The simple steps to download and install Ollama, and how to download and run different models through it.

Explains that large language models are essentially enormous mathematical formulas made up of billions of parameters.

Shows how to use the Ollama command-line tool to pull and run the Llama model, and how to interact with it through the ollama run command.

Discusses building a simple LangChain-based system that pulls data from a PDF file and answers questions about it.

Shows how to create a new directory and a Jupyter notebook in Visual Studio Code, and how to set up a virtual environment to install the necessary libraries.

Explains how to use an environment file to store the OpenAI key and how to access and use that key through LangChain.

Shows how to create a prompt template with LangChain and how to pass the contents of the PDF document to the model as context.

Discusses using DocArray and Pydantic to create and work with embedding representations of documents, and how to use those embeddings to retrieve the most relevant ones.

Describes building a LangChain chain that includes a prompt, a model, a parser, and a retriever, and how to use that chain to answer questions about the content of a specific document.

Shows how to use LangChain's streaming and batching features to make interactions with large language models faster and more efficient (a small sketch follows).
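
A small self-contained sketch of those two calls (every LangChain runnable exposes stream and batch; the model choice is illustrative):

```python
from langchain_community.llms import Ollama

model = Ollama(model="llama2")

# Streaming: chunks are printed as the model generates them.
for chunk in model.stream("Tell me a joke"):
    print(chunk, end="", flush=True)

# Batching: several prompts are processed in parallel.
print(model.batch(["Tell me a joke", "Tell me another joke"]))
```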

Transcripts

play00:01

Hey so uh today I'm going to show you

play00:03

how to run a local llm on your computer

play00:07

and how to build an entire rack system

play00:10

retrieval augmented generation system uh

play00:13

using those locally running llms and

play00:17

these are going to be open source models

play00:18

that we're going to use here is the

play00:20

reason why this is crucial right so even

play00:24

though uh the GPT models are so far uh

play00:28

the best at solving most problems that

play00:30

we want them uh we want them for uh

play00:33

knowing how to use a local llm it's very

play00:36

important first because these open

play00:38

source models are getting really really

play00:40

good number two because they're cheaper

play00:43

and you don't need all of the Power from

play00:46

GPT to solve certain use cases so if you

play00:48

know how to use a local uh model now you

play00:52

can do the same task at the same level

play00:54

of quality for much uh cheaper number

play00:57

three for privacy reasons so many many

play01:00

companies do not want to use the GPT

play01:03

models they don't want to connect to an

play01:04

external API they want to have

play01:07

everything in house and that's what an

play01:08

open source model will give you also if

play01:11

you are planning to deploy one of these

play01:13

models into a scenario where you don't

play01:15

have connectivity let's say robotics

play01:18

right or an edge device uh there is no

play01:21

chance for you to just connect to an API

play01:23

you will have to use an open source

play01:25

model so those are some of the reasons

play01:28

there is one that's also my favorite one

play01:30

that is even if you're using open AI as

play01:32

your primary model you can use one of

play01:35

these open source models as a backup so

play01:37

if the open a API goes down the other

play01:40

day they had downtime for a day I think

play01:43

the model was just returning nonsense

play01:45

you can immediately flip to an open-

play01:47

Source model and keep your workflow

play01:49

running with no disruption right so many

play01:52

many reasons uh the the goal of today's

play01:55

video is to show you how to do that on

play01:57

your computer starting from nothing so

play01:59

I'm going to start with an open browser

play02:02

I'm going to do that I'm going to make

play02:03

it run I'm going to build that simple

play02:06

very simple rack system uh we're going

play02:08

to read or answer questions from a PDF

play02:11

so I'm going to get a website download

play02:13

it as a PDF and answer questions from

play02:15

that and the most important thing is

play02:17

that here the thing that matters the

play02:21

least is the code that I'm going to

play02:23

write that is not what's important about

play02:26

this anyone can write the code the thing

play02:29

that matters the most is the reasoning

play02:32

behind that code why do we need to do

play02:35

this or that right that is what I want

play02:37

to convey on this video and I hope

play02:40

that's what you get uh out of it right

play02:42

the understanding of the stuff that

play02:44

we're building here and the reasoning

play02:46

behind it all right so uh with no uh

play02:50

finishing that introduction let me get

play02:52

here to my browser and I opened we're

play02:55

going to start here very simple I opened

play02:58

uh this website is Ollama so Ollama is the

play03:01

project that is going to allow us to run

play03:04

an open- Source model in our computer

play03:07

okay so here's the thing here's the

play03:09

thing that I want you to know like when

play03:11

you think about a model I want you to

play03:14

think about a like a gigantic

play03:16

mathematical formula because that's what

play03:19

it boils down to it's just a bunch of

play03:21

values weights and biases we call them

play03:24

parameters and they're put together in a

play03:27

gigantic mathematical formula that is

play03:29

huge like when we talk about Llama 2 the

play03:33

7B model that 7B means 7 billion

play03:39

parameters so it refers to the number of

play03:42

weights and biases that we need to store

play03:46

and execute uh in order to make any

play03:48

prediction uh Llama

play03:51

2 70B that's 70 billion parameters so

play03:55

when we download one of these models

play03:57

what we are downloading is all of those

play04:00

values all of those parameters the 70

play04:03

billion parameters we're storing that in

play04:05

our our hard drive and we're storing uh

play04:08

some sort of like a instructions on how

play04:11

to put together those parameters and run

play04:14

them as a big mathematical formula

play04:16

that's basically what we are storing so

play04:18

Ollama is going to serve as the common

play04:22

wrapper around all of these different

play04:24

models so we can download Llama 2 but we

play04:27

can also download uh Mistral and we

play04:30

can run them through Ollama through this

play04:33

common interface so to install Ollama is

play04:36

very simple go to this website ollama.com

play04:39

they have versions for Mac Linux and

play04:43

they just release a preview for Windows

play04:46

so even if you're on Windows you can run

play04:49

uh Ollama on your computer so I already

play04:53

downloaded it uh it's just it's very

play04:55

thin it's just it's very small uh when

play04:58

you download it you can run it and you

play05:00

can see here on my status bar uh you can

play05:04

see there is a little llama right there

play05:07

that tells me that Ollama is running on

play05:10

my computer by the way the first time

play05:12

you run Ollama it's going to ask you to

play05:15

uh to install the command line tools I

play05:18

think that's what it does or to download

play05:20

the Llama 2 model I don't remember what

play05:21

is the first instruction they give you

play05:23

but it's very simple to navigate uh and

play05:25

it's very quick so there is here here in

play05:28

the website uh you're going to see a

play05:30

link it's called models so click on that

play05:34

and here is the list of all of the

play05:36

models that you can run using Ollama

play05:39

okay so you have the Gemma models that

play05:42

just Google uh released I think it was a

play05:44

week ago this was updated two days ago

play05:47

Jesus it's is so fast uh you get llama 2

play05:51

Mistral Mixtral with an x uh LLaVA I mean just

play05:55

goes on and on and on if you care about

play05:57

code you have Code Llama here so all of

play06:00

these models you can download now how do

play06:02

you do that I mean you can obviously you

play06:05

can go here inside one of these models

play06:08

and see the entire family you see the 7B

play06:11

version the 70b version the chat model

play06:14

you can you know you can explore this on

play06:16

your own you're going to get information

play06:18

about your your model how to use it Etc

play06:21

uh this is what I'm going to do it's

play06:23

going to be very very simple so I'm

play06:25

going to go to my command line and

play06:28

you're going to have ollama as the uh as a

play06:31

command after you install it and here

play06:34

you can see a bunch of available

play06:36

commands here you get the pull command

play06:39

so basically you can say ollama pull llama

play06:43

2 and that will download the latest

play06:46

version of Llama 2 to your computer okay

play06:50

I already did that so I'm not going to

play06:51

do it here plus it takes I mean it's

play06:53

downloading I think it's like 20

play06:54

Gigabytes of data so it's going to take

play06:57

a few a few minutes to do uh you can do

play07:00

an ollama list and that will tell you what

play07:04

are the models that I have installed

play07:06

here on my computer so I have the latest

play07:08

version of Llama 2 uh I also have

play07:11

Mixtral the 8x

play07:14

7B uh version and I have the latest

play07:17

version of Mistral okay and you can see

play07:20

here the ID you can see the size 26 GB

play07:25

Mixtral Llama 2 is only 3 GB and when it was

play07:29

modified I have here just to show you

play07:33

when you download a model uh that is

play07:36

going to get I don't know how big this

play07:39

is on the on the recording but when you

play07:41

download a model uh it basically

play07:44

downloads these files within a folder

play07:47

I'm on a Mac here so it will download uh

play07:51

within my my base directory there is

play07:54

a .ollama directory and it will put in there

play07:58

every file that I downloads and you can

play08:00

see like some of these like this one

play08:02

here is the Mixtral one see the 26 GB

play08:06

that is just basically all of the

play08:08

parameters are stored there after the

play08:10

latest training version of Mixtral so

play08:13

all of those values are going to be

play08:15

store here so just so you know if you

play08:17

want to delete them uh where to come you

play08:18

can also obviously delete them using the

play08:21

RM command here so now that I have this

play08:25

I think I can do I don't remember the

play08:26

command but I'm going to do ollama

play08:30

llama 2 is not like that so how do I

play08:32

serve this oh there we go maybe serve

play08:36

let's see show run ah let's do

play08:39

run there we go so now we're running

play08:42

Llama 2 here on my

play08:45

computer okay and now I can say tell me

play08:48

a

play08:50

joke and there we go sure here is one we

play08:54

why don't scientists trust um this is

play08:58

just a bad joke I'm sorry sorry so

play08:59

anyway here is the model running on my

play09:02

computer which is awesome uh if I do

play09:05

this this is going to give me the help

play09:07

uh let's just say buy okay awesome so

play09:09

now I have an open-source model running

play09:13

on my computer from the command line

play09:16

that's great but that's not what I want

play09:18

what I want is to be able to access this

play09:20

model programmatically so in order to do

play09:22

that I'm going to create a directory

play09:25

here and we're going to build a very

play09:27

simple RAG system from zero from

play09:30

scratch using LangChain to get data from a

play09:34

PDF file okay so that's what I want to

play09:35

do all right so I'm going to create a

play09:37

directory I'm going to call it uh I

play09:40

don't know local model let's call it

play09:43

local model okay and let's

play09:48

go open Visual Studio code on that

play09:53

directory there we go local

play09:55

model okay awesome so here's visual stud

play09:59

Studio code I'm going to make it nice

play10:02

and big and beautiful so you guys can

play10:05

see and I'm going to create a notebook

play10:07

I'm going to call it

play10:10

notebook and this is a Jupiter notebook

play10:13

uh just so you know in order for you to

play10:15

be able to use Jupiter with a visual

play10:18

studio code you need to install the

play10:20

Jupiter plugin uh this plugin is created

play10:23

by Microsoft and obviously the python

play10:25

plugin because this is going to be

play10:27

python but yeah I'm going to be using

play10:29

Jupiter from within my notebook here uh

play10:32

awesome I'm going to open a terminal

play10:35

window within Jupiter and I'm going to

play10:37

create a virtual environment so I can

play10:39

install all of the libraries that we're

play10:41

going to need uh within that virtual

play10:44

environment I don't want to be

play10:45

installing anything directly on my

play10:46

computer so from within uh Jupiter

play10:49

notebook here from within the terminal

play10:51

I'm going to do python3 -m to just call

play10:55

a module and the module is venv the

play10:59

virtual environment module that comes

play11:01

with python and I'm going to call

play11:03

it .venv like that's the name of the folder

play11:05

that is going to create so I'm going to

play11:08

do that and that is going to execute and

play11:10

now I have a new folder here is a hidden

play11:12

folder but it's going to be a new folder

play11:14

that's going to be called .venv uh many

play11:17

people use different virtual

play11:19

environments they use poetry and they

play11:21

use cond Mina I'm an old school guy I'm

play11:24

want to I want to keep it very simple

play11:26

that's why I'm using the virtual

play11:28

environment here here all right so let

play11:30

me just uh go inside that virtual

play11:34

environment cool and now I can start

play11:37

installing stuff okay so what do I need

play11:40

here uh well I I don't know still what I

play11:43

need but let's first start by just doing

play11:47

something from this notebook just making

play11:49

sure this notebook is actually working

play11:51

it tells me hey you need to select a

play11:54

kernel what is the kernel I'm going to

play11:56

go to the python environment that I just

play11:58

created it I'm going to select that

play12:00

Visual Studio is going to tell me that

play12:02

it's going to install everything that it

play12:04

needs to execute that print line and

play12:08

boom it runs so this is awesome this is

play12:11

working okay so this is something that

play12:13

we are going to need uh here I'm going

play12:17

to need an environment file so this

play12:19

environment file is just going to store

play12:23

any environment variables that I'm going

play12:25

to use during this presentation so for

play12:27

this presentation

play12:29

uh we're going to need the open AI key

play12:33

we're going to need to pass that uh

play12:35

obviously I don't have the value here

play12:37

but this is going to be something like

play12:39

blah and I'm going to store it there and

play12:41

then I want to read that key from my

play12:46

code from my notebook so I can get

play12:48

access to the open AI key why do I need

play12:51

access to the open a API because I want

play12:53

to test everything that I'm doing

play12:54

locally also with GPT just to make sure

play12:58

how they compare so that's why I'm going

play12:59

to access the the open AI key so I have

play13:03

my key over here I'm going to paste it

play13:08

uh maybe in order to do that I don't

play13:10

want to just put it on the on the

play13:14

screen so you guys don't just use my key

play13:17

to build your applications for

play13:23

free

play13:25

okay I know you guys are not seeing what

play13:28

I'm doing but basically just pasting my

play13:30

open AI

play13:32

key uh off screen so I know I could do

play13:36

it here and then just change it but

play13:39

anyway so after doing that this is the

play13:42

stuff that I'm going to do so I'm

play13:44

importing the OS library and I'm

play13:47

importing this library uh it's called dotenv

play13:50

and that's going to read the environment

play13:52

variables that I just stored in the

play13:55

.env file it's going to read that in

play13:57

memory uh when I call this load_dotenv and

play14:00

now I can use my open AI key just like

play14:03

this it's just very simple now in order

play14:05

for me to be able to use this I need to

play14:08

install uh it's called python-dotenv so I

play14:11

need to install that Library boom

play14:14

awesome and then I'm defining here three

play14:18

it's the same variable just overwriting

play14:20

in shorter I'm going to start by using

play14:22

the GPT

play14:24

3.5 I'm going to start by using this

play14:26

just a variable boom that works

play14:29

okay so now I'm ready to just using Lang

play14:31

chain uh create a very simple model just

play14:34

to make sure that the API is working

play14:37

okay so I'm going to be copying the code

play14:39

here uh let's see so how do we how do we

play14:43

access this I'm going to import the

play14:45

chat. openai model and yeah thanks to to

play14:49

copilot I'm going to create model chat.

play14:52

open AI I'm going to pass the open AI

play14:54

key I'm going to pass the definition of

play14:56

the model which is going to be or the

play14:57

name of the model which is going to be

play14:59

GPT 3.5 and now that I have here I can

play15:03

just to invoke and I can pass hey tell

play15:06

me a joke okay so I need to install lang

play15:11

chain-openai of

play15:14

course I want to do

play15:17

that uh what else I need to install more

play15:20

stuff

play15:22

uh this might have been installed but

play15:24

just in case I'm going to install the lang

play15:27

chain as well

play15:29

yep it wasn't okay awesome so now can I

play15:33

run

play15:34

this boom obviously this is beautiful

play15:38

okay so here we go I asked the open a

play15:42

API tell me a joke and I get back why

play15:44

couldn't the lower I get back a response

play15:47

from open AI that is awesome but this is

play15:50

just ch GPT or the GPT model we don't

play15:55

want that we want to do it from the

play15:57

locally uh running mod model so how do

play15:59

we do that it's very simple we need to

play16:01

use the instead of using the

play16:04

specifically the open AI chat model we

play16:06

need to use an Ollama model and I think

play16:09

that's on a different Library uh might

play16:11

have been I'm not sure I think it's in

play16:14

the LangChain community

play16:16

Library uh let's

play16:18

see let's import it here and we're going

play16:21

to get this class which is Ollama and what

play16:25

I I want to do is I want to instantiate

play16:29

an Ollama model if the model is not a GPT

play16:34

model so I'm going to do something like

play16:36

this if model sorry if

play16:39

model uh starts with not not like this

play16:43

but the other way around GPT then

play16:47

do this right

play16:50

else okay let's do an AMA model uh by

play16:54

the way I I can take this

play16:57

out and do it

play16:59

regardless okay so this is awesome so

play17:02

basically if I Define that my model is a

play17:05

GPT then I want to instantiate the model

play17:09

using the ChatOpenAI model if not I

play17:12

want to instantiate the model using the

play17:14

Ollama class passing the name of the

play17:16

model in this case uh I'm going to test

play17:19

this just using

play17:20

GPT let's see so that should work like

play17:22

it did before it does now I'm going to

play17:25

change the name of the model let's do

play17:28

Llama 2 so I'm going to change the name

play17:30

of the model and there we go you get an

play17:34

answer back from llama notice something

play17:37

interesting here notice that when I call

play17:39

the Llama model I get back a string

play17:43

that's what I get back from the Llama 2

play17:45

model but when I call the GPT

play17:48

model I get an AI message instance here

play17:53

that has the content inside so the

play17:55

reason that's happening is because the

play17:57

GPT turbo model model this model here is

play18:00

what we call a chat model so it's a

play18:02

model that's meant for a conversation

play18:05

right they're going to be AI messages

play18:08

which is what I get here they're also

play18:10

going to be human messages which is like

play18:14

the questions that I'm going to ask that

play18:16

model for example they're going to be

play18:17

classified as human messages so when I

play18:20

interact with that class LangChain would

play18:24

return a special structure in this case

play18:25

is an AI message instance containing the

play18:28

content inside so but the Lama 2 model

play18:31

is a completion model it's not a chat

play18:33

model I could be using a chat model as

play18:36

well but I'm not I'm using a completion

play18:38

model and that's why what you get back

play18:40

is a string so how do we fix this uh or

play18:43

just it's not a problem that we need to

play18:45

fix but I don't like to see an AI

play18:47

message here well we can use LangChain to

play18:49

parse out that AI message just remove

play18:54

that and turn it into a string right so

play18:56

LangChain supports the concept of

play18:58

parsers and I have uh I have here what

play19:02

we need to do and you're going to see

play19:03

how simple it is so let me paste it here

play19:07

so I'm going to import a string output

play19:10

parser and a parser again it's just a

play19:12

class that's going to take an input and

play19:15

it's going to transform it in one way in

play19:17

this case this one here is going to

play19:19

transform the input into a string which

play19:21

is what we want and I'm going to create

play19:24

my first LangChain chain here LangChain

play19:28

is language chain I'm going to create a

play19:30

chain here uh that's where the name

play19:32

comes from so my chain is going to take

play19:35

the model and I want to pipe the output

play19:38

of that model into the input of the next

play19:42

component in this case it's a parser so

play19:44

if I do this basically what's going to

play19:47

happen is that LangChain will talk to

play19:49

the model will send a request to the

play19:51

model and then we'll get the output from

play19:54

the model and we'll pipe that output

play19:57

into the input of the parser which then

play19:59

will return the string so if I do this

play20:02

I'm going to reexecute this line I get

play20:05

my AI message but I'm going to do it

play20:07

again just now from the chain so I'm

play20:11

invoking the chain not the model anymore

play20:13

I'm invoking the chain so the parser

play20:16

gets involved when I do that boom I just

play20:19

get back the string why because this

play20:21

parser that's the job I'm going to

play20:23

remove the parser invoke it now you get

play20:26

the AI message I put back the parser

play20:29

invoke it now you get the string see how

play20:31

that works that's beautiful and that is

play20:33

one of the main characteristics of Lang

play20:35

chain you can put uh you can create

play20:39

increasingly more complex chains using

play20:42

different components okay but this is

play20:43

just the beginning we can now run a model

play20:46

and that model can run locally I have a

play20:48

llama here and we know how to do a chain

play20:52

uh let me just build a simple RAG

play20:54

system just very very simple I want to

play20:56

answer questions from a PDF so the

play20:58

first question is what is going to be

play21:00

that PDF so I'm going to go to this

play21:02

website here and this is the class that

play21:05

I that I teach I teach a live program is

play21:07

called a machine learning a school

play21:09

program well it's actually called

play21:11

building machine learning uh systems

play21:13

that don't suck and I'm going to save

play21:15

this as a string okay so I'm going to uh

play21:18

go here like if I were to print this I'm

play21:21

going to save my whole website as a PDF

play21:24

so let me save that as a PDF and I'm

play21:27

going to save it the same folder I'm

play21:29

going to put it in the same

play21:31

folder uh let's see where do I put it

play21:34

here this is just it's horrible the

play21:37

dialogue that Apple decided to create to

play21:41

save components I just hate it so much

play21:44

all right I'm going to call it uh ml

play21:47

school which is the the name of the

play21:49

website okay so I'm going to call it ml

play21:51

School boom let's go back here here we

play21:54

go we have our PDF right there so what I

play21:57

want to do now is use my model to answer

play22:00

questions from that PDF the first thing

play22:04

that we need to do is load that PDF in

play22:06

memory so how do we do that we're going

play22:07

to need a library that I have it here

play22:11

somewhere okay the library is uh PP

play22:15

install it's called pypdf or python PDF

play22:20

however you want to call it so we need

play22:22

to install that library and then using Lang

play22:26

chain we can actually load that in

play22:29

memory so we're going to come here and

play22:33

look at this LangChain supports document

play22:36

loaders and by the way there are a bunch

play22:38

of different document loaders that you

play22:39

can use to load information from

play22:42

anywhere okay so I'm going to use the Py

play22:44

PDF loader that's why I needed the

play22:46

library in the first place I'm going to

play22:49

type what the name of the PDF is going

play22:51

to be here and I'm going to load it and

play22:54

split it and that is very important and

play22:56

then I'm printing out the p pages so you

play22:58

see what the result was so I'm going to

play23:00

execute this and see what just happened

play23:03

so LangChain using this loader class

play23:07

loaded and split my PDF into different

play23:10

pages you're going to see uh here it

play23:13

says page one of 14 page two of 14 all

play23:18

the way to page 14 of 14 so he split my

play23:22

entire PDF document into different pages

play23:26

and loaded each one of those pages pages

play23:28

in memory okay so I have 15 pages or 14

play23:31

pages in memory right now that's awesome

play23:35

that was a great great step I have that

play23:37

in memory here the next thing is

play23:40

creating a template so I need to create

play23:42

a template I want I say a template it's

play23:44

a prompt template to ask the model to

play23:48

answer questions from specific context

play23:52

so let's do that uh and let's do it the

play23:55

right way so look at this by the way all

play23:58

of these stuff all of the steps that I'm

play24:00

covering here talking about retrieval

play24:03

augmented generation systems I covered

play24:06

in much more details in a different

play24:08

video I'm going to put it somewhere here

play24:12

uh or maybe the description below or if

play24:13

not you can find the latest video in my

play24:16

in my YouTube channel and you're going

play24:17

to see more detail about these steps

play24:20

here so okay so here is a prompt template

play24:23

that we are going to use to pass to the

play24:25

model so here is a string it says answer

play24:27

the question based on the context below

play24:30

if you can't answer the question reply I

play24:32

don't know okay you can make this comp

play24:34

this more complex if you want it this is

play24:36

good enough for for us then I'm going to

play24:38

provide the context and then I'm going

play24:40

to provide the question that I want to

play24:42

answer I'm basically telling the model

play24:44

the question please answer it using this

play24:47

information don't go to your memory

play24:50

don't go to the stuff that you've

play24:51

learned before just answer out of this

play24:54

section here okay so ideally I'm going

play24:57

to be able to grab some of the pages on

play24:59

that PDF put them here the content of

play25:03

the pages and then answer a question or

play25:05

ask a question about those pages so I'm

play25:07

going to create this prompt template

play25:09

notice these two squiggly braces here uh

play25:13

context and question those are variables

play25:15

and LangChain will turn them into

play25:17

variables that I can pass and provide

play25:19

values for

play25:20

so here is my prompt I'm using the

play25:23

prompt template a class to create a

play25:26

template from the string that I just

play25:28

passed right here and now I'm just just

play25:31

to test it out I'm calling the format

play25:34

function I'm saying hey format my prompt

play25:36

passing context see how context just

play25:39

became a variable here or an argument to

play25:42

this function here is some context and

play25:44

the question here is a question so I'm

play25:45

going to execute this just to make sure

play25:47

this works and you can see actually let

play25:50

me do a print here let's see if those no

play25:53

lines much better much better like this

play25:56

answer the question based on the context

play25:58

below if you can answer the question

play25:59

reply I don't know and then we say

play26:02

context here's some context and then we

play26:04

say question here is a question okay so

play26:07

that's awesome we have a prompt so how

play26:10

do we pass this prompt into the model

play26:12

well we can just keep building our chain

play26:15

remember that our chain was like

play26:20

this and we were piping that into a

play26:23

parser so we could make this chain

play26:27

better if we do something like this so

play26:32

what happens if we do this we have a

play26:34

chain we start with a prompt and that

play26:37

that prompt is going to go into the

play26:38

model and that that model is going to go

play26:40

into a parer so we can create that the

play26:42

question now is what do you think is

play26:45

going when we say chain invoke what do

play26:47

you think we need to pass well remember

play26:50

there are two variables that we need to

play26:52

provide therefore we're going to have to

play26:55

invoke this chain passing cont context

play26:58

I'm passing a question so if I say

play27:02

uh the name I was

play27:06

given was

play27:09

Santiago and then I'm gonna ask what's

play27:12

my

play27:13

name oops and if I do this your name is

play27:17

Santiago boom that works important

play27:21

lesson here notice how this chain when

play27:25

we invoke the chain we need to

play27:27

understand and what the input of the

play27:29

chain will be now there is something

play27:31

that might help there uh there is an

play27:35

input

play27:36

schema functionality that you can call

play27:39

the chain I mean obviously you can just

play27:41

look at the first component of the chain

play27:43

and if you understand what the first

play27:44

component is waiting for or is is

play27:47

expecting that is the in invocation that

play27:50

is going to need to happen but I find

play27:52

this input

play27:53

schema uh trick or or or tool very

play27:57

helpful because it tells me it gives me

play28:01

information about the chain without me

play28:02

having to overanalyze what that chain

play28:04

looks like so in this case you see okay

play28:07

so this is the chain what is the input

play28:09

schema to that chain and it talks about

play28:11

the prompt input so it's going to be the

play28:12

prompt and it tells me well the

play28:14

properties is expecting an object and

play28:16

the properties of that object are a

play28:19

context okay which is a string and a

play28:23

question which is also a string so these

play28:25

are the two variables context and

play28:28

question see so that is why I know that

play28:31

I need to invoke that chain with the

play28:33

context and the question okay awesome so

play28:36

what do we have right now we have a

play28:38

chain that already has a

play28:40

prompt a model and a parser we have the

play28:44

documents in memory the by the documents

play28:46

I mean the PDF document we already have

play28:48

it in memory split by Pages now we need

play28:51

to find a way to take that document and

play28:54

pass it as a context but only pass

play28:58

the relevant portions of that document

play29:00

as a context so how do we do that well

play29:02

I'm going to use a very simple Vector

play29:04

store that is going to uh do several

play29:07

things for us so number one it's going

play29:10

to save it's going to serve as a

play29:12

database for the content in a different

play29:15

way we're not going to store the pages

play29:17

of the content just straight into the

play29:19

database we are going to be storing

play29:21

embeddings of those documents so we're

play29:23

going to get the whole PDF we're going

play29:25

to generate embeddings for each one of

play29:27

the pages of that PDF and the reason we

play29:30

generate these embeddings is so we can

play29:32

later compare those embeddings with the

play29:35

question that the user is asking and

play29:38

find the embeddings that are the most

play29:40

similar or the pages of the document

play29:43

that are the most similar to the

play29:45

question the user asks and I know I'm

play29:48

wiping my hands here a little bit uh in

play29:50

the video that I mentioned before in the

play29:52

other video which is it's is called

play29:54

building a rack system from scratch on

play29:56

that video I go into a lot of details

play29:59

about how embeddings work uh hopefully

play30:03

by now you know that if not just check

play30:04

that video out because it's going to

play30:06

help you understand what is the reason

play30:09

we create these embeddings the good news

play30:10

is that all of these embeddings are

play30:12

going to be created for us behind the

play30:14

scenes and the doc the uh the vector

play30:18

store in memory is going to help us

play30:20

retrieve the pages that are the most

play30:23

relevant to answer specific questions so

play30:25

how do we do that well first I need to

play30:28

install a couple of libraries here uh

play30:31

the first one is going to be Doc array

play30:34

so I'm going to do pip install docarray

play30:38

that is

play30:39

important the second one is a specific

play30:42

version of

play30:46

pantic by way I'm installing all of

play30:48

these by hand in the description and in

play30:51

this YouTube video you're going to find

play30:53

a link to the repo with all of this

play30:55

content so you don't have to do any of

play30:57

the so you don't have to follow through

play30:58

you can just go directly and grab the

play31:00

content including all the libraries that

play31:02

you need to install okay I think that's

play31:05

it no I need something else I need to

play31:10

install this or not uh let's

play31:15

see no this we might not need this or we

play31:20

do I don't know let's see let's see if

play31:21

we need that or

play31:24

not okay

play31:26

so

play31:29

let's create by the way let me just hide

play31:32

here

play31:35

the okay much better I have a little bit

play31:38

more space so here is what I'm going to

play31:42

create now I'm going to use a DocArray in

play31:44

memory search okay so this DocArray in

play31:47

memory search uh this is just going

play31:52

to create a vector story memory now if

play31:55

we were building a real app application

play31:57

we wouldn't be using this Vector storing

play32:00

memory obviously we will be using

play32:02

something that has permanent storage

play32:04

like pine cone or any other Vector

play32:07

database out there but this is good

play32:09

enough for our video here for our

play32:12

purpose this is just going to do the

play32:14

same thing but just in memory here in

play32:15

our computer and the good thing about

play32:17

this DocArray in-memory vector store

play32:21

is that we can just load it and create

play32:23

it off of the documents that we

play32:25

generated okay okay so you can see here

play32:28

that I'm passing the pages of the

play32:30

document that we generated from the PDF

play32:33

right remember that we here are the

play32:35

pages not not here these are the pages

play32:38

of the document you can see here pages

play32:40

is just an array with every single page

play32:42

so we can create a vector store directly

play32:45

off of all of those pages what's going

play32:47

to happen is that all of those pages are

play32:49

going to uh go into the database the

play32:51

database is going to generate embeddings

play32:53

for all of those it's going to save all

play32:55

of that in memory there is something

play32:56

else that I need to pass and I need to

play32:59

provide the embeddings class so what

play33:02

class we're going to be using to

play33:05

generate the embeddings here's the thing

play33:07

every model uses a different model to

play33:10

generate embeddings so depending on the

play33:13

model that we create we need to uh

play33:15

generate embeddings one way or the other

play33:18

we have here right

play33:21

now uh either a GPT model or an Ollama

play33:25

model therefore or we are going to have

play33:28

to generate embeddings uh in a different

play33:31

way so let's see embeddings let's create

play33:35

a variable uh this is not co-pilot is

play33:38

not being helpful right there I think

play33:41

the open ey embeddings are here okay

play33:45

there we go so we're going to need this

play33:49

class so if if we're using a GPT model

play33:54

the embedding instance is going going to

play33:57

be we're going to instantiate

play33:58

embeddings with the open AI embeddings

play34:00

model and the Ollama

play34:04

one the Ollama one is going to be this one

play34:09

here

play34:10

so there is an OllamaEmbeddings class and that

play34:13

is the one that we want to use

play34:17

if

play34:19

oops this is the one that we want to use

play34:22

if we're

play34:23

using Llama 2 or Mistral or any other Ollama

play34:28

model let me execute that again all

play34:31

right let me go here all of this is good

play34:34

all of this is good here is

play34:37

ooom okay so there is a problem here

play34:40

with the

play34:41

library okay so I have a problem here

play34:43

with the Lou let me restart this just to

play34:46

make sure that is not what's happening I

play34:49

know I remember the first time I

play34:51

installed this that I had issues as well

play34:54

with

play34:55

uh the vector store in memory Vector

play34:59

store uh but no that now it's working

play35:01

fine so now I have a vector store here

play35:04

which is gray and I think I can do

play35:07

something like

play35:09

uh retrieve not tell me a joke but let

play35:13

me retrieve something related

play35:16

to machine learning let me see if this

play35:23

works oh maybe like this there we go

play35:27

okay so if I get to my Vector store and

play35:30

I turn it into a retriever and and a

play35:32

retriever is a component of LangChain that

play35:36

will allow you to retrieve information

play35:37

from anywhere so basically here what I'm

play35:40

doing just to put it in a different way

play35:42

that it's it's it's

play35:44

uh a little bit less convoluted I'm

play35:48

going to create a retriever off of the

play35:51

vector store and again you don't need a

play35:53

vector store for a retriever you can

play35:55

create a retriever that that's going to

play35:57

be using Google searches you can create

play35:59

a retriever that's going to get

play36:00

information from anywhere so just in

play36:02

this case it's just going to come from

play36:04

the vector store and then I can invoke

play36:06

my Retriever and then pass information

play36:10

and what's going to happen is that that

play36:13

retriever or that Vector store is going

play36:15

to return the four top documents that

play36:18

are the most relevant to the concept

play36:22

machine learning okay so anything that's

play36:25

relevant to that concept going to come

play36:27

sorted in order of importance back and I

play36:31

think I can say there might be a k here

play36:34

let's do two not here somewhere

play36:37

somewhere there is a actually uh maybe

play36:41

top I don't think is is top K but there

play36:44

is a parameter which I don't remember

play36:45

right now I will have to look at the

play36:47

documentation if you wanted to control

play36:49

how many documents are going to come

play36:51

back you can do that through the

play36:53

retriever I'm not sure what the name is

play36:55

right now it doesn't really matter we

play36:56

can use four all right so now we have a

play36:59

retriever so the idea here is going back

play37:03

to our chain let me copy our

play37:05

chain down here so getting back to my

play37:09

chain I have a prompt I have a model and

play37:12

I have a parser and remember that that

play37:15

prompt is expecting a context and a

play37:18

question okay that's what I need to pass

play37:21

that prompt now the context is g to come

play37:25

from this retriever

Okay, so this is where it gets a little bit tricky. The prompt is expecting a map. So imagine I create a map here and pass a context (let's see, the name I was given was Santiago) and a question, "What is my name?". Can I do this or not? I think I'm going to need something like this... okay. Can I call invoke? That doesn't work; let's figure this out.
Okay... oh, this is it, runnable. Can I do this? Actually, no, that's not it. I thought this was going to get turned into a runnable directly; let's see why it's not working. Let me recreate this. I mean, I know how to fix the problem, I just don't want to fix it like that. Let me do this: from operator import itemgetter, then let's build an itemgetter for "question" and pass that to the retriever, and then let's do this. Still doesn't work. Why doesn't it work? Let me check the documentation really quick and see why... Okay, you know what, let me just pass the question here. Oh well, I understand why that doesn't work: I need to pass a question, obviously. Now let's do this, and let me just check here: I have the retriever, I have a prompt, I have a model, I have my parser, so that is working fine. This looks much better. Here's the question; let me grab the same question that I'm passing. Okay, there we go, that was the problem. All right, let me explain really quickly what was happening here, because it's not obvious.
So I have my prompt, my model, my parser, and I need to pass a context and a question to the prompt. What I want is for the context to come from the retriever, so let me organize this in a different way to make it more obvious. The context is going to come from the retriever, but the retriever needs the question that I'm invoking the chain with, which makes the wiring a bit awkward. There is another way of doing it, but here we're using something called the itemgetter function. Just so you understand what itemgetter is: if I call itemgetter with "abc" (Copilot is actually being helpful here), I create a function that, when I later call it with a dictionary, returns whatever is stored under "abc", in this example "one two three". So if I execute this, you get the "one two three" here. In this particular case, if I take itemgetter of "question" and pass it this dictionary, what I get back is just the question, "What is machine learning?", because that's the value stored under the "question" key.
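A quick illustration of that behavior in plain Python (itemgetter comes from the standard library, so this runs as-is):

```python
from operator import itemgetter

# itemgetter("abc") returns a function that pulls the "abc" key out of a mapping.
get_abc = itemgetter("abc")
print(get_abc({"abc": "one two three"}))  # -> one two three

# Same idea with "question": it simply extracts the question from the input dict.
get_question = itemgetter("question")
print(get_question({"question": "What is machine learning?"}))  # -> What is machine learning?
```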

So this is how you put the chain together. The first module is what LangChain calls a runnable, basically a component that can run. The runnable here is going to generate a map, because that map, with a context and a question, is what we pass to the prompt. The first variable of the map, context, is going to come from the retriever, but as one unit: I'm grabbing the question from the invoke call, piping that question into the retriever, and the output of that is what goes into context. We know what that output is going to be, because we already did it: this is my retriever, I'm invoking it with a question, and I get back an array of documents. So this context is nothing more than that array of documents. Then question, the second value of the map, is just the question from the invocation, passed straight through to the next component. And that creates my entire chain.
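Putting that reasoning into code, a minimal sketch of the chain being described (LCEL pipe syntax, reusing the prompt, model, parser, and retriever defined above):

```python
from operator import itemgetter

chain = (
    {
        # context: take the incoming question, pipe it into the retriever,
        # and use the returned documents as the context for the prompt.
        "context": itemgetter("question") | retriever,
        # question: pass the incoming question straight through to the prompt.
        "question": itemgetter("question"),
    }
    | prompt
    | model
    | parser
)

chain.invoke({"question": "What is machine learning?"})
```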

Now I need to test this chain, and to test it I have a bunch of questions: what is the purpose of the course, how many hours of live sessions, how many coding assignments, and so on. I'm going to go through them one by one and answer them. Let's do this for question in questions: we can invoke the chain with each question, and I'm also going to print the question itself. Let me use an f-string: print the question, print the answer with the chain invocation inside (I'll need to change these to single quotes), and then print a new line.
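Roughly, the loop looks like this (a sketch; the list of questions is abbreviated):

```python
questions = [
    "What is the purpose of the course?",
    "How many hours of live sessions?",
    "How many coding assignments are there in the program?",
    # ...
]

for question in questions:
    print(f"Question: {question}")
    print(f"Answer: {chain.invoke({'question': question})}")
    print()
```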

Let me give it a try. "What is the purpose of the course?", and this is the answer; okay, let's see. This is the GPT model, by the way. Yeah, this is fine. "How many hours?": the program offers 18 hours of live interactive sessions, that is correct. "How many coding assignments?": there are 30 coding assignments, correct. "Is there a program certificate?": yes, correct. "What programming language will be used?": Python. "How much does the program cost?": the program costs $450 for lifetime access, so this is correct. That's the GPT model answering questions from my PDF.
Let's now change the model to something different. I'm going to go up here, switch it to Llama 2, and execute, and everything else should stay the same: if we did our job, everything should work without making any other changes to what we built. Let's do all of that.
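For reference, the swap amounts to changing the line where the model is created; a sketch assuming the langchain-community Ollama wrapper (the import path varies across LangChain versions):

```python
from langchain_community.llms import Ollama

# Only this line changes; the rest of the chain is reused as-is.
model = Ollama(model="llama2")  # or model="mixtral" to try the bigger model
```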

By the way, loading the documents with Llama 2 is running here on my computer, and I don't have a big Nvidia GPU; I have my M3 laptop. It's a pretty good laptop, but obviously with a bigger GPU this would run much faster. So let's see.
Okay, here are the answers. Look at this: Llama 2 is just so verbose. "The answer to your question is 18 hours of hands-on live training spread over three weeks", but it works, it's correct. "How many coding assignments?": "I don't know the exact number of coding assignments in the program. According to the provided document, there are 30 coding assignments." It knows where the information is, it's just bad at summarizing it. "Is there a program certificate?": yes. "What programming language?": Python. "How much does the program cost?": "Based on the context provided, the program costs $1,000 to join." That is just not true; I don't think $1,000 is even mentioned anywhere in the entire page, so it hallucinated that completely. By the way, I also have another model; let me just try it for fun, and then I'm going to show you something else before we finish. I'm going to try Mixtral, just running the whole thing here.
One thing I wanted to mention: obviously these models are not as good as the GPT models, and here I'm not even using GPT-4, which is so much better. But if you play with the prompts, you can get these models to do a very good job summarizing things. My prompt here is very bare-bones, just for this example, so don't be discouraged because the model doesn't answer some of these questions correctly; working on the prompt will go a long way. So let's see.
It's still running; this is a big model, and it takes a little bit of time to generate all of those embeddings, because Mixtral 8x7B is huge compared to Llama 2. Let's see, did we start answering yet? Not yet, still invoking the chain... it's coming through: about 20 seconds just to invoke this one question. There we go. Okay, so now it's going to try to answer all of those questions. Again, this is just a big model: if we go back to the list, Mixtral is this one here, the 26-gigabyte download, compared to the 3 gigabytes of Llama 2. So it takes quite a bit of time on my computer to produce any results.
While that works, and speaking of being slow, I want to show you something else that's pretty cool, and the first thing is how to stream. Up to now I'm invoking my chain and waiting for the whole answer to come back before displaying it, but a nicer experience for your users when you present this information is to stream the answer back. So you can see here I'm calling chain.stream instead of invoke, passing it a question.
Whenever this finishes... okay, here is answer one. Did it hallucinate "four hours of live sessions per week"? Oh no, that part is true: two live sessions, each lasting two hours, taking place every Monday and Thursday, so that is true. But it's not returning the total... oh, there we go, look at this: Mixtral is doing a multiplication here, saying it's 12 hours and ignoring... ah, there we go: "however, the document mentions that there are 18 hours of hands-on live training." It's just very verbose and tries to do some math. For the coding assignments it says that the number is not specified in the documents, but that it can be inferred that there are at least 30 coding assignments. What do you mean? You first tell me that you cannot infer it, that it's not mentioned, and then that you can infer it? You just read that the document says 30 coding assignments. These models are just... yeah, okay. "What programming language will be used?": Python. And it says that the document does not provide information on the cost of the program; by the way, the information is there, you saw GPT finding it. Let's go back to GPT really quick so we can test the streaming.
I'm going to show you streaming and one more thing really quick, and then we'll be done with this. Okay, so this is how it works when it answers one question after the other: you can see it displays each answer all at once, boom, boom, boom. But if we do streaming, look at what happens. Let me try to do it again: see how it builds up the answer? It's really fast, which is why you barely notice it, but it builds up the answer because it's streaming out the characters as they are produced by the model.
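A minimal sketch of that change; LCEL chains expose `stream` alongside `invoke`, yielding chunks of the answer as the model produces them:

```python
# Print each chunk as soon as it arrives instead of waiting for the full answer.
for chunk in chain.stream({"question": "How many hours of live sessions?"}):
    print(chunk, end="", flush=True)
print()
```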

That's super cool. The other thing you can do is batching, which is also super cool. Here I have a bunch of questions and I'm answering them one by one, but we can also do batching: instead of passing a single question, I pass an array of questions, and when I do that, look what happens. It takes a little bit more time, but then it displays all of the answers at the same time, and the good news is that all of those calls run in parallel behind the scenes. We don't have to wait for one answer before asking the next question; we can ask many questions at the same time, so the overall result is way faster. All of that is thanks to LangChain.
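Roughly, the batched version looks like this; `batch` is the LCEL counterpart of `invoke` that takes a list of inputs and runs them concurrently:

```python
# Ask all the questions in a single call; the calls run in parallel behind the scenes.
answers = chain.batch([{"question": q} for q in questions])

for question, answer in zip(questions, answers):
    print(f"Question: {question}")
    print(f"Answer: {answer}\n")
```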

Again, the code is going to be in the description below; just make sure you like this video. A ton of work goes into these videos, and it's your likes that make me create more of them; if you don't like them, I'll just stop making them. As a final recap, what you learned today is how to use these models locally, whether on a Linux server or on your own computer, and how to write code that lets you swap one model for another without the rest of your code having to change. Hopefully you enjoyed it. I have a bunch of videos coming; I think the next one will be a simpler one, showing how, instead of reading a PDF, you can connect to the web directly and answer questions from a website. I think that's the one coming next, we'll see. Anyway, thank you, and I will see you in the next one. Bye-bye.
