Movie Recommender System in Python with LLMs

NeuralNine
9 Jun 2024 · 25:00

Summary

TLDR: This video tutorial guides viewers through building a movie recommender system in Python using a local large language model and a vector store. It uses a Netflix dataset from Kaggle, transforming movie and TV show information into textual representations, which Llama 2 embeds into a 4,096-dimensional vector space so that similar content clusters together. The system stores the vectors in a Facebook AI Similarity Search (FAISS) index for efficient similarity searches, providing movie recommendations based on user preferences or imagined scenarios.

Takeaways

  • 🎬 The video is about building a movie recommender system using a large language model and a vector store in Python.
  • 📈 The process involves using a dataset from Kaggle, specifically a Netflix dataset containing information about TV shows and movies.
  • 🔍 Features of the dataset include the title, description, cast, release year, genre, and other details of movies and TV shows.
  • 📝 The script describes creating a textual representation of each movie or TV show by combining its features into a single string.
  • 🧠 A large language model, in this case Llama 2, is used to embed these textual representations into a high-dimensional vector space.
  • 📊 The expectation is that similar movies will be closer in vector space, relying on the intelligence of the language model to determine similarity.
  • 🛠 The embeddings are stored in a vector store called FAISS (Facebook AI Similarity Search), developed by Facebook.
  • 🔄 The vector store can then be used to find the top recommendations for a given movie by comparing the movie's vector to others in the store.
  • 🔧 The video includes a step-by-step guide on installing the necessary Python packages and the Llama 2 model (via Ollama), as well as coding the recommender system.
  • 🔍 The script also covers how to perform a similarity search using the vector store to recommend movies similar to a given movie.
  • 💡 The video concludes by demonstrating the recommender system with examples, including creating a hypothetical movie and getting recommendations based on it.

Q & A

  • What is the main topic of the video?

    -The main topic of the video is building a movie recommender system using a large language model and a vector store in Python.

  • Which dataset is used in the video for building the recommender system?

    -The video uses a dataset from Kaggle, specifically the Netflix dataset containing information about TV shows and movies.

  • What features does the Netflix dataset contain?

    -The Netflix dataset contains features such as the title, description, cast, release year, genre, and other details of movies and TV shows.

  • How does the large language model contribute to the recommender system?

    -The large language model contributes by taking the textual representation of a movie and embedding it into a high-dimensional vector space, allowing for the identification of similar movies based on their vector proximity.

  • What is the name of the vector store used in the video?

    -The vector store used in the video is called FAISS, which stands for Facebook AI Similarity Search.

  • What is the dimension of the vectors produced by the large language model in this video?

    -The large language model, in this case Llama 2, produces vectors with a dimension of 4,096.

  • How does the recommender system determine which movies are similar?

    -The recommender system determines similarity by comparing the vector representations of movies in the vector space, where closer vectors indicate more similar movies.
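
The nearest-neighbor idea can be sketched in plain Python. Toy 3-dimensional vectors stand in for Llama 2's 4,096-dimensional embeddings, and all titles and values are invented for illustration:

```python
import math

def l2_distance(a, b):
    # Euclidean (L2) distance, the metric behind faiss.IndexFlatL2
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Toy 3-dimensional "embeddings" (invented values for illustration)
catalog = {
    "Thriller A": [0.9, 0.1, 0.0],
    "Thriller B": [0.6, 0.3, 0.2],
    "Comedy C":   [0.0, 0.9, 0.8],
}

# Embedding of the movie we want recommendations for
query = [0.88, 0.12, 0.02]

# Rank catalog entries by distance to the query: closer = more similar
ranked = sorted(catalog, key=lambda title: l2_distance(query, catalog[title]))
print(ranked)  # ['Thriller A', 'Thriller B', 'Comedy C']
```

faiss.IndexFlatL2 performs exactly this exhaustive L2-distance ranking, just over thousands of 4,096-dimensional vectors.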

  • What Python packages are needed to implement the recommender system as described in the video?

    -The Python packages needed are numpy, pandas, faiss (installed as faiss-cpu or faiss-gpu, depending on your hardware), and requests.

  • How does the video demonstrate the process of creating a textual representation of a movie?

    -The video demonstrates this by creating a function that formats a multi-line string containing various attributes of a movie from a data frame, such as type, title, director, cast, release year, genre, and description.

  • What is the purpose of the 'create_textual_representation' function in the script?

    -The 'create_textual_representation' function is used to generate a textual representation of each movie from the data frame, which includes details like type, title, director, cast, release year, genre, and description.
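
A sketch of such a function, under the following assumptions: a plain dict stands in for a DataFrame row, the keys follow the Kaggle Netflix CSV column names, and the sample values are illustrative:

```python
def create_textual_representation(row):
    # Merge one title's attributes into a single string for embedding.
    # `row` may be a pandas Series or any mapping with these keys.
    return f"""Type: {row['type']},
Title: {row['title']},
Director: {row['director']},
Cast: {row['cast']},
Released: {row['release_year']},
Genres: {row['listed_in']},

Description: {row['description']}"""

example = {
    "type": "Movie", "title": "Shutter Island",
    "director": "Martin Scorsese", "cast": "Leonardo DiCaprio, Mark Ruffalo",
    "release_year": 2010, "listed_in": "Thrillers",
    "description": "A U.S. Marshal investigates a disappearance at a psychiatric hospital.",
}
text = create_textual_representation(example)
print(text)
```

Applied row-wise with df.apply(create_textual_representation, axis=1), as in the video, this yields one string per title.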

  • How can the recommender system be tested with a new or hypothetical movie?

    -The recommender system can be tested by creating a textual representation of a new or hypothetical movie, embedding it using the large language model, and then performing a similarity search to find the most similar existing movies in the vector store.

  • What is the significance of the embedding process in the context of the recommender system?

    -The embedding process is significant as it translates the textual information of movies into a numerical format that can be compared and analyzed within the vector space, which is essential for the similarity search and recommendation functionality.

  • How does the video handle the installation and usage of the large language model llama 2?

    -The video instructs viewers to install Ollama from its official website, pull the Llama 2 model from the command line with 'ollama pull llama2', and then use it locally to generate embeddings from textual representations.

  • What is the final output of the recommender system when a user queries for movie recommendations?

    -The final output of the recommender system is a list of the top five most similar movies to the user's query, which could be based on an existing movie from the dataset or a hypothetical movie created by the user.

Outlines

00:00

🎬 Building a Movie Recommender System

The video script introduces a project to create a movie recommender system using a large language model and a vector store in Python. The process involves using a dataset from Kaggle, specifically a Netflix dataset with features like movie titles, descriptions, cast, release year, and genres. The aim is to convert this data into a textual representation, which is then embedded into a high-dimensional vector space by the language model Llama 2. The expectation is that similar movies will be positioned closer together in this vector space, enabling recommendations based on vector similarity.

05:00

📝 Preparing the Data and Environment

The script details the preliminary steps for the project: installing the necessary Python packages, namely numpy, pandas, faiss (a vector store by Facebook), and requests; downloading the Netflix dataset from Kaggle; and preparing the data by creating a textual representation for each movie or TV show. This textual data is used to generate vectors with the Llama 2 model, which are then stored in the vector database for later use in the recommendation process.

10:02

🔢 Embedding Text into Vector Space

The script explains the process of sending textual representations of movies to the Llama 2 model to generate embeddings. It outlines the technical steps of using the requests library to send POST requests to the local Ollama server, receiving embeddings in response, and storing these vectors in a numpy array. The embeddings are then added to the faiss index, which serves as the vector store. The process includes printing progress and handling the data to ensure it is in the correct format for the vector store.
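
The request described here can be sketched as follows. The endpoint path and the 'embedding' response field follow Ollama's embeddings API as shown in the video; the HTTP call itself is left as a comment because it requires a local Ollama server:

```python
import json

OLLAMA_URL = "http://localhost:11434/api/embeddings"  # Ollama's default port

def build_embedding_request(text, model="llama2"):
    # JSON body Ollama's embeddings endpoint expects: which model to use
    # and the text ("prompt") to embed.
    return {"model": model, "prompt": text}

payload = build_embedding_request("Type: Movie, Title: Shutter Island")
print(json.dumps(payload))

# With Ollama running locally, the call itself would be:
#   import requests
#   res = requests.post(OLLAMA_URL, json=payload)
#   embedding = res.json()["embedding"]  # list of 4,096 floats for Llama 2
#   X[i] = np.array(embedding)           # write into row i of the zeros array
```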

15:03

🔍 Performing Similarity Searches for Recommendations

The script demonstrates how to use the vector store to perform similarity searches for movie recommendations. It involves loading the vector index from a file, crafting a textual representation of a movie (either from the dataset or a hypothetical one), and using the Llama 2 model to embed this representation into the vector space. The index is then queried for the top matches by vector similarity, which are returned as row indices; these are used to retrieve the corresponding movies from the dataset.
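
A sketch of the lookup step described above. The faiss calls appear as comments, since they need the saved index file; the toy DataFrame and hard-coded indices stand in for real search results:

```python
import pandas as pd

# With the saved index and a query embedding, the video's search step is roughly:
#   index = faiss.read_index("index")
#   D, I = index.search(np.array([embedding], dtype="float32"), 5)
#   best_indices = I.flatten()

df = pd.DataFrame({"title": ["Shutter", "Shutter Island", "Dark", "Inception"]})
best_indices = [1, 3, 2]  # stand-in for the row indices faiss would return

# Map vector-store indices back to dataset rows to get the recommendations
recommendations = df.iloc[best_indices]["title"].tolist()
print(recommendations)  # ['Shutter Island', 'Inception', 'Dark']
```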

20:03

🎉 Conclusion and Next Steps

The final paragraph wraps up the video by summarizing the process of building the movie recommender system and encouraging viewers to experiment with different textual representations to improve recommendation quality. The script also invites viewers to like, comment, and subscribe for more content, and hints at using different models, or Llama 2 variants with larger parameter counts, to enhance the system's performance.

Keywords

💡Movie Recommender System

A movie recommender system is an algorithmic tool designed to suggest films to users based on their preferences or viewing history. In the video, the creator discusses building such a system using a large language model and a vector store in Python. The system works by analyzing textual data about movies and recommending similar ones, making 'Movie Recommender System' a central concept in the video.

💡Large Language Model

A large language model refers to a complex artificial intelligence system capable of understanding and generating human-like text. In the context of the video, the model is used to process movie descriptions and other textual information, converting them into vector representations. The script mentions using 'Llama 2', a specific large language model, to embed movie data into a high-dimensional vector space.

💡Vector Space

Vector space, in the context of this video, is a mathematical concept used to represent data points in a multi-dimensional space where each dimension corresponds to a feature. The video describes how the large language model embeds movie descriptions into a 4,096-dimensional vector space, allowing for the calculation of similarities between movies based on their vector representations.

💡Netflix Dataset

The Netflix Dataset mentioned in the script is a collection of data that includes information about TV shows and movies, such as titles, descriptions, cast, release year, and genres. This dataset is used to train the movie recommender system, providing the raw data from which the system learns to make recommendations.

💡Textual Representation

In the video, textual representation refers to the process of converting the features of a movie, such as its title, description, and genre, into a single string of text. This string is then used by the large language model to create a vector representation of the movie, which is essential for the recommender system to function.

💡Embedding

Embedding, in the context of the video, is the process of transforming textual data into numerical vectors using a large language model. The script describes how each movie's textual representation is embedded into a 4,096-dimensional vector by 'Llama 2', which positions the movie within the vector space based on its content.

💡FAISS (Facebook AI Similarity Search)

FAISS is a library developed by Facebook AI for efficient similarity search, which the video script mentions as the vector store used to store and manage the movie embeddings. The script explains that FAISS will contain all the vectors, allowing the system to perform searches and find the top similar movies to a given query.

💡Similarity Search

Similarity search is the process of finding data points that are similar to a given query point in the vector space. The video describes how, after embedding movie data into vectors, a similarity search is performed to find the top five most similar movies to a user's input, which is a key function of the recommender system.

💡Kaggle

Kaggle is an online platform for data science and machine learning competitions. In the script, Kaggle is mentioned as the source of the Netflix Dataset used to build the movie recommender system. The dataset is downloaded from Kaggle and then processed for the system.

💡Python Packages

The script lists several Python packages that are essential for building the movie recommender system, including numpy for numerical operations, pandas for data manipulation, FAISS for the vector store, and requests for sending HTTP requests to the large language model. These packages provide the necessary tools for data handling, processing, and system integration.

Highlights

Building a movie recommender system using a large language model and a vector store in Python.

Utilizing a local large language model to power the recommendation process.

Using a dataset from Kaggle containing Netflix information about TV shows and movies.

Features in the dataset include title, description, cast, release year, and genre.

Creating a textual representation of each movie or TV show from the dataset.

Embedding the textual representation into a high-dimensional vector space using Llama 2.

The expectation that similar movies will be closer in vector space.

Storing the movie vectors in a vector store called FAISS by Facebook.

Performing a similarity search to recommend movies based on a given movie's vector.

Installing Ollama locally and using Llama 2 to embed movie descriptions without the need for additional model training.

Downloading the Netflix dataset and installing necessary Python packages for the project.

Crafting a function to create a textual representation for each movie from the dataset.

Using the embeddings to populate a vector store for similarity searches.

Demonstrating the process of embedding a movie and searching for similar movies.

The ability to recommend movies based on a newly created imaginary movie's description.

The practical application of the recommender system in providing movie suggestions.

The potential for customization and experimentation with different textual representations.

The conclusion summarizing the process and the capabilities of the built movie recommender system.

Transcripts

play00:00

what is going on guys welcome back in

play00:01

this video today we're going to build a

play00:03

movie recommender system using a large

play00:05

language model and a vector store in

play00:07

Python so let us get right into it not

play00:09

[Music]

play00:17

all right so we're going to build a

play00:19

movie recommender system in Python today

play00:21

which is going to be powered by a large

play00:23

language model which runs locally on our

play00:25

machine now I'm going to give you a

play00:27

brief sketch of the process so that you

play00:29

understand how the recommendation

play00:30

process works and you don't have to just

play00:32

copy paste the code without

play00:34

understanding what it does so it's not

play00:35

going to be the most beautiful sketch

play00:36

here but I'm going to try to explain it

play00:38

as simply as possible what we're going

play00:40

to do is we're going to work with a data

play00:42

set from kaggle and this is going to be

play00:44

a Netflix data set containing

play00:47

information about TV shows and movies

play00:50

and it has features like title of uh the

play00:53

movie or show and description and cast

play00:56

and release year and genre and so on so

play00:59

each row represents a movie or a TV show

play01:01

and has features like the type is it a

play01:04

movie or TV show uh stuff like the title

play01:07

and the description and the release here

play01:11

and the director and the cast and so on

play01:13

so we have a couple of features here per

play01:15

row and what we want to do now is we

play01:18

want to take these and turn them into a

play01:20

textual representation now this is

play01:22

quite simple we don't have to do

play01:24

anything else but just create a string

play01:26

that says something like type is

play01:29

whatever for example movie then title is

play01:33

whatever the title of the movie is and

play01:35

so on and all of this is going to be one

play01:38

uh one large string containing all the

play01:40

information about that movie now what

play01:42

our large language model will do with

play01:44

that string is it will take this string

play01:47

and it will embed it into Vector space

play01:49

to keep it simple this basically means

play01:51

that the large language model does

play01:53

something with this content with this

play01:55

string and turns it into a vector into a

play01:58

high dimensional vector to be precise

play02:00

we're going to use llama 2 in this

play02:03

video uh locally here and llama 2 will

play02:06

produce a vector for each row with the

play02:09

size

play02:11

4,096 so we're going to have 4,096

play02:14

different values here numbers that will

play02:17

position this Vector in the vector space

play02:19

in the 4,096 dimensional Vector space

play02:22

this is what we're going to do here so

play02:25

every movie will end up being some point

play02:27

in this Vector space some Vector in this

play02:29

vector space now the idea is that

play02:33

similar movies or movies that are

play02:35

somehow uh yeah similar to one another

play02:37

will end up closer in the vector space

play02:40

than movies that are very different now

play02:42

how does it happen that's the

play02:44

intelligent of that's the intelligence

play02:45

of Llama 2 so we don't have to do any

play02:48

fancy coding here we don't have to

play02:50

implement some intelligent algorithm we

play02:51

don't have to train a machine learning

play02:53

model we expect llama 2 to have the

play02:56

intelligence this is already inside of

play02:59

the model the intelligence needed to do

play03:01

that to take movies that are somewhat

play03:03

similar and to put them together closer

play03:06

together in the vector space than movies

play03:08

that are completely different that's the

play03:10

basic idea and then what we do is we

play03:12

store all of these vectors into a vector

play03:15

store so a vector database we're going

play03:17

to use FAISS which is a uh Facebook I

play03:21

don't know what the acronym is but it's

play03:22

a vector store by Facebook and this

play03:25

Vector store will contain all the

play03:27

different vectors and what we can do

play03:29

then with this Vector store is we get

play03:30

some new movie with the string title uh

play03:34

and type and director and so on as a

play03:36

string we feed this into the index into

play03:40

our Vector store and as a result we get

play03:43

for example the top five most similar

play03:46

movies that could be interesting to you

play03:48

if you like this one so you get a list

play03:50

of five movies that could be interesting

play03:52

to you that is what we're going to build

play03:55

in this video today using llama 2

play03:57

running on our system locally if you can

play03:59

do that otherwise you will probably have

play04:01

to um to host it somewhere or you have

play04:04

to use the API of something else you can

play04:06

also use chat GPT but you have to pay

play04:08

for it or actually just GPT not chat GPT

play04:11

but if you have uh the resources you can

play04:13

just run this locally using Ollama so

play04:16

this is what we're going to build in

play04:17

this video today now how are we going to

play04:19

do that we're going to start by first of

play04:22

all installing Ollama now Ollama is quite

play04:25

easy to install you just go to the

play04:26

website ollama.com you download it for

play04:29

your op operating system on Linux you

play04:30

just run this command and then you can

play04:32

use it on your system if you have

play04:34

troubles with that I have a video on

play04:36

this channel about Ollama where you can

play04:38

look uh where you can see how I install

play04:40

it and and how it works basically but

play04:42

it's really not complicated what we're

play04:44

going to do then is once you have

play04:46

Ollama installed you're going to type into

play04:48

your terminal ollama pull and then the

play04:52

model that you want to use I as I said

play04:53

I'm going to use llama 2 you can also

play04:56

provide another uh model name I think on

play05:00

uh their website you should be able to

play05:02

see the models that they offer you have

play05:04

llama 3 you have Mistral you have llama 2

play05:07

with the different parameters also as

play05:09

well if you want to have a a larger

play05:11

model but you can do that uh and install

play05:15

the model that you want to use all right

play05:18

so once you have that done we're going

play05:20

to also download a data set from kaggle

play05:22

the Netflix movies and TV shows data set

play05:24

you will find a link in the description

play05:26

down below just download the Netflix

play05:28

titles CSV file and then we can get

play05:30

started with the coding now actually

play05:33

before we get started with the coding we

play05:34

also need to install the python packages

play05:36

that we're going to use in this video

play05:38

and for this we're going to open up a

play05:39

command line and install numpy pandas

play05:44

face and requests these are the four

play05:48

packages that we're going to need in

play05:49

this video today numpy and pandas

play05:51

obviously because we're going to work

play05:52

with data faiss because that is our

play05:54

Vector store and requests because we

play05:56

need to send requests to olama uh now

play05:59

now actually we cannot say faiss I think

play06:01

we need to say faiss-cpu or faiss-gpu so

play06:04

depending on whether you have a GPU that

play06:07

you can use here run faiss-gpu or

play06:11

faiss-cpu if you want to use your processor

play06:12

for this just install the packages and

play06:15

once you have them we can get

play06:16

started all right so the first thing is

play06:19

we're going to obviously import pandas

play06:21

aspd and we're going to then say the

play06:23

data frame is going to be PD read

play06:27

CSV Netflix title

play06:30

CSV and then we can look at it and you

play06:33

can see we have let me just close this

play06:34

here uh we have a couple of features the

play06:37

ID of the show we have the type of the

play06:39

show movie or uh TV show we have the

play06:42

title we have the director the cast the

play06:45

country that it was produced in I guess

play06:48

uh the date it was added to Netflix the

play06:50

release year also some rating uh the

play06:53

duration and also the genres here so

play06:56

what kind of movie is it and the most

play06:58

important thing I assume is going to be

play07:00

the description because it tells us

play07:02

briefly what the movie is about and I

play07:04

think that's the most important thing uh

play07:07

maybe together with the documentaries

play07:08

and with a cast that is going to be

play07:10

relevant for uh a similarity search all

play07:14

right so we're going to keep it simple

play07:16

here all we're going to do is we're

play07:17

going to craft for each line a string

play07:21

that represents this or the individual

play07:24

movies uh textually so we're going to

play07:27

create a function down here we're going

play07:29

to call it def uh we're going to call it

play07:31

create textual representation given a

play07:37

row from the data frame and what we're

play07:40

going to do is we're going to say

play07:43

textual representation is going to be

play07:47

equal to a multi-line

play07:49

string like this um and the important

play07:53

thing is to not have any tabs here and

play07:55

we're going to say something like type

play07:57

and type is going to be just

play08:00

so uh actually we need to make this a

play08:03

formatted multi-line string like

play08:06

this the type is going to be the type

play08:08

then we're going to say that the title

play08:11

is going to

play08:13

be the title and actually we can then

play08:18

just go and copy paste this I think and

play08:20

we can say we want to have the director

play08:23

we want to have the cast and we want to

play08:26

have uh the release year and genre

play08:31

and the most important thing also in the

play08:34

end the

play08:37

description so this is going to be

play08:39

description this is going to

play08:42

be actually we need to turn this into

play08:46

row

play08:48

title and into row

play08:53

director and into

play08:56

row cast

play09:00

and into

play09:01

row release I think release year

play09:06

right and

play09:08

into row and I think it

play09:12

was what was it called listed in is what

play09:16

we're looking for listed in and then

play09:19

finally of course row

play09:22

description so that should work and this

play09:25

function now applied to the rows and

play09:28

we're going to do it like this here this

play09:31

function applied of course we need to

play09:33

return the textual

play09:34

[Music]

play09:38

representation this function now applied

play09:40

to the individual roles will give us

play09:42

textual

play09:43

representation uh representations for

play09:45

the individual R so I can say DF

play09:48

apply and then I can just apply the

play09:51

create textual representation function

play09:54

axis equals 1 and I will get for every

play09:56

single um row here I will get

play10:00

the string so I can say actually that

play10:02

this is my textual

play10:07

representation

play10:11

column then you can see I have it and

play10:13

then I can just get the

play10:14

[Music]

play10:16

column and show the different values so

play10:19

if I print the first one we're going to

play10:22

see that this is what what this looks

play10:23

like in the end so this is just our

play10:26

basic representation now which we're

play10:27

going to use to ask llama 2 to turn this

play10:30

into a vector into Vector space or in

play10:33

the vector space so what we're going to

play10:36

do next is we're going to say

play10:38

import faiss our Vector store import

play10:42

requests so that we can send requests to

play10:44

llama 2 or to

play10:46

Ollama and import numpy as np we're going to

play10:51

need this here to create uh to create

play10:54

the array for our uh Vector store we're

play10:56

going to define the dimension of our

play10:58

output as I said to be 4,096 because that

play11:01

is what is going to be returned by llama

play11:03

2 uh for the embedding

play11:05

Dimensions we're going to say index is

play11:08

equal to faiss

play11:11

index flat

play11:13

L2 with a dimension here as a parameter

play11:16

this is basically our database you could

play11:18

say our Vector store that we're creating

play11:20

and then we're going to initialize an X

play11:22

full of

play11:24

zeros and the dimensions here are going

play11:26

to be the length of

play11:29

the textual representations so how many

play11:32

instances do we have how many movies or

play11:34

shows do we have and how large is the

play11:37

dimension for each of them so we're

play11:38

going to have n vectors here this amount

play11:41

of vectors uh off size

play11:45

dimension and the data type is going to

play11:48

be float

play11:52

32 all right so that is now an array

play11:56

full of zeros as you can see and this is

play11:57

going to be filled up with the

play11:59

embeddings from llama 2 so we're going

play12:02

to say now 4 I uh actually 4 I Row in or

play12:07

actually for I

play12:10

representation in

play12:13

enumerate data frame textual

play12:16

[Music]

play12:17

representation what we want to do is

play12:19

want to say if I is

play12:23

200s or not 200 if I is divisible by 200

play12:27

so every 200 uh row I want to print the

play12:30

progress so I know how long it takes I'm

play12:32

not going to run all of this on camera

play12:33

because it's going to take quite some

play12:35

time but you can run this here to see

play12:36

the progress just

play12:38

say uh something like

play12:41

processed and then I and

play12:44

then processed I Str I uh instances for

play12:51

example and uh then we're going to say

play12:54

that I want to send a request which is

play12:56

going to return a response so rest is

play12:58

equal to request I want to send a post

play13:00

request to Ollama and Ollama by

play13:03

default unless you change that is

play13:05

running at

play13:06

HTTP and then Local Host

play13:10

Port um

play13:12

11434 so 11434 is the port that Ollama is

play13:16

running at and we want to use the

play13:18

embeddings API so /api/

play13:22

embeddings and what we want to send to

play13:24

this API is a Json object and this Json

play13:28

object will will contain the model that

play13:30

we want to use so the model is going to

play13:33

be llama 2 and also the prompt that we

play13:37

want to use uh or the prompt that should

play13:39

be embedded so the prompt is going to

play13:41

be equal to the

play13:44

representation that we're currently

play13:47

at there you go and this needs to be

play13:51

closed like this actually this should be

play13:56

indented uh yeah let's let's just leave

play13:59

it like this and this is going to return

play14:02

a response and this response will have a

play14:04

field embedding so we want to say

play14:06

embedding is equal to the response get

play14:08

me the Json object from the response and

play14:11

what I'm interested in is the field

play14:13

embedding this

play14:15

embedding is the vector so all we have

play14:17

to do now is we have to go to the I

play14:20

position this is why we enumerate this

play14:23

uh we going want to go to the I position

play14:25

in our zero vector and we want to

play14:27

replace the zeros with this new

play14:29

embedding that we got from the large

play14:31

language model so NP

play14:35

array

play14:37

embedding and then finally in the end we

play14:40

can add all of x to our Vector store now

play14:44

I will turn this number down to let's

play14:47

say 30 so that you can see how fast this

play14:50

works we have remember uh 8,807 rows to

play14:54

be processed and if I run this you can

play14:57

see that it isn't

play14:59

that fast uh actually now I need to

play15:03

check that this is equal to

play15:05

zero and then I can run this again

play15:07

process zero

play15:10

instances then it takes some time

play15:12

process 30 instances and so on I think

play15:14

my video is lagging I'm pretty sure my

play15:17

video was lagging while this was running

play15:19

because it was running on the GPU so I'm

play15:21

not going to run this here on camera

play15:23

what I have actually done is I have

play15:24

trained this already and I will show you

play15:26

how you can save this when you've

play15:28

trained it already so you run this you

play15:30

wait for it to finish and once it's

play15:32

finished what you do is you say faiss

play15:34

write

play15:36

index and then you just take the index

play15:39

so this object here that you created and

play15:42

you write it to a file for example let's

play15:44

call it index as well once you do that

play15:46

you're going to create a file index that

play15:48

contains the index so you only have to

play15:51

run this once then it's done and then

play15:52

you can use it because you now have a

play15:54

vector database full of embeddings for

play15:56

the different movies um

play15:59

And in order to load it again, which is what I'm going to do now, you just say index = faiss.read_index("index"). So I've copy-pasted the exported vector store, the "index" file, from my prepared code, and all I have to do now is open up a new cell and say index = faiss.read_index("index"). Then I can go ahead and use this index to do a similarity search.

So how exactly can I now recommend a movie based on another movie? First of all, I can craft my own textual representation of whatever movie I like; I don't have to use a movie that's already part of the data frame. But let's start with one that is. One of my favorite movies is Shutter Island, and I want to see if it's part of the data frame, so let's say df[df['title'].str.contains('Shutter')]. In this case we get the movie Shutter and the movie Shutter Island. Let's copy that row's position and say that what I want to use as the basis for my recommendation is favorite_movie = df.iloc[...], and we can see that my favorite movie is indeed Shutter Island. Now let's write the code directly. What we need to do now is send this movie to Ollama to be embedded. We'll pretend this movie isn't already embedded, so we're not going to use its stored vector directly; we're going to embed it again, because this is now the new movie we're passing to the recommender system. We want it embedded into the vector space, and then the system should find the most similar movies, that is, the closest embeddings to the embedding of this particular movie.
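The title lookup can be sketched with a toy frame; the column names are assumptions based on the Netflix dataset and the textual representations built earlier:

```python
import pandas as pd

# Toy stand-in for the Netflix data frame.
df = pd.DataFrame({
    "title": ["Shutter", "Shutter Island", "Devil's Gate"],
    "textual_representation": ["rep: Shutter", "rep: Shutter Island", "rep: Devil's Gate"],
})

# Substring match on the title returns both "Shutter" and "Shutter Island".
hits = df[df["title"].str.contains("Shutter")]

# Pick the exact row by its position, as in the video.
favorite_movie = df.iloc[1]["textual_representation"]
```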

So what we do is say response = requests.post, and we post to the same URL as before: localhost, port 11434, /api/embeddings. The JSON payload is a dictionary whose content is the model, again llama 2, and the prompt, which in this case is the textual representation of our favorite movie. That's the request we send, and it results in an embedding, so we get the embedding from the response: take the JSON object and read its "embedding" field. We actually need some additional processing here, just formatting: we turn it into a NumPy array, np.array([embedding], dtype='float32'). And this is now what we're going to use as the basis for the similarity search.
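A sketch of that request, assuming a local Ollama server is running on the default port; the network call is wrapped in a function so nothing fires on import, and the helper names are mine, not from the video:

```python
import numpy as np
import requests

OLLAMA_URL = "http://localhost:11434/api/embeddings"

def embed_with_ollama(text, model="llama2"):
    """Ask the local Ollama server to embed `text` (requires `ollama serve`)."""
    response = requests.post(OLLAMA_URL, json={"model": model, "prompt": text})
    return response.json()["embedding"]

def to_query(embedding):
    """FAISS expects a float32 matrix of shape (n_queries, dim)."""
    return np.array([embedding], dtype="float32")
```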

So we're going to search our index for embeddings similar to this one: D, I = index.search(embedding, 5), meaning given the embedding, give me the top five matches. Now, what I get back as a response is of course not the actual titles, not the actual data, not the textual representations; I get the indices, just numbers, not any specific movies. So what I have to do is take the data frame and fetch those particular movies: best_matches = np.array(df['textual_representation'])[I.flatten()], that is, take all the returned indices, flatten them, and use them as an index into the textual representations. Then I can just say: for match in best_matches, print "NEXT MOVIE", print the match, so the movie representation, and then an empty line. And this is going to give me the top five recommendations.

Now, obviously the most similar match to my prompt, to my embedding, is the embedding itself, because Shutter Island was already part of the vector store, so of course the closest match is Shutter Island itself. The second closest is this one, Devil's Gate: "Seeking a missing woman in North Dakota, an FBI agent and a sheriff focus on her religious zealot husband, but discover something far more sinister." Sounds mysterious and slightly scary, which is also a little bit like Shutter Island. I haven't watched this movie, but I think it could be a good match; at least the description sounds like it. And then you can also see the other matches.

Here's the interesting thing: I don't actually have to provide an existing movie. I can also make one up, imagine a movie. So let's copy this, turn it into a string, and just make something up. Let's say my movie title is "The Mysterious Python". The director is... who did Shutter Island? Let's use the same director. And the cast is Leonardo DiCaprio, and then maybe someone else, like Sylvester Stallone; is that written like this? I hope so, I hope this is not too embarrassing, I'm not a movie guy. Let's just say these are the two people. It's released in 2020, and it's mystery, drama, thriller, or something like this. And the description is: "A group of adventurers discover a mysterious programming snake in the jungle and find something extremely shocking." Let's say this is my movie description. Now I can take this, save it as representation, use that as the input, and get the embeddings.
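A made-up movie like the one above can be assembled into a textual representation with a small helper. The field names and layout are assumptions for illustration (the video builds this format earlier), and the director is a guessed stand-in for "the same director as Shutter Island":

```python
def textual_representation(row):
    """Assemble a movie dict into one textual representation string."""
    return (
        f"Type: Movie\n"
        f"Title: {row['title']}\n"
        f"Director: {row['director']}\n"
        f"Cast: {row['cast']}\n"
        f"Released: {row['release_year']}\n"
        f"Genres: {row['listed_in']}\n\n"
        f"Description: {row['description']}"
    )

fake_movie = {
    "title": "The Mysterious Python",
    "director": "Martin Scorsese",  # stand-in: the video reuses Shutter Island's director
    "cast": "Leonardo DiCaprio, Sylvester Stallone",
    "release_year": 2020,
    "listed_in": "Mystery, Drama, Thriller",
    "description": ("A group of adventurers discover a mysterious programming "
                    "snake in the jungle and find something extremely shocking."),
}
representation = textual_representation(fake_movie)
```

This string can then be embedded and searched exactly like a real row.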

And the best matches are the following. A Patch of Fog: "When a guard catches a writer and television host shoplifting, instead of turning him in he only asks to be his friend, then begins to rule his life." Okay, I don't know why that is supposedly so similar to what I wrote, but you can play around with this and see how good the recommendations are. Maybe it's also reasonable to only include the description, or only the genres and the description; maybe the other fields are too confusing. But that's the basic idea.

too confusing but that's the basic idea

play24:09

this is how you can build a recommender

play24:11

system based on any data set you just

play24:13

pick a textual representation you uh or

play24:17

you create a textual representation you

play24:18

embed these representations and then you

play24:20

just perform a similarity search and you

play24:22

hope that the embeddings are somewhat

play24:24

intelligent based on the large language

play24:26

model that you use to create them so

play24:28

that's it for today's video I hope you

play24:30

enjoyed it and I hope you learned

play24:31

something if so let me know by hitting a

play24:33

like button and leaving a comment in the

play24:34

comment section down below and of course

play24:36

don't forget to subscribe to this

play24:37

Channel and hit the notification Bell to

play24:38

not miss a single future video for free

play24:40

other than that thank you much for

play24:42

watching see you in the next video and

play24:43

bye
