Offline AI Chatbot with Your Own Documents: Anything LLM, like Chat with RTX | Unscripted Coding

Unscripted Coding
4 Apr 2024 · 21:26

Summary

TL;DR: In this episode of 'Unscripted Coding,' the host explores 'Anything LLM,' an open-source alternative to Nvidia's 'Chat with RTX,' focusing on running AI models locally to protect sensitive data. The host installs and tests 'Anything LLM' on Windows, discussing its support for local embeddings and vector storage and the flexibility to mix local and online models. Despite a polished interface, the host encounters issues with file embedding and retrieval, suggesting that while the concept is promising, the execution needs improvement. The video concludes with a recommendation to revisit the tool in a few months, as it shows potential but currently falls short of expectations.

Takeaways

  • 🎙️ The video explores Anything LLM, an open-source alternative to Nvidia's Chat with RTX.
  • 💻 The speaker discusses the importance of using local AI models for sensitive information, to avoid the data-privacy issues associated with online chatbots.
  • 🔧 Nvidia's Chat with RTX runs AI models locally on Nvidia graphics cards but requires a modern, powerful computer.
  • 🛠️ The speaker's experience with Chat with RTX was mediocre, prompting a search for alternatives like Anything LLM.
  • 📥 Anything LLM can compute embeddings and store vectors locally, allowing the use of local files and models while optionally connecting to online services.
  • 🌐 The gold standard for language models is OpenAI's GPT-4, but local models served through Ollama can be used when privacy matters more than quality.
  • 🔍 The video demonstrates the installation and initial setup of Anything LLM, including connecting local files for processing.
  • ⚙️ The process involves embedding files to make them searchable, but the speaker encountered issues with the accuracy of file retrieval.
  • 🤖 The test shows Anything LLM struggling to cite the correct files from the embeddings.
  • 📊 Despite a polished interface, the speaker finds Anything LLM lacking in performance when running entirely locally, suggesting it might improve over time.

Q & A

  • What is the main topic of the 'Unscripted Coding' episode discussed in the transcript?

    -The main topic is exploring 'Anything LLM', an open-source alternative to Nvidia's 'Chat with RTX', focusing on the use of large language models (LLMs) locally on a computer for privacy and data security.

  • Why is it risky to use online chatbots for sensitive information like employment contracts?

    -Using online chatbots for sensitive information is risky because there's a possibility that these platforms may train on your data, mine your data, or sell your data without your consent, compromising privacy and security.

  • What is Nvidia's 'Chat with RTX' and how does it relate to the topic?

    -'Chat with RTX' is an application from Nvidia that allows users to run AI models locally on their own computer using an Nvidia graphics card, ensuring that data processing happens on the user's own hardware and addressing privacy concerns.

  • What is the primary advantage of running AI models locally as opposed to using cloud services?

    -The primary advantage is that running AI models locally keeps all data and processing within the user's own computer, reducing the risk of data breaches, unauthorized data access, and ensuring complete control over the data.

  • What does 'Anything LLM' offer that differentiates it from other AI chatbots?

    -'Anything LLM' offers the ability to use embeddings and vectors locally on the user's computer, allowing for local processing of files and interaction with AI models without the need for online services.

  • What is the significance of being able to mix and match different models and embedding services in 'Anything LLM'?

    -The ability to mix and match allows users to choose the best combination of models and embedding services that meet their specific needs, providing flexibility and potentially better performance or security.

  • What is the 'gold standard' for LLMs as mentioned in the transcript?

    -The 'gold standard' for LLMs, as mentioned, is OpenAI's GPT-4 (Generative Pre-trained Transformer), which is recognized for its advanced capabilities in language understanding and generation.

  • What was the speaker's experience with 'Chat with RTX' and 'Anything LLM'?

    -The speaker had a mediocre experience with 'Chat with RTX' but found 'Anything LLM' to be less satisfactory, particularly with the local embedding and vector database not functioning as expected.

  • What issue did the speaker encounter while trying to connect files to 'Anything LLM' for processing?

    -The speaker encountered issues with the file embedding process, where the system was not correctly identifying and serving up the correct files, leading to inaccurate responses from the AI.

  • What was the speaker's suggestion for improving the experience with 'Anything LLM'?

    -The speaker suggested revisiting the tool after a few months, as it is a new idea and may benefit from further development and updates to address the current issues.

  • What is the speaker's final verdict on using 'Anything LLM' for local AI processing?

    -The speaker concludes that while the idea of 'Anything LLM' is promising and the interface is polished, it is not yet ready for reliable local AI processing due to the issues encountered with file embeddings and model performance.

Outlines

00:00

🤖 Exploring Alternatives to Chat with RTX

The video introduces a new episode focused on examining Anything LLM, an open-source alternative to Nvidia's Chat with RTX that allows local AI model execution on a personal computer. The host discusses the privacy concerns of using large language models (LLMs) for sensitive information, like employment contracts, and the benefits of running these models locally to protect data. Nvidia's Chat with RTX is highlighted as a solution that uses the user's graphics card for local AI processing. The host shares their mixed experience with Chat with RTX and their curiosity about Anything LLM, which they proceed to install and explore.

05:01

🔍 Setting Up and Testing 'Anything LLM'

The host describes the process of setting up Anything LLM on Windows, emphasizing the ease of installation but noting the need for a modern, high-performance computer. They explore the software's interface, discussing the options for using local or online models and the flexibility of combining storage and processing, then connect the app to a locally running Ollama server after confirming its base URL. The host attempts a regular chat and tests the app's ability to handle file embeddings, facing some issues with file acceptance and processing.
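
For viewers following along, the "is Ollama actually running?" check from this segment can be reproduced against Ollama's local HTTP API. This is a minimal sketch, assuming Ollama's default port 11434 and a previously pulled phi model (`ollama pull phi`); it is independent of Anything LLM itself:

```python
# Sketch: confirm a local Ollama server is reachable before pointing a
# UI like Anything LLM at it. Assumes Ollama's default port (11434) and
# that a small model such as "phi" was pulled earlier.
import requests

BASE_URL = "http://localhost:11434"  # the "base URL" the host had to confirm

# List the models Ollama currently has available locally.
tags = requests.get(f"{BASE_URL}/api/tags", timeout=5).json()
print("local models:", [m["name"] for m in tags.get("models", [])])

# One non-streaming test generation, to check end-to-end latency.
reply = requests.post(
    f"{BASE_URL}/api/generate",
    json={"model": "phi", "prompt": "Say hello in one sentence.", "stream": False},
    timeout=120,
).json()
print(reply["response"])
```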

10:02

📚 Embedding Documents and Searching for Specific Files

The host attempts to embed various authorization forms into Anything LLM to make them searchable within the system. They encounter difficulties with certain file types and with the embedding process, which leads to a temporary halt in the demonstration to review the documentation. After resolving the issue (uploaded documents must be moved into the workspace before embeddings are generated), they successfully embed a 'Hello World' text file and demonstrate the system's ability to search for and cite files, although with some initial inaccuracies in file recognition.
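
Conceptually, the embedding step the host waits on amounts to chunking each document and running the chunks through an embedding model. Below is a minimal sketch of that pipeline, assuming the sentence-transformers package and its small all-MiniLM-L6-v2 model; Anything LLM's built-in embedder may differ in model and chunking strategy:

```python
# Sketch of the chunk-then-embed step that makes documents searchable.
# Assumes `pip install sentence-transformers`; the file names and sample
# text below are illustrative stand-ins for the video's authorization forms.
from sentence_transformers import SentenceTransformer

def chunk(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into overlapping character windows."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

model = SentenceTransformer("all-MiniLM-L6-v2")

documents = {
    "release_child_from_hospital.txt": "Authorization to release a child from the hospital ...",
    "hello_world.txt": "Hello, world!",
}

# Each chunk becomes one vector; the (vector, source-file) pairs are what
# a local vector database stores.
index = []
for name, text in documents.items():
    for piece in chunk(text):
        index.append((model.encode(piece), name))
print(f"embedded {len(index)} chunks")
```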

15:02

🔄 Troubles with File Citation and Model Performance

The host experiences issues with the system's citation accuracy, noting that it identifies the wrong files when asked about specific documents. They switch between different models, including Phi and GPT-4, to test whether the model choice affects citation accuracy. The core issue of incorrect file retrieval persists regardless, suggesting a problem with the underlying embedding and vector database rather than the model itself.
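
The host's diagnosis, that swapping Phi for GPT-4 cannot fix wrong citations, follows from the shape of a retrieval pipeline: the chat model only ever sees the top-k chunks the vector store returns for a query. A hedged sketch of that lookup, using plain cosine similarity over the vectors from the previous sketch (a real vector database is more elaborate):

```python
# Sketch: top-k retrieval over stored (vector, filename) pairs. If this
# step returns the wrong files, no choice of chat model downstream can
# recover, which matches the behavior observed in the video.
import numpy as np

def top_k(query_vec: np.ndarray, index: list, k: int = 3) -> list:
    """Rank stored chunks by cosine similarity to the query."""
    def cos(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    scored = sorted(index, key=lambda pair: cos(query_vec, pair[0]), reverse=True)
    return [name for _, name in scored[:k]]

# query_vec = model.encode("authorization to release a child from hospital")
# print(top_k(query_vec, index))  # these filenames are all the LLM gets to cite
```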

20:04

🔄 Reflecting on the Experience and Future Prospects

The host concludes the video by reflecting on their experience with Anything LLM, acknowledging its polished interface but expressing disappointment with the performance of the local embedding and vector database. They compare Anything LLM to Chat with RTX, leaning slightly towards the latter for its recent updates. The host suggests that while Anything LLM offers flexibility in mixing and matching models and embedders, it requires further development to be a viable local solution. They encourage viewers to subscribe for future updates and express hope for revisiting the topic in the coming months.

Keywords

💡LLM (Large Language Models)

LLM stands for Large Language Model: AI systems capable of processing and generating human-like text based on the input they receive. In the video's context, LLMs like ChatGPT and Claude are discussed as the engines behind interactive AI chatbots. The script raises concerns about privacy when using these models for sensitive information, such as employment contracts.

💡Local AI Processing

Local AI Processing refers to running AI models on one's own computer or device, rather than relying on cloud-based services. The video discusses the benefits of this approach for privacy and control over data. Nvidia's 'Chat with RTX' is highlighted as an example of a locally run AI model that utilizes the user's graphics card for processing.

💡Anything LLM

Anything LLM is an application mentioned in the script that allows users to run AI chat and document search locally on their computers. It is presented as an alternative to cloud-based AI services and is discussed in terms of its interface, capabilities, and the challenges faced during the demonstration in the video.

💡Embeddings and Vectors

Embeddings and vectors are mathematical representations used in AI to convert text or data into a format that can be understood and processed by machine learning models. In the video, the script discusses using these locally on one's computer for AI tasks, emphasizing the privacy benefits over cloud-based embeddings.
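
As a toy illustration (not from the video), cosine similarity over made-up three-dimensional vectors shows how "closeness" between texts is scored; real embedding models produce vectors with hundreds of dimensions:

```python
# Toy illustration of comparing embeddings by cosine similarity.
# These 3-D vectors are invented for the example; real embeddings
# have hundreds of dimensions.
import numpy as np

hospital_form = np.array([0.9, 0.1, 0.2])   # "release child from hospital"
medical_form  = np.array([0.8, 0.2, 0.3])   # "release of medical records"
hello_world   = np.array([0.1, 0.9, 0.1])   # "hello world"

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(hospital_form, medical_form))  # high: related forms
print(cosine(hospital_form, hello_world))   # low: unrelated text
```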

💡Data Privacy

Data Privacy is a central theme in the video, focusing on the risks of using online chatbots for sensitive information. The script raises concerns about potential data mining, training on user data, and selling of data by online vendors, advocating for local processing as a solution.

💡Nvidia's Chat with RTX

Nvidia's Chat with RTX is an application that allows users to run AI models locally on their Nvidia graphics card. The script compares it to Anything LLM, noting that while it may not offer the best conversational experience, it provides a more private alternative for AI processing.

💡Mixing and Matching Models

Mixing and matching models refers to the ability to combine different AI models and embedding services to create a customized setup. The video script discusses the flexibility of Anything LLM in allowing users to choose from various online and local models and embedding services to suit their needs.
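
Outside of Anything LLM's UI, the same mix-and-match idea can be sketched by hand, for example chatting through OpenAI's hosted API while keeping embeddings on a local Ollama server. This is a hedged sketch assuming both services' documented endpoints, an OPENAI_API_KEY environment variable, and a pulled nomic-embed-text model; it is not code from the video:

```python
# Sketch of mix-and-match by hand: a hosted OpenAI model for chat,
# a local Ollama model for embeddings. Assumes OPENAI_API_KEY is set
# and that `ollama pull nomic-embed-text` was run beforehand.
import os
import requests

def local_embedding(text: str) -> list[float]:
    """Embed locally via Ollama; document text never leaves the machine."""
    r = requests.post(
        "http://localhost:11434/api/embeddings",
        json={"model": "nomic-embed-text", "prompt": text},
    )
    return r.json()["embedding"]

def cloud_chat(prompt: str) -> str:
    """Send the (already-retrieved) context and question to GPT-4."""
    r = requests.post(
        "https://api.openai.com/v1/chat/completions",
        headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
        json={"model": "gpt-4", "messages": [{"role": "user", "content": prompt}]},
    )
    return r.json()["choices"][0]["message"]["content"]
```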

💡OpenAI

OpenAI is a company known for developing advanced AI models, with GPT (Generative Pre-trained Transformer) being a prominent example. The script positions OpenAI's GPT-4 as the gold standard in LLMs and discusses the trade-offs of using local models versus relying on OpenAI's powerful but potentially less private services.

💡File Embedding

File embedding is the process of converting the content of files into a format that AI models can analyze and understand. The video demonstrates the challenges faced when trying to embed files locally using Anything LLM, showing that the process was not as seamless as expected.

💡Telemetry

Telemetry in the context of the video refers to the collection of usage data by software, which can be disabled for privacy reasons. The script mentions disabling telemetry in Anything LLM to prevent the collection of user data during the demonstration.

💡Ollama

Ollama is the tool the presenter uses to run AI models locally; the video's auto-captions render it as "AMA" or "a Lama." It serves local models such as Phi in the background and features in the discussion of running AI models on one's own machine, including the presenter's experience using it for chat while testing file embeddings.

Highlights

Introduction to the episode discussing Anything LLM as an open-source alternative to Nvidia's Chat with RTX.

The importance of privacy in workplace settings when using large language models (LLMs) for sensitive information.

Explanation of Nvidia's Chat with RTX, which allows running AI models locally using Nvidia graphics cards.

Introduction to Anything LLM, which can use embeddings and vectors locally on a computer.

The flexibility of Anything LLM to connect with local models and online services for mixed usage.

Comparison of OpenAI's GPT-4 as the gold standard for LLMs and the challenges of using local alternatives.

Discussion on using Azure OpenAI for enterprise-level requirements in workplace settings.

Steps taken to install and start using Anything LLM on Windows, including setup and configuration.

Demonstration of Anything LLM's functionality, including embedding preferences and running local vectors.

Challenges faced with file uploading and indexing in Anything LLM, and troubleshooting these issues.

The significance of correct file embedding and indexing for accurate search results within Anything LLM.

Comparative analysis of Anything LLM and Nvidia's Chat with RTX based on user experience and performance.

Potential of using an external vector database like Pinecone, or a hosted embedding provider, with Anything LLM for improved results.

Conclusion that both Anything LLM and Chat with RTX have limitations when running entirely locally.

Suggestions for future improvements and a possible revisit of the topic in a few months to assess progress.

Encouragement to subscribe and watch future episodes for more tech demos and project discussions.

Transcripts

[00:01] Welcome everyone to another episode of Unscripted Coding. Today we are actually going to look at Anything LLM, and this is an open-source alternative, I say that hesitatingly because it's a little more than just an alternative, to Nvidia's Chat with RTX.

[00:27] Now let's think back: we're still talking about LLM AI chatbots, that is, large language models like ChatGPT, Claude, and Pi, where you can interact with an AI. For me and you, having regular chats about where our next vacation might be, or what a good dessert to make is, it makes perfect sense to just chat with ChatGPT and ask away. But if you think about working in a workplace, where you might look at, say, an employment contract or an employment matter, that information is pretty sensitive, and you absolutely shouldn't be putting it into just any random chatbot on the internet, because you don't know if they'll train on your data, mine your data, or sell your data. So the best way to keep all of this to yourself, away from all of these online vendors and cloud services, is to run it all locally.

[01:34] And so Nvidia's Chat with RTX was a pretty interesting idea: you have your Nvidia graphics card, so you have something that can run these AI models with sufficient speed, and you're running it all locally on your own computer. Now again, you need a pretty decent, modern, pretty expensive computer, but you can run it all locally. My experience with Chat with RTX was so-so, so I did take a look to see if anything else was out there, and Anything LLM popped up.

[02:16] I am going to boot this up right now. I downloaded it for Windows and installed it; there's really not too much more to say here. But this is where we're at: we have a "Get started" screen. I haven't started anything, but in theory Anything LLM will let you use embeddings and vectors locally on your computer, so you can use your graphics card to look at files on your own computer, and you can connect with local models or, for all of the above, connect online. And this is kind of nice, because you can mix and match: you can have all your files stored locally but share them with GPT-4, or Claude Opus, or Claude Sonnet, so you can pick and choose. Or you can use the models locally but do the embeddings online, because you trust a certain vendor.

[03:22] Mix and match is pretty important, because if you take a look at the options for LLMs, the gold standard is OpenAI, there's no doubt about it, GPT-4 is the gold standard. So running something locally through Ollama means you're not going to have as good a conversation; you have to pick and choose. Now, for me, if I were doing this in a workplace, Azure OpenAI is pretty enterprise already; that might be an option for you if you need a really strong, powerful LLM. But we installed Ollama in our previous video, and I'm just taking a look at it now and trying to get it up and running, just a sec here.

[04:27] Okay, so now that it's running... let's take a look. Ollama just runs as an icon at the very bottom, so Ollama is now running in the background; I just had to install it and double-click it. It took a bit of time to confirm that the base URL is right, so let's give it a moment to load the available models. Let me just double-check that Ollama is actually running. Very good. Let's see if I can skip over... perfect.

[05:48] All right, embedding preferences. So once again, you can actually run online services, OpenAI or Azure OpenAI, but let's use the built-in engine here. And finally, I think there is something built in here as well... ah, perfect, a 100% local vector database. That sounds great. So in theory, between these three, everything should be running offline; we could disconnect and still be able to use it. We'll skip the survey, we'll call this workspace "YouTube demo," and here we go.

[06:56] Okay, so first of all, let's just try a regular chat: "Hello there." Now it should be reaching out to my Phi model in Ollama to try and get a response. This seems slower than when I ran it purely through Ollama on the command line, but let's try this again: "I'm doing well. Can you tell me about Harry Potter?" Perfect, now we're starting to get the right speed. "Harry Potter is a series of seven fantasy novels": that sounds about right.

[07:44] Now, the next thing I wanted to do, and here we have the settings, is to actually connect files into this. So we have chat history, we can change how things look, we can obviously choose the models again. Oh, a transcription model, data connectors, interesting. Let's disable the telemetry. Okay, so I think here is where we can start connecting files. Or maybe not... aha, here we go.

[08:52] Okay, so I'm going to reach into my bag of files here, and I'm just going to drop all sorts of authorization forms into here. Normally, how these things work is that it should take just a little bit of time to embed these files properly. I find it very strange that some files aren't being accepted; I think they might be all of the .docx files. Hmm, maybe not. So we have these documents; let's see if I can try again and add a whole bunch more. I guess every time it has to be a fresh upload; that's a bit strange to me. But this time we're getting more documents. Nope, I think I see duplicates now.

[10:16] All right, let's try this out: "What kind of authorization would I use if I wanted to take my child home from the hospital?" Now, I have an authorization to release a child from the hospital... no, that doesn't seem very promising. So I wonder if I can drop into here... okay, let's pull up the file to take a look. Seems simple enough. This is quite disappointing, so I think I'm going to log off for a second, take a look at the documentation, and come right back.

[12:31] Okay, so with a handy YouTube video from Tim Carambat, I was able to see where we screwed up. Let's go back to the files: we can check all of these and move them into the workspace. I knew there was something here; I kept trying to click this, and it didn't quite work. But now it should take a little bit of time to generate that embedding, and that was the issue, that was what confused me at first: you do need some processing to index, embed, create vectors, whatever they want to call it, to make all of these files searchable, basically. So this might take a moment; we'll let it run.

[13:35] Unfortunately, I wasn't recording my voice in that last little segment, so we're going to run through this again very quickly. One of the challenges was that I kept trying to click here, in the center, to move files over, because I recognized that we were uploading to this documents section but not moving anything into our workspace. Very simply: I have a hello-world file here... oops, that's not right... let's drag it in over here. We have the hello-world file; we were supposed to check one of these boxes and move it over by clicking this. So we can move over a hello-world file, and we can save and generate the embedding. Now, when I had maybe 30 different doc files, it took about a minute and a half; now I'm just uploading one simple text file, so it took a couple of seconds.

[14:43] This is going to be a little bit different, but let's start a new thread, just brand new. When we click "upload a document," you can see our workspace already has a bunch of different files, and if I hover over one of these... let's take a look: "release of medical records," an authorization for the release of medical records. I might say: "Do you have a file for the release of medical records?" It's going to go through, and it'll apologize because it can't recognize a text, which is clearly not what was intended. But you can see that it is citing different files. Now, this first one, a releasing-information form: that's not right. And this one for a municipal police department: not right either. So it's not picking up the file that I was looking for, which was that authorization for the release of medical records.

[15:48] Now, since I've already done this, I did a couple of different tests. Let's go one more time and say: "Do you have a hello world text file?" Hopefully it will find it. Let's try this one more time... I wonder what the issue is, but let's just try a new thread and see where we're going. Clearly something is wrong. But if we go back to some of the threads I had before, you will see that I tried to fetch a different file, and once again it cited the wrong one. In this case, I was looking for some sort of authorization to release a child from the hospital, and it's picking up authorizations, but then that's all I uploaded into it.

[16:51] Long story short, it just wasn't working very well, and it doesn't surprise me: I am using Phi, which is a very small model. But I also took the time to actually switch to GPT, so if I use GPT-4 here and try one more time: "Do you have a hello world file?" Well, something is clearly not working quite right. Okay, let's go back into the settings, LLM preference, and I'm going to go back to Ollama here. Ah, you know what, let's skip it; it's not too important here.

[17:56] What was important was that during those last segments I tried a number of times to fetch files, and I think the problem is that it's getting very poor citations: the embedding and vector layer is serving up the wrong files. It didn't really matter whether we were using Phi or GPT-4, because the underlying file was not the right one. There's not a whole lot the model can do: sure, it could be more eloquent, it could say it in more words or fewer words, but it wasn't even picking up the right file, so it wasn't giving the right information.

[18:36] As I was doing that section of the video, I started thinking that maybe we'll use Pinecone, or maybe we'll use OpenAI as the embedding provider, but that starts to defeat the purpose of why we tried to use Anything LLM in the first place, which was to have this run entirely offline. Now, I had a mediocre experience with Chat with RTX, but in this case, again talking about everything locally run on my computer, I had an even worse experience with Anything LLM. I'll give it to them that their interface looks much nicer, much more polished, but running their local embedding and vector database wasn't great. Running Ollama is okay, and depending on your choice of models, that will run very similarly or close to Chat with RTX. But if your goal is to run all of this locally on your computer, I can firmly say that neither is a great choice, and I lean slightly towards Chat with RTX today, especially because it looks like they did a recent update as well.

[19:48] That said, I don't want to knock Anything LLM too hard, because you can ultimately mix and match, and that may be something very valuable to you. If you want to use different models paired with different embedding models, you can use OpenAI for the LLM but decide you want something else for your embedding model. This breaks you out of OpenAI's ecosystem, and it gives you a nice interface to work with; all of that is very positive. I have no doubt that if you decide to use OpenAI with Pinecone, you might be able to get much, much better results. But again, my purpose originally was to see if I could run this all locally on a computer, and maybe we should revisit this in three or six months, because it's clearly not there yet. Running this locally is definitely a brand-new idea, and I don't know if a lot of people have it on their radar, so I don't know that this is a high priority for Nvidia or for these folks. Long story short: not quite there yet. It's a cool idea, it's a great interface, it's a great demo, but it needs more work, so we'll revisit this, I'm sure.

[21:17] I hope you enjoyed this video. Subscribe and check us out next week for another quick project or demo.


Related tags
Local AI · Data Privacy · Chatbots · NVIDIA RTX · AI Models · Embeddings · Vector Database · Offline Computing · Tech Review · Software Demo