Offline AI Chatbot with your own documents - Anything LLM, like Chat with RTX | Unscripted Coding
Summary
TL;DR: In this episode of 'Unscripted Coding,' the host explores 'Anything LLM,' an open-source alternative to Nvidia's 'Chat with RTX,' with a focus on running AI models locally to protect sensitive data. The host installs and tests 'Anything LLM' on Windows, discussing its support for local embeddings and vector storage and its flexibility to mix local and online models. Despite a polished interface, the host encounters issues with file embedding and retrieval, suggesting that while the concept is promising, the execution needs improvement. The video concludes with a recommendation to revisit the tool in a few months, as it shows potential but currently falls short of expectations.
Takeaways
- 🎙️ The video explores 'anything llm,' an open-source alternative to Nvidia's Chat with RTX.
- 💻 The speaker discusses the importance of using local AI models for sensitive information to avoid data privacy issues associated with online chatbots.
- 🔧 Nvidia's Chat with RTX allows running AI models locally using Nvidia graphics cards but requires a modern, powerful computer.
- 🛠️ The speaker's experience with Chat with RTX was mediocre, prompting a search for alternatives like 'anything llm.'
- 📥 'anything llm' can use embeddings and vectors locally, allowing the use of local files and models while optionally connecting to online services (a minimal sketch of this idea follows the list).
- 🌐 The gold standard for language models is OpenAI's GPT-4, but local models served through Ollama can be used for different needs.
- 🔍 The video demonstrates the installation and initial setup of 'anything llm,' including connecting local files for processing.
- ⚙️ The process involves embedding files to make them searchable, but the speaker encountered issues with the accuracy of file retrieval.
- 🤖 The video shows a test of 'anything llm,' which struggled to cite the correct files from the embedded documents.
- 📊 Despite a polished interface, the speaker finds 'anything llm' lacking in performance when running entirely locally, suggesting it might improve over time.
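The "embeddings and vectors" idea from the takeaways above is small enough to sketch by hand. Below is a minimal sketch, assuming a running Ollama instance on its default port with the nomic-embed-text model pulled; the file names are illustrative stand-ins for the authorization forms in the video, not the actual files. Each document is embedded once, locally, and a question is answered by ranking files by cosine similarity, which is the step AnythingLLM automates (and the step that misfires later in the video).

```python
# Embed each document once, locally, then answer "which file?" questions by
# cosine similarity. Nothing here leaves the machine.
import requests
import numpy as np

OLLAMA_EMBED = "http://localhost:11434/api/embeddings"  # Ollama's default endpoint

def embed(text: str) -> np.ndarray:
    # One round trip to the local Ollama server per text.
    r = requests.post(OLLAMA_EMBED, json={"model": "nomic-embed-text", "prompt": text})
    r.raise_for_status()
    return np.array(r.json()["embedding"])

docs = {
    "release_medical_records.txt": "Authorization for the release of medical records...",
    "release_child_from_hospital.txt": "Authorization to release a child from the hospital...",
}
vectors = {name: embed(body) for name, body in docs.items()}

def search(query: str, k: int = 2):
    q = embed(query)
    cosine = lambda v: float(np.dot(q, v) / (np.linalg.norm(q) * np.linalg.norm(v)))
    ranked = sorted(((cosine(v), name) for name, v in vectors.items()), reverse=True)
    return ranked[:k]

print(search("Do you have a file for the release of medical records?"))
```

When retrieval returns the right file, the chat model only needs to phrase an answer around it; when it returns the wrong one, no choice of model can recover, which is exactly the failure mode described later in this summary.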
Q & A
What is the main topic of the 'Unscripted Coding' episode discussed in the transcript?
-The main topic is exploring 'Anything LLM', an open-source alternative to Nvidia's 'Chat with RTX', focusing on the use of large language models (LLMs) locally on a computer for privacy and data security.
Why is it risky to use online chatbots for sensitive information like employment contracts?
-Using online chatbots for sensitive information is risky because there's a possibility that these platforms may train on your data, mine your data, or sell your data without your consent, compromising privacy and security.
What is Nvidia's 'Chat with RTX' and how does it relate to the topic?
-'Chat with RTX' is an idea by Nvidia that allows users to run AI models locally on their own computer using an Nvidia graphics card, ensuring that data processing is done on the user's own hardware, thus addressing privacy concerns.
What is the primary advantage of running AI models locally as opposed to using cloud services?
-The primary advantage is that running AI models locally keeps all data and processing within the user's own computer, reducing the risk of data breaches, unauthorized data access, and ensuring complete control over the data.
What does 'Anything LLM' offer that differentiates it from other AI chatbots?
-'Anything LLM' offers the ability to use embeddings and vectors locally on the user's computer, allowing for local processing of files and interaction with AI models without the need for online services.
What is the significance of being able to mix and match different models and embedding services in 'Anything LLM'?
-The ability to mix and match allows users to choose the best combination of models and embedding services that meet their specific needs, providing flexibility and potentially better performance or security.
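As a concrete illustration of this answer, here is a hedged sketch of one such pairing: a hosted OpenAI chat model alongside embeddings produced locally through Ollama. The model and file names are illustrative assumptions; AnythingLLM wires this up through its settings screens rather than code, so this only shows the shape of what it does:

```python
# Mix-and-match in miniature: the embedding stays on-machine (Ollama),
# while the chat completion goes to a hosted model (OpenAI).
import requests
from openai import OpenAI

# Local half: embed a document via Ollama's default endpoint.
emb = requests.post(
    "http://localhost:11434/api/embeddings",
    json={"model": "nomic-embed-text",
          "prompt": "Authorization for the release of medical records..."},
).json()["embedding"]

# Hosted half: let GPT-4 do the talking (reads OPENAI_API_KEY from the environment).
client = OpenAI()
reply = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "What does a medical records release form do?"}],
)

print(f"local embedding dimensions: {len(emb)}")
print(f"hosted answer: {reply.choices[0].message.content[:100]}")
```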
What is the 'gold standard' for LLMs as mentioned in the transcript?
-The 'gold standard' for LLMs, as mentioned, is OpenAI's GPT-4 (Generative Pre-trained Transformer), which is recognized for its advanced capabilities in language understanding and generation.
What was the speaker's experience with 'Chat with RTX' and 'Anything LLM'?
-The speaker had a mediocre experience with 'Chat with RTX' but found 'Anything LLM' to be less satisfactory, particularly with the local embedding and vector database not functioning as expected.
What issue did the speaker encounter while trying to connect files to 'Anything LLM' for processing?
-The speaker encountered issues with the file embedding process, where the system was not correctly identifying and serving up the correct files, leading to inaccurate responses from the AI.
What was the speaker's suggestion for improving the experience with 'Anything LLM'?
-The speaker suggested revisiting the tool after a few months, as it is a new idea and may benefit from further development and updates to address the current issues.
What is the speaker's final verdict on using 'Anything LLM' for local AI processing?
-The speaker concludes that while the idea of 'Anything LLM' is promising and the interface is polished, it is not yet ready for reliable local AI processing due to the issues encountered with file embeddings and model performance.
Outlines
🤖 Exploring Alternatives to Chat with RTX
The video introduces a new episode focused on examining 'anything llm', an open-source alternative to Nvidia's 'Chat with RTX', which allows local AI model execution on a personal computer. The host discusses the privacy concerns of using large language models (LLMs) for sensitive information, like employment contracts, and the benefits of running these models locally to protect data. Nvidia's 'Chat with RTX' is highlighted as a solution using the user's graphics card for local AI processing. The host shares their mixed experience with 'Chat with RTX' and their curiosity about 'anything llm', which they proceed to install and explore.
🔍 Setting Up and Testing 'Anything LLM'
The host describes the process of setting up 'anything llm' on Windows, emphasizing the ease of installation but noting the need for a modern, high-performance computer. They explore the software's interface, discussing the options for using local or online models and the flexibility of combining storage and processing. The host attempts to use the software for a regular chat and tests its ability to handle file embeddings, facing some issues with file acceptance and processing.
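One practical detail from this setup step: AnythingLLM only needs a reachable Ollama base URL, which is what the host pauses to confirm. A minimal probe, assuming Ollama's default port; the model names returned depend on what you have pulled:

```python
# List the models a local Ollama instance is serving; if this request fails,
# pointing AnythingLLM at the same base URL will fail the same way.
import requests

BASE_URL = "http://localhost:11434"  # Ollama's default base URL

models = requests.get(f"{BASE_URL}/api/tags").json()["models"]
print([m["name"] for m in models])  # e.g. ['phi:latest'] if Phi is installed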
📚 Embedding Documents and Searching for Specific Files
The host attempts to embed various authorization forms into 'anything llm' to make them searchable within the system. They encounter difficulties with certain file types and the process of embedding, which leads to a temporary halt in the demonstration to review the documentation. After resolving the issue, they successfully embed a 'Hello World' text file and demonstrate the system's ability to search for and cite files, although with some initial inaccuracies in file recognition.
🔄 Troubles with File Citation and Model Performance
The host experiences issues with the system's file citation accuracy, noting that it retrieves the wrong files when asked about specific documents. They switch between different models, including Phi and GPT-4, to test whether the model affects citation accuracy. The core issue of incorrect file retrieval persists regardless, suggesting a problem with the underlying embedding and vector database rather than the model itself.
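The diagnosis above can be reproduced directly: hand the same retrieved context to two different chat models and watch the citation error survive the swap. A minimal sketch, assuming a local Phi model in Ollama and an OpenAI key; the `retrieved` string is a deliberate stand-in for whatever the vector store served up:

```python
# If retrieval served the wrong file, both a small local model and GPT-4
# will answer from that wrong file: the failure is upstream of generation.
import requests
from openai import OpenAI

retrieved = "resealing_information.txt: Authorization to reseal court records..."  # stand-in
question = "Do you have a form to release a child from the hospital?"
prompt = f"Context: {retrieved}\n\nQuestion: {question}"

# Local model via Ollama's generate endpoint (stream disabled for a single JSON reply).
local = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "phi", "prompt": prompt, "stream": False},
).json()["response"]

# Hosted model via the openai package (reads OPENAI_API_KEY from the environment).
hosted = OpenAI().chat.completions.create(
    model="gpt-4", messages=[{"role": "user", "content": prompt}]
).choices[0].message.content

print(local, hosted, sep="\n---\n")  # different phrasing, same wrong source file
```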
🔄 Reflecting on the Experience and Future Prospects
The host concludes the video by reflecting on their experience with 'anything llm', acknowledging its polished interface but expressing disappointment with the local embedding and vector database's performance. They compare 'anything llm' to 'Chat with RTX', leaning slightly towards the latter for its recent updates. The host suggests that while 'anything llm' offers flexibility in mixing and matching models and embeddings, it requires further development to be a viable local solution. They encourage viewers to subscribe for future updates and express hope for revisiting the topic in the coming months.
Keywords
💡LLM (Large Language Models)
💡Local AI Processing
💡Anything LLM
💡Embeddings and Vectors
💡Data Privacy
💡Nvidia's Chat with RTX
💡Mixing and Matching Models
💡Open AI
💡File Embedding
💡Telemetry
💡Ollama
Highlights
Introduction to the episode discussing Anything LLM as an open-source alternative to Nvidia's Chat with RTX.
The importance of privacy in workplace settings when using large language models (LLMs) for sensitive information.
Explanation of Nvidia's Chat with RTX, which allows running AI models locally using Nvidia graphics cards.
Introduction to Anything LLM, which can use embeddings and vectors locally on a computer.
The flexibility of Anything LLM to connect with local models and online services for mixed usage.
Comparison of OpenAI's GPT-4 as the gold standard for LLMs and the challenges of using local alternatives.
Discussion on using Azure OpenAI for enterprise-level requirements in workplace settings.
Steps taken to install and start using Anything LLM on Windows, including setup and configuration.
Demonstration of Anything LLM's functionality, including embedding preferences and running local vectors.
Challenges faced with file uploading and indexing in Anything LLM, and troubleshooting these issues.
The significance of correct file embedding and indexing for accurate search results within Anything LLM.
Comparative analysis of Anything LLM and Nvidia's Chat with RTX based on user experience and performance.
Potential of using an external vector database like Pinecone with Anything LLM for improved results (see the sketch after this list).
Conclusion that both Anything LLM and Chat with RTX have limitations when running entirely locally.
Suggestions for future improvements and a possible revisit of the topic in a few months to assess progress.
Encouragement to subscribe and watch future episodes for more tech demos and project discussions.
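On the Pinecone highlight above: that swap trades the local-only goal for a managed vector store, which is the trade-off the host weighs at the end of the video. A minimal sketch, assuming the current pinecone Python SDK, an index created ahead of time with a dimension matching your embedding model, and placeholder names throughout:

```python
# Store and query vectors in Pinecone instead of the built-in local store.
# The vectors themselves can still be produced locally (e.g. via Ollama).
from pinecone import Pinecone

pc = Pinecone(api_key="YOUR_API_KEY")     # hypothetical key
index = pc.Index("anything-llm-demo")     # hypothetical pre-created index

# Upsert one document vector; 768 dims matches nomic-embed-text, adjust to your model.
index.upsert(vectors=[{"id": "release_medical_records.txt", "values": [0.1] * 768}])

# Query with an embedded question; top_k controls how many candidate files return.
print(index.query(vector=[0.1] * 768, top_k=3))
```

Possibly better retrieval, but the vectors now live with a cloud vendor, which is exactly the purpose-defeating concern raised in the transcript.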
Transcripts
welcome everyone to another episode of
unscripted coding today we are actually
going to look at anything
llm and this is an open-source
alternative uh I say that hesitantly
because it's a little more than just an
alternative to nvidia's chat with
RTX now let's think back we're still
talking about llm AI chatbots that is
large language models like ChatGPT
Claude and Pi where you can interact with an
AI now for me and you having regular
chats about where our next vacation
might be what a good uh dessert to make
is um it makes perfect sense to just
chat with ChatGPT and ask away but
if you think about working in a work
place where you might look at say an
employment contract or an employment
matter that information is pretty
sensitive and you absolutely
shouldn't be putting it into just any
random chatbot on the internet because
you don't know if they'll train on your
data if they'll mine your data if
they'll sell your data blah blah blah
blah blah so the best way to keep all of
this to yourself um away from all of
these online vendors and cloud
services is to run it all locally and so
nvidia's chat with RTX was a pretty
interesting idea you have your Nvidia
graphics card so you have something that
can run uh these AI models with
sufficient speed and you're running it
all locally on your own computer now
again you need you know a pretty decent
modern computer pretty expensive
computer but you can run it all locally
on your own uh computer my experience
with chat with RTX was so-so so
I did take a look to see if anything
else is out there and anything llm
popped
up I am going to boot this up right now
I downloaded it for windows installed it
there's really not too much more to say
here but this is where we're at we have
a get started I haven't started anything
but in theory anything llm will let you
use embeddings and vectors locally on
your computer so you can use your
graphics card to look at files on your
own computer and you can connect with uh
local models or for all of the above
connect
online and so this is kind of nice
because you can mix and match it you can
have all your files stored locally but
share it
with um GPT-4 or Claude Opus or Claude
Sonnet um so you can pick and choose or
you can use these models online but um
use the models locally but do the
embeddings all online because you trust
a certain
vendor Mix A match is pretty important
because if you take a look at the option
for
llms the gold standard is OpenAI
there's no doubt about it GPT-4 is
the gold standard so um running
something locally like
Ollama um is going to
be you're not going to have as good a
conversation so you have to pick and
choose um now for
me um if I was doing this in a workplace
you know Azure open AI is pretty
Enterprise already that might be an
option for you to choose if you need a
really strong powerful llm but we have
installed Ollama in our previous video
and I'm just taking a look at it now um
and trying to get it up and running just
a sec
here okay um so now that that's running we can
have Ollama
uh
let's take a look now Ollama just runs as an
icon at the very bottom and that's why
I'm all right so uh Ollama is now running
in the background uh just had to install
and double click
it and I it took a bit of time to
confirm that the base URL is
right so let's give it a moment to load
the available
models let me just double check that
Ollama's actually running very
good
see if I can skip
over
Perfect all right embedding preferences
so once again you can
actually um Run online
services so OpenAI uh Azure OpenAI
but let's use the built-in engine here
and finally I think there is something
built in here as
well ah perfect 100% local Vector that
sounds great so in theory between these
three everything should be running
offline we can
disconnect and be able to to use it so
we'll skip the survey and we'll call
this YouTube
demo and here we
go okay so first of all let's just try a
regular chat hello
there and now it's should be reaching to
my fi model in ol Lama to try and get a
response now this seems
slower than um when I had run it purely
on Ollama on a command line but let's try
this
again I'm doing
well
can you tell me about Harry Potter let's
say perfect now we're starting to get
the right speed Harry Potter is a series
of seven fantasy novels that sounds
about right now the next thing I wanted
to do and let's see if here's we have
the
settings um the next thing I wanted to
do is to actually
connect uh files into this so we have
history of chat we can change how things
look we can obviously choose the models
again oh
um transcription model data
connectors
interesting let's disable the
Telemetry and
okay so I think here is where we can
start
connecting files or maybe
not aha here we
go okay so uh I'm going to reach into my
bag of files here and I'm just going to
drop up all sorts of authorization forms
into here and so normally how these
things work is it should take just a
little bit of time to um embed these
files
properly I find it very strange that
some files aren't being accepted but I
think they might be all of the docx
files hm
maybe
not
um so we have these documents let's see
if I can try again and add a whole bunch
more I guess every time it has to be
fresh
that's a bit strange to
me
but this time we're getting more
documents nope I think I see duplicates
now all right
let's try this
out um what kind of authorization would
I use if I
wanted uh
to take my child home from the hospital
so I have an authorization to release
child from
hospital
no that doesn't seem very
promising so I wonder if I can drop into
here okay let's pull up the file to take
a
look
seems simple enough
um
this is quite disappointing so I think
I'm going to log off for a second take a
look at the documentation and come right
back okay so with the handy YouTube
video from Tim Carambat uh I was able to
see where we screwed up so let's go back
to the files and we can click all of
these and move it into the workspace so
I knew there's something here I kept
trying to click this and it didn't quite
work but now it should take a little bit
of time to generate that embedding and
that was the issue of that was what
confused me at first you do need some
processing to try and index embed create
vectors whatever they want to call it to
make all of these files searchable
basically and so um this might take a
moment we'll let it
run
unfortunately I wasn't recording my
voice in that last little segment so
we're going to run through this again
very quickly so one of the challenges
was I kept trying to click here to uh
the center right here to move files over
because I recognized that we were
uploading to this document section but
not moving it into our workspace uh very
simply and I have a hello world file
here um
oops it's not
right let's drag in over
here and uh we have the hello world file
we were supposed to check one of these
boxes and move it over by clicking this
so uh we can move over a hello world
file and um we can save and make the
embedding now when I had about what is
that maybe 30 different doc files it
took about a minute and a half uh now
I'm just uploading one simple text file
so it took you know a couple seconds
here this is going to be a little bit
different but let's start a new thread
um just brand new uh when we click
upload a document you can see already
our uh workspace has a bunch of
different files and if I hover uh one of
these
let's take a look release of medical
records authorization for the release of
medical records I might say do you have
a file for the release of medical
records it's going to go through um and
it'll apologize because it can't
recognize a text that's clearly not what
was intended but uh you can see that it
is citing different files now this first
one resealing information that's not
right not right
and for Municipal Police Department not
right either so it's not picking up the
file that I was looking for which was
that authorization of release of medical
records now since I've already done this
I did a couple different tests um let's
let's go one more time and say uh do you
have a hello World text file and
hopefully it
will find
it let's try this one more
time
um wonder what the issue is but let's
let's just try a new thread and see if
we're going
to clearly something is wrong but if we
go back to some of the threads I had
before um you will see that I tried to
fetch a different file and once again it
cited the wrong one so in this case I
was looking for some sort of
authorization to release a child from
the hospital and um it's picking up
authorizations but that's all I uploaded
into
it long story short it just wasn't
working very well and it doesn't
surprise me I am using Phi which is a poor
you know very small model but I also
took the the time to actually switch
into GPT so if I use GPT-4
here and and try one more
time do you have a Hello World
file well um something is clearly not
working quite right oh
okay um let's go back into the settings
llm preference and I'm going to go back
to Ollama
here ah you know what let's skip it um
not too important here um what was
important was that during those last
segments I I tried a number of times to
fetch files and I think the problem is
that it's getting very poor citation
something uh the embeddings and the
vectors are serving up the wrong
files it didn't really matter that we
were using Phi or GPT-4 um because the
underlying file the the file was not the
right one so there's not a whole lot the
model can say sure it could be more
eloquent it could say it in more words
fewer words but but it wasn't even
picking up the right file so it wasn't
giving the right information um as I was
doing that section of the video I
started looking at maybe we'll use
Pinecone maybe uh we'll use OpenAI's
embedding providers um but that starts to
defeat the purpose of why we try to use
anything llm in the first place which
was to have this entirely offline now I
had a mediocre experience with chat with
RTX but in this case again we're
talking about everything locally run on
my computer I had an even worse
experience with anything llm now I'll
give it to them that their interface
looks much nicer much more polished but
running their local embedding in Vector
database it wasn't great um running
Ollama that's okay and depending on your
choice of models that uh will run very
similarly or uh close to chat with RTX
um but if your goal is to run all of
this locally on your computer I can
firmly say that neither are great
choices if I lean slightly towards chat
with RTX today especially because it
looks like they did a recent update as
well that said I don't want to knock on
anything llm too hard because you
can ultimately mix and match and that
may be something very valuable uh to you
if you want to use uh different models
paired with different embedding models
so you can use uh OpenAI for the llm
but you decide you want something else
for your embedding
model this you know breaks you out of
OpenAI's ecosystem it gives you a nice
interface to work with all of that uh is
very positive and I have no doubt that
if you decide to use uh OpenAI
with Pinecone you might be able to
get much much better uh results but
again my purpose originally was seeing
if I could run this all locally on a
computer and you know maybe we should
revisit this in 3 months or 6 months
because it's clearly not there it's a
brand new idea definitely to run this
locally and I don't know if a lot of
people have this on their radar so I
don't know that this is high high high
priority for Nvidia or these folks here
um but long story short not quite there
yet it's a cool idea this is a great
interface this is a great demo but um it
needs more work so we'll revisit this
I'm
sure I hope you enjoyed this video and
um subscribe and check us out next week
for another quick project or demo