The First AI That Can Analyze Video (For FREE)

The AI Advantage
28 Mar 2024 · 15:39

Summary

TLDR: Google's AI Studio has recently become available to the public, introducing the Gemini 1.5 Pro model, notable for its 1 million tokens of context. The video explores unique features and use cases of the studio, emphasizing its appeal beyond developers. With comparisons to other models and platforms, it highlights the studio's multimodal capabilities, including video analysis, and advanced settings for customization. The script illustrates how the Gemini 1.5 Pro model enables in-depth interaction with extensive documents and multimedia, offering insights into leveraging its potential for creative and research applications, despite restrictions in Europe.

Takeaways

  • 🌟 Google's AI Studio has left early access, featuring unique capabilities including the Gemini 1.5 Pro model, though it is not available in Europe.
  • 🚀 The Gemini 1.5 Pro model offers an unprecedented 1 million tokens of context, vastly surpassing the capacity of previous models.
  • 💻 Despite being a developer interface, Google's AI studio is accessible and useful even for non-developers, offering enhanced features like model switching and temperature settings.
  • 🔍 Advanced settings in the studio allow for detailed control over the AI's behavior, addressing common concerns about bias and moderation.
  • 🎥 Unique to this platform is the ability to upload and analyze video content, integrating both visual and audio data.
  • 👩‍💻 The platform supports various prompt types, including chat, free form, and structured, facilitating a wide range of user interactions.
  • 🛠️ Structured prompts enable multi-shot learning, allowing users to train the model for specific outputs with examples.
  • 📚 With a massive context window, users can upload extensive documents or transcripts, like appliance manuals or podcast episodes, for detailed analysis.
  • 🔑 The ability to fine-tune models with user-provided input-output pairs offers a tailored AI experience, enhancing the relevance of responses.
  • 🌍 Accessibility is a concern, as users in Europe need a VPN to access the service, highlighting geographic limitations in AI tool availability.

Q & A

  • What is the significance of Google's AI Studio coming out of early access?

    -Google's AI Studio coming out of early access signifies that it is now available to the general public, offering advanced features and access to powerful models like Gemini 1.5 Pro with 1 million tokens of context, previously not as accessible.

  • Why is Europe excluded from using Google's AI Studio?

    -The script hints at a restriction for Europe but does not provide specific reasons, which could be due to regulatory, legal, or compliance issues related to data privacy and AI governance in European jurisdictions.

  • What are some key features of Google's AI Studio?

    -Key features include the ability to switch models quickly, set temperature for creativity control, access advanced features like prompt presets, and use the developer interface for enhanced customization.

  • How does Gemini 1.5 Pro model compare to other models like GPT-3.5 and GPT-4?

    -Gemini 1.5 Pro sits between Google's standard Pro model (comparable to GPT-3.5) and its Advanced model (comparable to GPT-4), and its 1 million tokens of context give it far more room for data processing and generation than either.

  • What unique functionality does Google's AI Studio provide concerning video input?

    -Google's AI Studio allows users to upload and work with video inputs directly, a feature not available in ChatGPT, Claude, or open-source models. It can recognize both the visual and audio components of the video.

  • How does the temperature setting in AI Studio affect model output?

    -The temperature setting controls the creativity of the model, where a higher temperature leads to more creative but potentially less accurate outputs (prone to hallucinations), and a lower temperature results in more consistent, predictable outputs.

  • What are the types of prompts available in Google's AI Studio?

    -Google's AI Studio offers three types of prompts: chat prompts, which are simple and straightforward; free form prompts, which include variables for dynamic use; and structured prompts, which involve multi-shot or few-shot prompting for pattern recognition.

  • How can the Gemini 1.5 Pro model's 1 million tokens of context be advantageous?

    -The 1 million tokens of context allow for processing and understanding much larger documents or datasets in a single prompt, enabling complex tasks like summarizing extensive manuals or analyzing lengthy podcast transcripts, which is not possible with models having lower token limits.

  • What is the safety setting feature in Google's AI Studio?

    -The safety setting allows users to control the model's moderation level, letting them decide whether to filter or block certain outputs, giving users more control over the content generated by the model.

  • How does Google's AI Studio facilitate the creation and testing of prompts?

    -Google's AI Studio enables users to easily create, test, and save prompts with variables and examples. It supports multimodal inputs, including video, and allows for fine-tuning models by providing input-output pairs, making it highly versatile for developing tailored AI applications.

Outlines

00:00

🚀 Introduction to Google's AI Studio and Gemini 1.5 Pro Model

Google's AI Studio has been made available to the public, featuring exclusive access to the Gemini 1.5 Pro model, which stands out for its 1 million tokens of context capacity. This model is significant for developers and non-developers alike, offering advanced capabilities beyond those found in traditional chat interfaces like ChatGPT. This segment also addresses concerns regarding AI bias and the importance of allowing users to make their own moderation decisions. The video script mentions the presenter's location at an AI conference in Las Vegas and sets the stage for demonstrating the unique features and use cases of the AI Studio, emphasizing the flexibility and advanced settings available to users.

05:01

🔧 Exploring Google AI Studio's Features and Prompt Types

The video script delves into the detailed features of Google AI Studio, including the developer interface, prompt presets, and the unique ability to handle video inputs— a capability not found in other models. It introduces three types of prompts: chat, free form, and structured, with examples on how to use them for versatile applications. The free form prompt is highlighted for its ability to handle variable inputs, making it easy to run multiple prompts simultaneously. The structured prompt section focuses on the technique of multi-shot (or few-shot) prompting for generating predictable, consistent outputs. This part of the script provides practical tips on using the platform's features to achieve more complex and tailored results, emphasizing the platform's adaptability for various use cases.

10:06

👩‍🏫 Fine-Tuning Models and Advanced Use Cases for Gemini 1.5 Pro

This segment introduces the concept of fine-tuning models within Google AI Studio, explaining how providing multiple input-output pairs can significantly improve model performance for specific tasks, like generating Instagram bios. It also discusses advanced use cases for the Gemini 1.5 Pro model, leveraging its million tokens of context capacity. Examples include uploading extensive documents like manuals or podcast transcripts to extract specific information or summaries. The presenter emphasizes the transformative potential of using long-context capabilities for deep research and information retrieval, offering practical advice on how to leverage these features for personal or professional development.

15:09

🎓 Concluding Thoughts on the Power of Long Context Analysis

The final part of the script reflects on the game-changing nature of Google AI Studio's long context capabilities, highlighting how users can research and extract insights from extensive content, like hours-long podcast transcripts, without the need for manual listening or note-taking. It suggests potential applications of this technology for efficiently gathering knowledge from vast amounts of information. The presenter invites viewers to share their ideas for unique use cases that leverage the long context window provided by the Gemini 1.5 Pro model, aiming to foster a community of innovative users exploring the limits of what AI can achieve.

Keywords

💡Gemini 1.5 Pro

Gemini 1.5 Pro is a model mentioned in the video as a standout feature of Google's AI studio, notable for its 1 million tokens of context. This large context window allows for processing extensive data, such as long documents or conversations, enabling more comprehensive and nuanced interactions. The model is positioned as an advanced tool between other models, offering unique capabilities not found in standard chat interfaces or previous models like GPT-3.5.

💡Developer Interface

The Developer Interface is highlighted in the video as a significant aspect of Google's AI studio, emphasizing that it offers a broader range of features compared to standard chat interfaces. It allows users to customize their experience, such as switching models and setting parameters like temperature, enabling more tailored and sophisticated uses of AI, even for those not developing their own applications.

💡Tokens of Context

Tokens of context refer to the amount of information, in terms of discrete pieces of data (tokens), that a model can consider at one time. In the video, the Gemini 1.5 Pro model's capacity to handle 1 million tokens is presented as a significant advancement, allowing for the processing of much longer texts or data streams than previous models, enhancing the model's ability to understand and generate complex content.

💡Multimodal

Multimodal capabilities are discussed in the video as a unique feature of the AI studio, specifically the ability to process and interpret video inputs. This extends the AI's utility beyond text and images to include video content, allowing it to analyze and respond to a broader range of data types, including visual elements and audio from videos.

💡Temperature Setting

The temperature setting, as explained in the video, controls the 'creativity' of the AI's responses. A higher temperature results in more creative and varied outputs, while a lower temperature produces more predictable and conservative responses. This feature allows users to balance between novelty and reliability in the AI's outputs, depending on the task at hand.
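
In sampling terms, temperature rescales the model's token probabilities before a token is drawn. A minimal Python sketch of the idea, using toy logits (this illustrates the concept only, not AI Studio's actual implementation):

```python
import math

def softmax_with_temperature(logits, temperature):
    """Scale logits by 1/temperature, then normalize into probabilities."""
    scaled = [l / temperature for l in logits]
    peak = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Toy logits for four candidate next tokens (hypothetical values)
logits = [4.0, 2.0, 1.0, 0.5]

low_temp = softmax_with_temperature(logits, 0.2)   # sharp: top token dominates
high_temp = softmax_with_temperature(logits, 2.0)  # flat: more varied sampling

print(low_temp)
print(high_temp)
```

At temperature 0.2 nearly all probability mass sits on the top token, so outputs become predictable; at 2.0 the distribution flattens and sampling becomes more varied (and more prone to hallucination).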

💡Safety Settings

Safety Settings are mentioned as a notable feature that addresses past concerns about AI bias and content moderation. These settings allow users to customize the level of moderation applied to the AI's outputs, providing a way to manage the balance between creativity and appropriateness according to the user's preferences.
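
For reference, the Gemini API exposes the same four harm categories programmatically. A sketch of what the most permissive configuration might look like (the category and threshold names follow the public Gemini API documentation as best I recall; verify against the current reference before relying on them):

```python
# Hypothetical sketch: one entry per harm category, each set to the most
# permissive threshold, mirroring the "don't censor for me" setup in the video.
safety_settings = [
    {"category": "HARM_CATEGORY_HARASSMENT",        "threshold": "BLOCK_NONE"},
    {"category": "HARM_CATEGORY_HATE_SPEECH",       "threshold": "BLOCK_NONE"},
    {"category": "HARM_CATEGORY_SEXUALLY_EXPLICIT", "threshold": "BLOCK_NONE"},
    {"category": "HARM_CATEGORY_DANGEROUS_CONTENT", "threshold": "BLOCK_NONE"},
]
# A list like this would be passed as the safety_settings argument when
# generating content through the Gemini SDK.
print(len(safety_settings))
```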

💡Free Form Prompt

The Free Form Prompt is introduced as a flexible way to create prompts with variable parts that can be defined by the user, allowing for dynamic and versatile use cases. This feature is designed to make it easier for users to construct and test different prompt configurations quickly, improving the efficiency of working with the AI.
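
The square-bracket placeholder idea can be illustrated with a small substitution function (a local sketch of the concept; AI Studio performs this substitution for you when you define test inputs):

```python
import re

def fill_prompt(template: str, **variables: str) -> str:
    """Replace [square-bracket] placeholders with caller-supplied values."""
    def substitute(match):
        name = match.group(1)
        if name not in variables:
            raise KeyError(f"no value supplied for [{name}]")
        return variables[name]
    return re.sub(r"\[(\w+)\]", substitute, template)

template = "Look at the following picture and tell me [question]"
test_inputs = ["who the architect is", "when it was built", "what style it is"]

# Run the same template over several test inputs, like AI Studio's test-input table
prompts = [fill_prompt(template, question=q) for q in test_inputs]
for p in prompts:
    print(p)
```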

💡Structured Prompt

Structured Prompting is described in the video as a method involving providing the AI with a prompt and multiple examples of the desired output format, known as few-shot learning. This approach helps the AI recognize and replicate the provided pattern, leading to more predictable and consistent results tailored to specific needs.
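
Assembling a few-shot prompt is essentially string concatenation of labeled input-output pairs. A sketch using the video's profile-bio example (the labels and example bios here are illustrative, not AI Studio's internal format):

```python
def build_few_shot_prompt(instruction, examples, new_input,
                          input_label="profile name", output_label="bio"):
    """Assemble a multi-shot prompt: instruction, labeled input/output pairs,
    then the new input with an empty output slot for the model to complete."""
    lines = [instruction, ""]
    for inp, out in examples:
        lines.append(f"{input_label}: {inp}")
        lines.append(f"{output_label}: {out}")
        lines.append("")
    lines.append(f"{input_label}: {new_input}")
    lines.append(f"{output_label}:")
    return "\n".join(lines)

examples = [
    ("The AI Advantage", "Simplifying AI every day."),
    ("Butter Chicken Enthusiast 9000", "Enjoying life one naan at a time."),
    ("Trail Runner Tom", "Chasing sunrises, one ridge at a time."),
]
prompt = build_few_shot_prompt(
    "Write me an Instagram profile bio based on the Instagram profile name.",
    examples, "Cat Enthusiast")
print(prompt)
```

Because the pattern repeats three times before the new input, the model tends to complete the final `bio:` line in the same format.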

💡Fine Tuning

Fine Tuning is discussed as a process of customizing a model to perform a specific task more effectively by providing it with numerous examples of input-output pairs. This can lead to a model that performs exceptionally well for a particular use case, as it 'learns' from the examples given, integrating this knowledge into its responses.
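
Such input-output pairs are often collected as JSON Lines, one record per line. A sketch of the data shape (the `text_input`/`output` field names follow the Gemini tuning documentation as best I recall; the exact format AI Studio expects may differ):

```python
import json

# Hypothetical tuning examples in the style of the video's bio generator
pairs = [
    {"text_input": "The AI Advantage", "output": "Simplifying AI every day."},
    {"text_input": "Butter Chicken Enthusiast 9000", "output": "Enjoying life one naan at a time."},
    {"text_input": "Cat Enthusiast", "output": "Living my nine lives to the fullest."},
]

def to_jsonl(records):
    """Serialize input-output pairs, one JSON object per line."""
    return "\n".join(json.dumps(r) for r in records)

dataset = to_jsonl(pairs)
print(dataset.splitlines()[0])
```

In practice you would collect far more pairs than this; the video notes a minimum of 20 in AI Studio and suggests 100+ for reliable results.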

💡Long Context Window

The Long Context Window is a feature of the Gemini 1.5 Pro model that allows it to process and remember a significantly larger amount of information in a single instance. This capability is highlighted through examples such as analyzing lengthy documents or compiling extensive research material, enabling deep and informed interactions based on a vast amount of data.
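
Before uploading, a rough rule of thumb of about four characters per token for English text can sanity-check whether a document fits. A sketch (this heuristic is approximate; the real count comes from the model's tokenizer, which AI Studio displays on upload):

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Very rough token estimate: ~4 characters per token for English text."""
    return max(1, round(len(text) / chars_per_token))

def fits_in_context(text: str, context_limit: int = 1_000_000) -> bool:
    """Check the estimate against a context window, e.g. Gemini 1.5 Pro's 1M tokens."""
    return estimate_tokens(text) <= context_limit

manual_page = "Ice will not dispense if any of the refrigerator doors are left open. "
manual = manual_page * 2000  # stand-in for a long appliance manual
print(estimate_tokens(manual), fits_in_context(manual))
```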

Highlights

Google's AI studio exits early access, introducing features and the Gemini 1.5 Pro model with 1 million tokens of context.

Despite being a developer interface, Google's AI studio is accessible for non-developers and offers more features than typical chat interfaces.

The Gemini 1.5 Pro model is highlighted for its advanced capabilities, positioned between standard and advanced AI models like GPT-3.5 and GPT-4.

The studio includes a unique feature allowing users to directly control the AI's moderation settings, addressing feedback on bias and content moderation.

Google AI Studio supports uploading video for multimodal interactions, a feature not available in other models or platforms.

The platform introduces adjustable temperature settings and stop sequences for customized output control.

Safety settings within Google AI Studio empower users with flexibility over content moderation, addressing a common request from the community.

The introduction of free form prompts enables dynamic and variable-driven interactions, enhancing user engagement and customization.

Google AI Studio allows for the easy creation and saving of prompt templates for reusability and efficiency.

Structured prompts and few-shot prompting techniques are introduced, allowing users to guide AI output more effectively.

The tutorial covers the basics of fine-tuning AI models by providing input-output pairs, simplifying the fine-tuning process.

Google's AI studio's capacity for 1 million tokens of context is unprecedented, opening up new possibilities for handling large datasets.

A use case demonstrates uploading and querying extensive documents like appliance manuals, leveraging the large context window.

The platform enables deep dives into long-form content like podcast transcripts, significantly enhancing research and learning capabilities.

Europeans need to use a VPN to access Google's AI studio, indicating regional availability restrictions.

The narrator emphasizes the transformative potential of the long context window for researching and engaging with extensive content.
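
Conceptually, the stop sequence mentioned in the highlights just truncates the output at the first occurrence of a marker string. A post-processing sketch of the idea (not AI Studio's implementation):

```python
def apply_stop_sequence(text: str, stop: str) -> str:
    """Return text up to (and excluding) the first occurrence of the stop sequence."""
    index = text.find(stop)
    return text if index == -1 else text[:index]

titles = "1. Alpha\n2. Beta\n11. Kappa"
# Stopping at "11." cuts the output before the eleventh item, as in the video's example
print(apply_stop_sequence(titles, "11."))
```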

Transcripts

play00:00

Google's AI studio just came out of early access,  meaning everybody can use this, except Europe,  

play00:05

but we'll talk about that. And this is packed  with features that I haven't seen anywhere else,  

play00:09

and it includes access to the Gemini 1.5 Pro  model, which actually really matters because  

play00:13

it has 1 million tokens of context. I'll show you  two really interesting use cases that you could be  

play00:18

doing immediately with this. So as you can see,  I'm not in my home studio. I'm in Las Vegas for  

play00:22

an AI conference right now. But that's not going to  stop me from showing you why you should care about  

play00:27

this and what features are not obvious, but really  useful here. Okay. So first things first, this is  

play00:32

a developer interface, but this shouldn't stop you  from using it. If you don't care about developing  

play00:36

your own apps at all. Matter of fact, developer  interfaces, just like OpenAI's Playground,  

play00:40

offer way more features than chat interfaces like  ChatGPT. You get to do things like switch models  

play00:46

quickly, set temperature, and you get advanced  features, which you still don't have in ChatGPT,  

play00:50

like these prompt presets. I'll cover all of this  in a second, but the most important part here is  

play00:54

the underlying model. If you go to the AI studio,  you will get access to the Gemini 1.5 Pro model.  

play01:00

Okay. There's also the Gemini 1.0 Pro model, but  we won't be talking about that today because the  

play01:04

1.5 with the 1 million tokens of context is really  the star here. If you're confused by the naming,  

play01:10

they essentially have the Pro model, which is like  ChatGPT 3.5. And then they have advanced model,  

play01:14

which is like GPT 4. In between lives the Gemini  1.5 Pro. And hey, I already see all the haters  

play01:19

in the comments typing, Oh, their AI is biased.  Remember what happened with the images? Yes, yes,  

play01:24

I know. But matter of fact, there's one setting  in here, which I've been asking for ever since  

play01:28

the release of ChatGPT, and they actually did it.  So kudos to them. And it directly counters most  

play01:32

of the comments that people are going to have  around what happened. And don't get me wrong.  

play01:35

I'm not trying to sweep that under the rug. That  was a big deal. And if you follow the channel,  

play01:38

you know that my stance generally is let  the people decide, let the market decide,  

play01:42

give us more options. Don't make these moderation  decisions for us. And they did exactly that in  

play01:46

here, which I really like. So let's get into  touring the interface here. And I'll do this as  

play01:50

I usually do on the channel. And this tour will be  focused on non-developers. If you're a developer,  

play01:55

everything we cover is relevant, because these  are the first steps. So first things first, you  

play01:58

can go to create new and right off the bat, you  get three different types of prompts that you could  

play02:03

run here. If you're familiar with ChatGPT, which  you probably are when you're watching this video,  

play02:07

the chat prompt is the simplest to understand.  This is a simple prompt that you type in,  

play02:11

and it generates a result with the model that  you select over here. Very simple. So you know,  

play02:16

let's run the classic penguin essay with Gemini  1.5 Pro in here, you can see it loads for a bit,  

play02:21

and then boom, we should get a result in a second  here. Alright, and by the way, I'm logged in with  

play02:26

my Google account, this is not necessary, it just  works, you can just generate results like this.  

play02:29

Now here's the first interesting part, at the top  you can upload different multimodal file types to  

play02:34

add to your prompt. Now one thing is super unique  here, you can add video. If you're not aware,  

play02:41

there's no other model out there that does video.  ChatGPT doesn't do it, Claude doesn't do it,  

play02:46

the open source models don't do it. Here you  can straight up upload a video and work with  

play02:49

it. It's gonna recognize what's in the video both  visually, and it's gonna recognize the audio. But  

play02:54

more on this in a second, because beyond this  multimodal features, and this chat interface  

play02:57

that you're probably familiar with, here on  the right we get a temperature setting, as we  

play03:01

do in OpenAI's Playground. If you're not aware,  this is essentially the creativity of the model,  

play03:04

so when you have a high setting, it's gonna be  super creative, but it's also gonna be more prone  

play03:08

to hallucination. These two are really linked  with LLMs, and if you tone down the temperature,  

play03:12

you really limit the creativity, and you're gonna  get similar results every single time. But as you  

play03:15

can see, this is not always editable, so it really  depends on what model you use. For example, if I  

play03:19

go to Gemini 1.0 Pro, I get to alter this, with  Gemini 1.5 Pro, this is set in stone. A stop  

play03:24

sequence means that if it recognizes certain words,  it's gonna stop at that point. So in other words,  

play03:30

if you're creating a list of, let's say,  YouTube titles, you could say stop sequence 11,  

play03:34

and when it arrives at point 11, it will stop the  prompt and just give you the output. To be honest,  

play03:38

this one is not very useful, but here we get to  the one that I talked about, the safety settings.  

play03:42

They actually give you control over how you want  the model to behave. Now they don't give you all  

play03:47

the control, but this is a step in the right  direction, I think. And for me, for all four of  

play03:51

these, I'm gonna set these to block none. I want  the model to give me its output, so I don't want  

play03:55

it to censor it for me. I'm a grown man. I can  handle that. So I'm just gonna set  

play04:00

this to block none, and there you go. And then if  we go to advanced settings, top P controls the  

play04:05

cutoff for the token probabilities. Now this  is something I don't really ever use, but it  

play04:09

does work together with temperature, and I know  I told you that temperature controls creativity,  

play04:14

but it really controls the probability inside of  the model. So temperature of one is gonna let it  

play04:18

access the full probability spectrum, and the top  P of 0.4 is gonna say, take only the top 40% of  

play04:24

the probability mass and get me an answer from within  there. But in giving you the full probability  

play04:28

spectrum of what these LLMs can generate, it's  gonna result in more creative stuff. That's why  

play04:32

I communicated it that way. Anyway, we're just  gonna close top P, and now let's get into some new  

play04:35

features here and the use cases, because there  is some. So if we go over here to create new,  

play04:39

we talked about the chat prompt, right? But  next up, let's talk about the free form prompt,  

play04:43

because this is quite interesting. From day one  of this channel, I've been communicating prompts  

play04:48

in a formula format. If you subscribe to the  newsletter and you got the free ChatGPT template,  

play04:52

you're gonna see that every single prompt in  there has square brackets around certain words.  

play04:56

Square brackets are there for you to replace  the words with your very own variables or  

play05:01

use cases or words, whatever you wanna call it.  And I didn't just make that up out of thin air.  

play05:05

That's how you use these things. That's how you  program applications with LLMs and prompts on the  

play05:10

backend. Certain parts of the prompt are variable.  They're gonna differ. And in Google AI Studio,  

play05:15

you can actually set that and really easily add  multiple examples and run multiple prompts. This  

play05:19

is something I've been wanting inside of ChatGPT  for a while. So let me show you how it works.  

play05:23

And I pulled up this example that I created  with the help of this getting started guide.  

play05:27

But if you're watching this video, you're  not gonna really need the getting started  

play05:29

guide. I cover everything in there and more.  If you wanna get into developing with this,  

play05:33

this is gonna be very helpful. There's a lot  of good guides in here. But back to this free  

play05:36

form prompt. So what is a free form prompt? A free  form prompt is a prompt with a variable in it. And  

play05:41

that variable can be defined here at the bottom.  Okay, so if I say, look at the following picture  

play05:46

and tell me who is the architect, you can see that  who is the architect is a test input. And the way  

play05:51

I created this is very simple. I just went to a  new free form prompt. This is the prompt. Then I  

play05:55

said test input. And then what I'm gonna do is I'm  just gonna rename the input from inputs to fact,  

play06:00

let's say. And then here, I'm gonna say who was  the architect, I'm gonna add a new example. And  

play06:05

then I'm gonna add three examples that make sense  in the context of architectural image. And then  

play06:11

I can go ahead and go to image here. You can also  take it from your Google Drive. But what I'm gonna  

play06:15

just do here is take one of the sample images  here. Let's say this pyramid from Egypt here.  

play06:19

Amazing. And when I say run here at the bottom,  you're gonna see that we're using the Gemini 1.5  

play06:23

Pro model to run these three prompts on top of the  image. And I get all three results down here. And  

play06:28

now here's the best part. You get to save all of  this, okay? So all of this work. When I do this  

play06:32

in ChatGPT, I would need to take it and put it  inside of a GPT and then access that. Here, you  

play06:36

can just quick save prompts like this. You just  say save. I'm just gonna say architecture analyst.  

play06:41

Fair enough. And I connected my Google Drive here.  And this allows me to save the prompt in my Google  

play06:45

Drive. As you can see, architecture analyst  has been added over here. And I can access it  

play06:49

anytime. And you're gonna see the three variables  are down here. You could delete them like so. Add  

play06:54

new ones. And this is really a great environment  to test your prompts and build them out. In all  

play06:58

the Notion templates I've ever given away or sold,  I always gave multiple examples of what you could  

play07:03

put in there. But here, you can actually put it to  work and create a template where you have multiple  

play07:07

variations of one prompt and you can get the  results in a heartbeat like so. And then you  

play07:11

could even go ahead and say and and add another  variable. Variable two. There you go. And here,  

play07:16

I could write a second part. So as you can see,  you can make this as complex as you want because  

play07:21

I can be going in and creating multiple variables.  Very useful stuff where you can get quite complex  

play07:25

results and you can get a lot done in a short  time. And the next time you just want to prompt  

play07:29

on top of a different image, you just come in  here, you switch out the image, and you don't need  

play07:32

to rebuild the whole thing. There you go. I'm just  gonna add this. Now, these variables are obviously  

play07:37

empty, so I'm just gonna delete that. But rerun  it, and I have my new examples right here. All  

play07:41

right. So now we talked about the chat prompt and  the freeform prompt. Now, let's talk about the  

play07:44

structured prompt. And this one will be familiar  to people who've been watching the channel  

play07:48

closely. Because I talked about multi-shot, also  called few-shot prompting, many, many times on  

play07:53

here. As a matter of fact, I always say it's one  of the most useful techniques you could deploy  

play07:57

when using a large language model. If you're not  familiar, the way it works is essentially you have  

play08:01

a prompt, and then you provide multiple examples  of what you would like the output to look like,  

play08:05

ideally three or more. Because essentially what  these LLMs do is they just recognize patterns, and  

play08:09

then they recreate those patterns. If you tell it,  this is the pattern that I want you to recreate,  

play08:14

it's gonna recognize that, and it's gonna  give you more of the same. So in other words,  

play08:18

providing multiple examples is a really good idea  if you want predictable and consistent outputs.  

play08:23

So how does this work? I set up a little profile  bio generator. This is as simple as it could be.  

play08:27

But again, you could plug in any prompt in here.  I'm just showing you how this interface works.  

play08:31

And basically what I did is I said, write me an  Instagram profile bio based on Instagram profile  

play08:35

name. Here are some examples. Now, I do recommend  including this block, otherwise the resulting  

play08:40

prompt is gonna give you all the results every  single time. But what I'm doing is I changed the  

play08:44

inputs to profile name and the output to profile  description. Maybe I should rename this to bio.  

play08:48

This would be even more accurate as I use that in  the prompt. All right, so if I say VA advantage,  

play08:53

simplifying AI every day, let's just say this would  be the profile bio that I expected to generate,  

play08:57

that I wanted to generate. Now this could be way  more complex. This is just a quick example. But  

play09:01

as you can see, I get three of these examples.  So for Butter Chicken Enthusiast 9000, enjoying  

play09:06

life one naan at a time. Excellent. By the way, the  Butter Chicken Enthusiast, that's me. I eat that  

play09:11

dish everywhere I go. It has become kind of a must  for me. I just love it so much. Anyway, the point  

play09:15

here is this. Now that we have the prompt plus  three examples, we essentially created something  

play09:20

that if I go here to the bottom and just say Cat  Enthusiast and I run this, you're gonna see that  

play09:25

I get a result here that is very comparable to  the three examples that I provided. Now you're  

play09:29

gonna see that if I delete these examples, it's  gonna be very different. And look at that. The  

play09:33

format doesn't resemble the one before. And that's  the point of giving it the examples. Now the cool  

play09:38

thing here is this. While you watch this tutorial,  you actually kind of learned how to fine tune a  

play09:42

model because that's what a fine tuned model  is. You just need way more examples, like over  

play09:46

100 examples. I'm just gonna cancel out here, not  save my progress because what I already did is I  

play09:50

saved this little structured prompts generator and  I named it profile bio generator, as you can see  

play09:55

right here. And now if I go to new tuned models,  you can see that I can actually pick the profile  

play09:59

bio generator because now I effectively fine tuned  a model because all fine tuning is it providing  

play10:05

it with input output pairs. If I input this, I  want this to come out. If you do that 100 times,  

play10:11

the model is going to sort of add that to its  training data and now be aware of all your input  

play10:14

output pairs. So this is a really good strategy if  you're trying to get the model to do one specific thing.  

play10:19

If you're only generating Instagram bios, you  really want to give it a lot of pairs of Hey, if  

play10:23

I input this profile name, I want this bio to be  there. And if you do that a lot of times, you're  

play10:27

going to find that the model performs so much  better in that particular use case. But you can  

play10:31

see down here, the minimum is 20, and I only provided  three. So for fine-tuned models, you really need  

play10:36

more; with OpenAI, you need over 100 examples  for this to work super reliably. But there you  

play10:40

go. This is probably the easiest way I've seen to  fine tune a model, you know how it works. So yeah,  

play10:45

that is the structured prompt. So now we covered it  all. Now the last thing that is left to talk about  

play10:49

here is use cases for the Gemini 1.5 pro model  because it really is a unique model. A million  

play10:55

tokens of context is something that is unheard of.  No other model does this. Claude Opus has 200k,  

play11:00

GPT-4 has 128k. If you watched last week's use  case episode that we do every single Friday,  

play11:05

you will have heard about GPT-4.5 that is rumored  to have 256k tokens of context. This has a million  

play11:12

that is a lot, okay. So I'm going to show you  two ways how you can take advantage of that.  

play11:16

Because in most everyday use cases, this is not  the most useful thing. You don't need that much  

play11:20

context. But here are two where you could use it.  Well, the first one is maybe a little unexpected.  

You could upload every manual there is, right? And this doesn't really work with ChatGPT, because the context isn't long enough to accommodate something like this fridge's manual. I don't know, I just took a random fridge off the LG site; you could take literally any manual. If you didn't know, pretty much every manual is available online, so you could do this for any appliance. I'm just going to go here, go to file, file upload. As you can see, you could also do it directly from your Drive. I'm just going to drag this over, and it shows the token count here, so there's no need for an external tokenizer. That's another neat feature. And now, from the 63 page manual, I could ask something concrete. For example, ice will not dispense if any of the refrigerator doors are left open. I'll just ask, why is the ice not dispensing? And while it does that, I mean, just think about the fact that this is a very long document but only 37,000 tokens, so you could easily use all the other models we talked about for this. I'm just saying some manuals will be way too long for those. If you have a 500 page PDF, well, you know how to use it now.
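For reference, the same manual Q&A can be driven from code. This is a rough sketch against the `google-generativeai` Python package; the file name is hypothetical, and the upload/generate calls should be checked against the current API docs before relying on them:

```python
import os

def build_question(appliance: str, symptom: str) -> str:
    """Compose a concrete troubleshooting question for the uploaded manual."""
    return f"Based on the attached {appliance} manual, why is {symptom}?"

prompt = build_question("LG refrigerator", "the ice not dispensing")

# Real call, guarded behind an API key; method names are assumptions drawn
# from the google-generativeai package and may change between versions.
if __name__ == "__main__" and os.getenv("GOOGLE_API_KEY"):
    import google.generativeai as genai
    genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
    manual = genai.upload_file("fridge_manual.pdf")  # hypothetical local file
    model = genai.GenerativeModel("gemini-1.5-pro")
    print(model.generate_content([manual, prompt]).text)
```

The prompt-building part runs anywhere; the guarded block only fires when a key is configured.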

And there you go: LG refrigerator, it gives me all the different reasons why that could be the case, based on the PDF. Okay, so that's one. And here's another one. I'm just going to go to the latest episode of my favorite podcast here. What you can do is pull the transcript; you don't even need an external app, it's inside of YouTube. If you go down here, show transcript, then here on the left side you can toggle timestamps. And what you can do now is just select all of this, like so, copy. And now, in a new chat prompt, I can simply add all of this as context: here's a podcast I listen to, paste. Now, one tricky thing is that if you just press Command V (Control V on Windows), it won't work right away; you need to add Shift to remove the formatting. So Command Shift V on a Mac here, and it's going to paste it without the formatting. And there you go, look at the super, super long transcript of this very long podcast. I mean, this is over one hour, right? So just by default, you get a summary here. But you could do this with things that are way longer, because I only used 18,000 tokens here, meaning we could even do a Lex Fridman podcast, or multiple Lex Fridman podcast transcripts, in here. And then you can talk to them. Every single prompt you've ever discovered will work with this, and you can do it on long form content like this and do deep research. What about this one with Bill Ackman? I actually listened to this monster of a podcast on my trip to the US. Show transcript. This is where the scroll bar comes in really handy, because there's just so much of this. And look, if you want to go beyond this and use this regularly in your company, there are totally programmatic ways of pulling the transcripts off YouTube. I'm just showing you the manual way so you can get your feet wet. Here's another one. I'm just gonna paste this, Command Shift V. Oops,

I just realized I copied the same one in again.  But it doesn't matter. We have a lot of context.  
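As a sketch of the programmatic route mentioned a moment ago, the third-party `youtube-transcript-api` package can fetch what the "show transcript" button displays. The video ID is a placeholder and the fetch itself is left commented out; the helper just flattens transcript entries into prompt-ready text:

```python
# Sketch of pulling a transcript in code instead of copy-pasting. The
# youtube-transcript-api package is an assumption; any transcript source
# that yields {"text", "start"} entries would work the same way.

def to_plain_text(entries):
    """Flatten transcript entries into one block of text for the prompt."""
    return " ".join(e["text"].strip() for e in entries)

# Fake entries standing in for a fetched transcript:
sample = [
    {"text": "welcome back to the show", "start": 0.0},
    {"text": "today we talk about markets", "start": 2.4},
]
context = to_plain_text(sample)
print(context)

if __name__ == "__main__":
    # Real fetch (video ID is hypothetical); uncomment with the package installed:
    # from youtube_transcript_api import YouTubeTranscriptApi
    # entries = YouTubeTranscriptApi.get_transcript("VIDEO_ID")
    # print(to_plain_text(entries))
    pass
```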

So we can just simply send a new message. Yep, this is the Bill Ackman interview here. All right, this is gonna take a second. But as soon as it loads it in, we have power at our disposal that simply wasn't possible a month or two ago. And everybody has this now. Oh, actually, everybody except Europe; if you're in Europe, you've got to use a VPN. Now that I'm in Vegas, I can actually access it for the first time without having to use a VPN. So that's not the biggest workaround in the world, and I think it's worth it. This is completely free, and playing with long context like this turns out to be very valuable. I mean, I guess whether it's valuable depends on what you do, but I would just say give it a shot. Once you experience this yourself, being able to copy a three hour podcast and another one hour podcast into this, you're going to realize that you could be doing this with speeches of your favorite people, and you could be prompting on top of all the wisdom that is contained inside of that. So even now, we're just at 85,000 tokens of context; that is less than one tenth of what's possible here.
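If you want a feel for those token counts before pasting, a common rule of thumb for English text is roughly four characters per token. This is only a heuristic (AI Studio shows the real count from its own tokenizer):

```python
# Back-of-the-envelope token estimate using the common ~4 characters per
# token rule of thumb for English text. A rough heuristic only, not
# Gemini's actual tokenizer; AI Studio displays the real count.

def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)

# A multi-hour podcast transcript might be roughly 150,000 characters:
print(estimate_tokens("x" * 150_000))  # -> 37500
```

That kind of estimate is enough to tell whether a paste will fit comfortably inside the 1 million token window.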

But either way, I'm just going to go ahead and try to ask a specific question about a point that was covered in the Lex and Bill Ackman podcast, about a big conflict with Carl Icahn. So I'm just going to ask who opposed Bill Ackman's short position. Now, this is something that won't be in the training data; it was in the podcast, it was concretely discussed in there. I won't get into the details, but the answer we want here is Carl Icahn. Let's see if it gets it. There you go.

Just think about the fact that we have five hours in here and we're only at 85,000 tokens. So you could do multiple episodes of a podcast and search all that wisdom without having to spend hours, tens of hours, listening to all of it, paying attention and taking notes. This is an incredible way of researching topics, and now you know how to do it too, for free. All right, that's essentially all I got today. If you have any other use cases that a long context window like this unlocks that are interesting, please share them in the comments below. And other than that, check out this video that gives you some tips on how to effectively prompt, because everything you learn there, you're going to be able to apply on a long context model like Gemini 1.5 Pro. All right.
