Assistant API with GPT-4 Turbo Vision: OpenAI's Complete Guide to Integration
Summary
TLDR: This video covers OpenAI's GPT-4 Turbo update and shows how to use the new endpoint to recognize images. The current Assistants API, however, cannot read images directly, so the video presents a workaround: extract the data from the image first, then feed it into the Assistants API. As an example, it walks through uploading an invoice image, analyzing its contents, and auto-generating an email from them. Using Zapier, a trigger fires whenever a new file is added to a Google Drive folder; the invoice information is extracted from that file and passed to the Assistants API. The approach requires no code, making it easy to put AI to work in a business.
Takeaways
- 😀 OpenAI released GPT-4 Turbo with a new endpoint, gpt-4-turbo-2024-04-09, which adds image-recognition capability.
- 🔍 The current Assistants API cannot see images, but a workaround lets you take advantage of the capability anyway.
- 🛠️ Two approaches to the workaround are shown: one with code and one without.
- 📚 For the no-code approach, the video uses Zapier, supplying the data by dragging and dropping a file into a Google Drive folder.
- 📧 A task-specific assistant (e.g., for invoice processing) is created in the Assistants API and connected to Zapier to automate the work.
- 📈 The example scenario analyzes an invoice image and auto-generates an email based on its contents.
- 🔗 A middle block feeds the data extracted from the image into the Assistants API, yielding more specific information.
- 💡 Reformatting data and passing it on to another AI provider's block is presented as a standard software-development workflow.
- 🔧 The hope is that future versions of the Assistants API will make this integration more native.
- 👍 Viewers are asked to like the video if they found it helpful.
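The vision capability the takeaways refer to is exposed through the Chat Completions endpoint rather than the Assistants API. A minimal sketch of the request body such a call would send, assuming OpenAI's documented message format for image input (the model string is the one named in the video; the prompt and image URL are placeholders):

```python
# Sketch: build the JSON body for a vision-enabled chat completion.
# The model name matches the endpoint discussed in the video; the
# image URL here is purely illustrative.

def build_vision_request(prompt: str, image_url: str) -> dict:
    """Return a Chat Completions request body that pairs a text
    prompt with an image reference."""
    return {
        "model": "gpt-4-turbo-2024-04-09",
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
        "max_tokens": 500,
    }

body = build_vision_request(
    "Provide all the relevant information from this invoice.",
    "https://example.com/invoice-paid.png",
)
print(body["messages"][0]["content"][1]["type"])  # image_url
```

Sending this body to the chat completions endpoint (with an API key) is what the Zapier "Analyze Image Content with Vision" block does behind the scenes.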
Q & A
What capability is available on OpenAI's new GPT-4 Turbo endpoint?
-The new GPT-4 Turbo endpoint provides access to vision capability.
How does the video plan to use GPT-4 Turbo's vision capability with the Assistants API?
-The video shows both a code approach and a no-code approach to leveraging the vision capability with the Assistants API.
What workaround does the video propose?
-Because the Assistants API cannot read images directly, the video proposes extracting the data from the image first and feeding that data into the Assistants API.
How does the current Assistants UI handle images?
-The current UI cannot handle images directly; as an alternative, Zapier is used to extract data from the image and feed it to the Assistants API.
What is the "invoice handler" assistant in the video responsible for?
-The "invoice handler" assistant is instructed to auto-generate an email based on invoice data.
What does the workflow look like when combining the Assistants API with Zapier?
-A Zapier trigger fires on a new file in a Google Drive folder, and the data extracted from the image is passed through a middle block to the Assistants API.
Why does the video say the "invoice paid" text is important?
-"Invoice paid" is a key data point extracted from the invoice image, and it is what gets fed to the Assistants API.
When is the proposed workaround useful?
-It is useful whenever the Assistants API cannot read an image directly and you need to extract and reuse the image's data.
What is the playlist at the end of the video about?
-It covers ways to leverage AI and automation for your business.
What does the code-based workflow in the video look like?
-Data is extracted from the image and passed to the Assistants API via something like an HTTP callable function or a Firebase database.
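The "middle block" mentioned in the answers above is just a reformatting step: it wraps the vision model's raw text output in the labeled format the invoice-handler assistant expects, and optionally spot-checks the extraction. A small sketch, with the wrapper format mirroring the one used in the video and the field names illustrative:

```python
import re

def to_assistant_message(extracted_text: str) -> str:
    """Wrap the vision model's raw output in the labeled format the
    invoice-handler assistant was told to expect."""
    return f"invoice data: ({extracted_text.strip()})"

def gut_check(extracted_text: str) -> dict:
    """Pull the fields the video spot-checks (bill-to, total) so the
    workflow can verify the extraction before calling the assistant."""
    bill_to = re.search(r"Bill to:\s*(.+)", extracted_text)
    total = re.search(r"Total(?:\s+amount)?:\s*\$?([\d,]+)", extracted_text)
    return {
        "bill_to": bill_to.group(1).strip() if bill_to else None,
        "total": total.group(1) if total else None,
    }

sample = "Bill to: Olivia Wilson\nDescription: social media templates\nTotal: $2,100"
print(gut_check(sample))  # {'bill_to': 'Olivia Wilson', 'total': '2,100'}
```

In Zapier this reformatting happens implicitly when you template the assistant's message field; in code it is an explicit function like `to_assistant_message`.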
Outlines
🤖 The AI update and leveraging the Assistants API
OpenAI announced an update to GPT-4 Turbo: the new endpoint gpt-4-turbo-2024-04-09 adds image-recognition capability. The video explains how to put this capability to use with the Assistants API. It first checks whether the current UI can read images; since it cannot, it presents both code and no-code workaround approaches. It also points to earlier videos on using the Assistants API in a business.
📁 Extracting data from images and feeding the Assistants API
The video explains how to extract data from an image and auto-generate an email from it. Using Zapier, a new file in a Google Drive folder triggers the workflow, the image content is analyzed, and the resulting data is fed to the Assistants API. This workflow — reformatting data and handing it off to another block — is described as a standard software-development pattern that applies to other AI providers as well.
Keywords
💡OpenAI
💡GPT-4 Turbo
💡Vision Capability
💡Assistants API
💡Workaround
💡Zapier
💡HTTP Callable
💡Image Analysis
💡Custom Model
💡Automation
Highlights
OpenAI released an update to GPT-4 Turbo, with a new endpoint: gpt-4-turbo-2024-04-09.
The new endpoint provides the ability to access vision capabilities.
A common question is how to leverage this new vision capability in the Assistants API.
The video demonstrates both code and no-code approaches to utilizing the vision capability.
A previous video discussed the cost associated with an app using this endpoint for calorie counting from images.
The current Assistants UI does not support image viewing, despite the new endpoint's capabilities.
A workaround is introduced to leverage the vision capability with the Assistants API.
The Assistants API is described as a more specialized version of the models, with additional functionality.
A no-code method is presented that uses the Assistants API to handle invoice data.
Zapier is used to create a workflow that triggers on a new file in a Google Drive folder.
The workflow extracts data from an image and feeds it into the Assistants API.
A middle block reformats the data before passing it to the Assistants API.
The video demonstrates how to extract invoice data from an image and use it to generate an email summary.
The process can be translated into code using HTTP callable functions or a database.
It is hoped that future updates will make vision capabilities native to the Assistants API.
The video concludes with a call to action to like the video and subscribe for more content on leveraging AI and automation.
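The second step of the workflow — handing the extracted data to an assistant — maps onto two Assistants API requests: create a thread seeded with the data, then start a run with the chosen assistant. A sketch of the request bodies, assuming the public Assistants API routes (the assistant ID is a placeholder, and the message wrapper mirrors the video's format):

```python
# Sketch: request bodies for the Assistants API step. Endpoint paths
# are noted in the docstrings; only the payloads are built here, so
# no network access or API key is needed.

def build_thread(invoice_text: str) -> dict:
    """Body for POST /v1/threads — opens a thread seeded with the
    extracted invoice data as a user message."""
    return {
        "messages": [
            {"role": "user", "content": f"invoice data: ({invoice_text})"}
        ]
    }

def build_run(assistant_id: str) -> dict:
    """Body for POST /v1/threads/{thread_id}/runs — asks the chosen
    assistant (e.g., the Invoice Handler) to respond on the thread."""
    return {"assistant_id": assistant_id}

thread_body = build_thread("Bill to: Olivia Wilson, total $2,100")
run_body = build_run("asst_example123")  # hypothetical assistant ID
```

The Zapier "Conversation with Assistant" block performs this same thread-and-run sequence for you; in code you would poll the run until it completes and then read the assistant's reply from the thread's messages.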
Transcripts
OpenAI released an update to GPT-4 Turbo. More specifically, we now have the ability to access the vision capability on this new endpoint, gpt-4-turbo-2024-04-09. A major question I've been getting asked in the comments is: okay, this is super cool, but how the heck do we leverage it in the Assistants API? In today's video I'm going to show you how. To be clear, this is a workaround, so I'm going to give you a code way of approaching this logic, but also a no-code way of approaching this logic. Let's jump in.
Earlier this week I did an entire video on this tweet we got from OpenAI Developers talking about this endpoint, where I ran a little scenario of what the cost would be for this kind of app where you take a picture and get the calories. You can check out that video right there for more context, but after it I got a ton of comments asking how we leverage this in the Assistants API.

First off, let's see if it's even possible within the current Assistants UI to see an image using this endpoint. I'm going to name this assistant "image test" and give it simple instructions: "Tell me what is in the image." We'll pick the specific model that was released recently, gpt-4-turbo-2024-04-09, and that should be sufficient. Now let's go to the playground. In theory, based on what we've been told in the documentation, this endpoint should have the ability to read images. Let's test it, though. I'm going to attach an image — a fake invoice — and simply ask this endpoint: "What is this invoice? What is this image I attached?" Hit run, and we get this kind of response: "Sorry, I can't currently view images; however, I can help you understand and analyze text or descriptions of the images you provide." This is no good, though, as a lot of people want to use the Assistants API in the context of reading images. So in this video, let's see the possible workaround.
Also, I want to point out: if you want more in-depth videos on how to leverage assistants without this new update, check out this video right here, where I show you how to manage over 50 files within an assistant. If you don't know what the Assistants API is, think of it as an even more lasered-in version of these models: you can provide different files, functions, the code interpreter — there's a bunch of extra layering of the cake we can do with these assistants to make them more specialized for a business. I've done three or four videos on this topic, so either look through my channel or search "Corbin Assistants API."
Therefore, to leverage this kind of technology, let's create a new assistant. This assistant is going to be our invoice handler. The first way I'll show you is the no-code way. We'll name it "Invoice Handler," set the model to GPT-4 Turbo (the modern one), and give it these instructions: "We will provide you with invoice data. Based on this data, please write an email summarizing it." This is going to be really simple because I just want to show you how to do it; obviously there's a ton of other stuff, like I referenced earlier, so if you want to see the other videos on adding files and making this more lasered-in for a business, go watch those. So we have our assistant, Invoice Handler; we have our instructions; we have a model.

Let's jump over to Zapier and create a new zap. For the trigger, choose Google Drive — this is how we're going to provide the data that gets read — with an event of "New File in Folder." Connect an account, and that should be fine; run the test. Now over in Drive, we'll drag in that image from earlier. I like using Drive in these cases because it's a free service, so you can get access right away and test this yourself without paying anything. We've got our invoice-paid.png in the folder (named "assistant API" here). Test the trigger, and we should see the relevant piece of data, which is invoice-paid. Basically, every time we want to trigger this, we just drag a file into that specific folder and everything we're about to build today runs. There we go — we got our file, invoice-paid.png.

This is the workaround, y'all: we add a ChatGPT block with the event "Analyze Image Content with Vision." Here's the idea. You have a lasered-in Assistants API assistant: it's trained on your data, trained on everything you want it trained on, very focused on the specific task you want it to do, and you can reference previous conversations with it, etc. But here's the one thing it can't do: read images. We can still provide the information we would have fed the assistant anyway. If we were going to use the Assistants API in the context of reading an image, what are we really doing? We're internalizing specific data from that image to feed into the assistant. So a great workaround is: just get the data we wanted anyway and then feed it into the Assistants API via a middle block. This is actually a pretty common workflow in software, where you reformat data and pass it down to another OpenAI block or AI-provider block. Here we're doing it specifically to feed the Assistants API, but you could do the same in other contexts — maybe you have a ton of data you want to compress with a GPT-3.5 block and then run a GPT-4 block on the compressed data to provide another value point. So this is normal, and this is standard; we're just applying it to the Assistants API.
So let's continue: choose our account, continue, and write the message. We'll say: "Provide all the relevant information from this invoice — things such as bill-to, descriptions, total amount, etc." You can get very specific here — I have a bunch of other content on this topic — but the idea is just to grab the data and play with it. So we expect the bill-to, the descriptions, and whatever else it finds relevant. Continue and test this step. Also note that the image input is the actual file ("file exists but not shown"), not the title — the title is just a string of text; we want the image itself.

And here's the relevant information: we got the bill-to, the from, the date, the description — it basically went overboard because I said "etc.," so it took that as "provide all the relevant data found in the image." To gut-check this real quick: bill to Olivia Wilson — yes, bill to Olivia Wilson; final price around 2,100 — and we were looking at 2,100. Perfect. Four different services: revision, social media templates, logo design, graphic design. All right, that's all the relevant information we want.

Having extracted this data from the image, we can now feed it into an assistant block. Click the little plus button, add another ChatGPT block, and choose the event "Conversation with Assistant." Continue, and for the message: since we've already set up the instructions in our Assistants dashboard, we don't have to do much here — we can just provide the invoice data like this: "invoice data:" followed by the extracted data in parentheses, and we should expect an email relevant to this information in response. Choose our assistant — there it is, Invoice Handler — and everything else is standard, already preset in the dashboard we saw earlier. Continue, and I should be expecting an email, or at least a pseudo/draft email. Here we go: this is the data we fed it. I like referencing the variable point, because we labeled it "invoice data" earlier, and the parentheses compartmentalize the data we're providing. Scroll down and — boom — subject line "Invoice Summary," invoice 200015, Olivia Wilson (which we confirmed earlier), and all the relevant information from our invoice: 2,100 and everything else from that image.

The logic you just saw can also be translated into code. All you need to understand is that we're passing data around within an HTTP callable function — or maybe you'd use a Firebase database flow; it might be a bit more code-heavy, but the idea is the same. The first function calls the vision endpoint: you set your max tokens, set your temperature, write your prompt, provide the image as the relevant data point, and ask for an output. The second part takes the output from that first function, or first callable point, sends it to the Assistants API in code, and returns the relevant information and data you care about. This is pretty common in software development and workflows in general: you restructure the data a little before sending it on to a further workflow, or a further ChatGPT block, or an Anthropic block, or whatever you're using. What I will say is that hopefully, in the future, they'll make this more native and integrated in the Assistants API so you don't necessarily need that little middle block in between.

If you feel like you learned something in today's video, make sure to leave a like — it's completely free. I'll leave another assistants video I did a couple of weeks ago, showing you how to leverage it for your business, at the end here. I'm also leaving a playlist on how to leverage AI and automation — that's the playlist I was referring to for AI, automation, and how to apply them to your business — and an Assistants API video specifically, so you can start learning this kind of tech more easily. And that's my face; daily content; subscribe. Bye-bye.
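The code path described at the end of the transcript — a vision call followed by an Assistants API call — could look roughly like the sketch below. This assumes OpenAI's public HTTP API (Chat Completions plus the v2 Assistants beta routes); the assistant ID, API key, and image URL are placeholders, error handling is omitted, and polling the run for completion is left as a comment.

```python
"""Two-step workaround sketch: (1) the vision-capable Chat Completions
endpoint extracts the invoice text, (2) that text is handed to an
Assistants API run. Assumes the public OpenAI HTTP API; the assistant
ID and API key are placeholders."""
import json
import os
import urllib.request

API = "https://api.openai.com/v1"

def _post(path: str, body: dict) -> dict:
    """Send an authenticated JSON POST to the OpenAI API."""
    req = urllib.request.Request(
        API + path,
        data=json.dumps(body).encode(),
        headers={
            "Authorization": f"Bearer {os.environ.get('OPENAI_API_KEY', '')}",
            "Content-Type": "application/json",
            "OpenAI-Beta": "assistants=v2",  # needed for /threads routes
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def extract_invoice_text(image_url: str) -> str:
    """Step 1: ask the vision endpoint to read the invoice image."""
    out = _post("/chat/completions", {
        "model": "gpt-4-turbo-2024-04-09",
        "messages": [{"role": "user", "content": [
            {"type": "text",
             "text": "Provide all the relevant information from this invoice."},
            {"type": "image_url", "image_url": {"url": image_url}},
        ]}],
    })
    return out["choices"][0]["message"]["content"]

def send_to_assistant(invoice_text: str, assistant_id: str) -> str:
    """Step 2: open a thread seeded with the extracted data and start
    a run with the invoice-handler assistant; returns the run ID."""
    thread = _post("/threads", {"messages": [
        {"role": "user", "content": f"invoice data: ({invoice_text})"},
    ]})
    run = _post(f"/threads/{thread['id']}/runs", {"assistant_id": assistant_id})
    return run["id"]  # poll the run until completed, then read messages
```

Chaining `send_to_assistant(extract_invoice_text(url), assistant_id)` mirrors the Zapier zap exactly: the vision block becomes the first function, the "Conversation with Assistant" block becomes the second.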