Assistant API with GPT-4 Turbo Vision: OpenAI's Complete Guide to Integration

Corbin Brown
15 Apr 202409:34

Summary

TLDRこのビデオでは、OpenAIがリリースしたGBT 4 Turboのアップデートを紹介し、新しいエンドポイントを利用して画像を認識する方法について解説しています。しかし、現在のアシスタントAPIでは画像を直接認識することができないという問題があります。そこで、代替手段として「アシスタントAPI」を活用する方法を紹介します。例えば、請求書の画像をアップロードし、その内容を解析してメールを自動生成する手順を説明しています。Zapierを使用して、Google Driveのフォルダに新しいファイルを追加するトリガーを作成し、そのファイルから請求書の情報を抽出してアシスタントAPIに渡すプロセスを解説しています。この方法は、コードを書かずとも実現でき、ビジネスでのAI活用を容易にします。

Takeaways

  • 😀 Open AIがGBT 4 Turbo 2024, 049という新しいエンドポイントをリリースし、画像認識の能力が追加された。
  • 🔍 現在のAssistance APIで画像を見ることはできないが、ワークアラウンドを使って機能を活用することができる。
  • 🛠️ ワークアラウンドにはコードを使用する方法とコードなしの方法の2つのアプローチが提案されている。
  • 📚 ビデオでは、コードを使用しない方法として、Zapierを使い、Google Driveのフォルダにファイルをドラッグ&ドロップしてデータを提供する方法が説明されている。
  • 📧 特定のタスク(請求書の処理など)に特化したAssistance APIを作成し、それをZapierと連携させることで自動化を実現する。
  • 📈 請求書の画像を分析し、その内容をもとにメールを自動生成するシナリオが紹介されている。
  • 🔗 中間ブロックを使って画像から抽出されたデータをAssistance APIにフィードし、より具体的な情報を得ることができる。
  • 💡 ソフトウェア開発の一般的なワークフローとして、データを再フォーマットして別のAIプロバイダーのブロックに渡す方法が示されている。
  • 🔧 将来的には、Assistance APIがよりネイティブで統合された形になることを期待していると述べている。
  • 👍 ビデオが役立つと感じたら、いいねをクリックするよう呼びかけている。

Q & A

  • OpenAIがリリースしたGBT 4 Turboの新しいエンドポイントで利用できる機能は何ですか?

    -GBT 4 Turboの新しいエンドポイントでは、ビジョン機能にアクセスすることができるようになりました。

  • アシスタントAPIでGBT 4 Turboのビジョン機能をどのように活用する予定ですか?

    -ビデオでは、コードを使用する方法とコードなしの方法で、アシスタントAPIでGBT 4 Turboのビジョン機能を活用する方法を紹介しています。

  • ビデオではどのようなワークアラウンドが提案されていますか?

    -ビデオでは、アシスタントAPIが画像を直接読むことができないため、画像からデータを抽出し、それをアシスタントAPIにフィードするワークアラウンドが提案されています。

  • アシスタントAPIの現在のUIでは画像をどのように扱う予定ですか?

    -現在のUIでは、画像を直接扱うことはできないとのことですが、代替手段としてZapierを利用して画像からデータを抽出し、アシスタントAPIにフィードする方法が提案されています。

  • ビデオで紹介された「invoice handler」アシスタントは何を担当する予定ですか?

    -「invoice handler」アシスタントは、インボイスデータをもとにメールを自動生成するようにトレーニングされています。

  • Zapierと連携してアシスタントAPIを活用する際のワークフローはどのようなものでしょうか?

    -Zapierを利用してGoogle Driveの新しいファイルをトリガーに設定し、画像から抽出されたデータを中間ブロックを通じてアシスタントAPIに渡すワークフローが提案されています。

  • ビデオではなぜ「invoice paid」というテキストが重要だと述べていますか?

    -「invoice paid」というテキストは、インボイス画像から抽出される重要なデータポイントであり、アシスタントAPIにフィードされるため重要だと述べています。

  • ビデオで提案されたワークアラウンドは、どのような場面で有効ですか?

    -ビデオで提案されたワークアラウンドは、アシスタントAPIが画像を直接読むことができない場合に、画像からデータを抽出し再利用する際に有効です。

  • ビデオの最後に紹介されたプレイリストは何に関するものですか?

    -ビデオの最後に紹介されたプレイリストは、ビジネスを活用するためのAIと自動化の方法に関するものです。

  • ビデオで紹介されたコードでのワークフローはどのようなものですか?

    -ビデオで紹介されたコードでのワークフローは、画像からデータを抽出し、それをHTTP呼び出し可能関数やFirebaseデータベースなどを介してアシスタントAPIに渡す方法です。

Outlines

00:00

🤖 AIのアップデートとアシスタントAPIの活用

OpenAIがGBT 4 Turboの更新を発表し、新エンドポイントgbt 4 Turbo 2024,049で画像認識機能が利用可能になった。ビデオでは、この機能をアシスタントAPIでどのように活用するかを解説する。まずは現在のUIで画像を読み取れるかを確認し、できない場合はワークアラウンドとしてコードとノーコードのアプローチを紹介する。また、アシスタントAPIをビジネスに活用する方法についての過去のビデオも紹介している。

05:02

📁 画像からデータを抽出し、アシスタントAPIと連携

アシスタントAPIを利用して画像からデータを抽出し、その情報を元にメールを自動生成する方法を説明する。Zapierを使い、Google Driveの新しいファイルをトリガーにして、画像の内容を解析し、アシスタントAPIにデータをフィードするプロセスを解説している。このワークフローは、ソフトウェア開発におけるデータの再フォーマットと連携の標準的な手法として、他のAIプロバイダーでも応用が可能であると述べている。

Mindmap

Keywords

💡Open AI

Open AIは人工知能研究組織であり、このビデオでは彼らがリリースしたGBT 4 Turboというアップグレードについて説明しています。ビデオでは、Open AIが提供するAPIを活用して画像を解析し、情報を抽出するプロセスを解説しています。

💡GBT 4 Turbo

GBT 4 TurboはOpen AIが提供する高度な言語モデルの一つで、ビデオでは特に2024年049バージョンについて触れられています。このモデルはVision Capabilityを備えており、画像の分析が可能とされています。

💡Vision Capability

Vision Capabilityは画像を解析し、その内容を理解する機能を指します。ビデオでは、GBT 4 Turboがこの機能を備えていることと、それをAssistance APIと組み合わせて画像から情報を抽出する方法について説明しています。

💡Assistance API

Assistance APIとは、特定のタスクに特化したAIモデルを提供するAPIです。ビデオでは、このAPIを利用して、画像から得られたデータをもとにメールを自動生成するプロセスを紹介しています。

💡ワークアラウンド

ワークアラウンドとは、問題を解決するための一時的な解決策または方法を指します。ビデオでは、Assistance APIが直接画像を読むことができないため、画像からデータを抽出し、そのデータをAPIにフィードする一時的な方法を提案しています。

💡Zapier

Zapierはウェブサービスで、異なるウェブアプリケーションを自動的に連携させるためのプラットフォームです。ビデオでは、Zapierを使ってGoogle Driveから画像を取得し、それをAssistance APIに渡すプロセスを解説しています。

💡HTTP Callable

HTTP Callableは、ウェブサービスを呼び出すための機能です。ビデオでは、HTTP Callableを使用して画像から抽出されたデータをOpen AIのAPIに送信するプロセスについて説明しています。

💡画像解析

画像解析とは、画像から情報を抽出するプロセスです。ビデオでは、Open AIのGBT 4 Turboを使用して画像解析を行い、その結果をAssistance APIに渡してメールを生成する例を紹介しています。

💡カスタムモデル

カスタムモデルとは、特定のニーズやタスクに合わせてカスタマイズされたAIモデルです。ビデオでは、カスタムのAssistance APIを作成し、それを使用して特定のタスク(例えば請求書の解析)を自動化する方法について説明しています。

💡自動化

自動化とは、手動での操作を必要とせずにタスクを実行するプロセスです。ビデオでは、AIを活用して業務を自動化し、効率を上げることの重要性を強調しています。

Highlights

OpenAI released an update to GPT-4, specifically Turbo, with a new endpoint GPT-4 Turbo 2024, 049.

The new endpoint provides the ability to access vision capabilities.

A common question is how to leverage this new vision capability in the Assistant API.

The video will demonstrate both code and no-code approaches to utilize the vision capability.

A previous video discussed the cost associated with an app using this endpoint for calorie counting from images.

The current Assistant UI does not support image viewing, despite the new endpoint's capabilities.

A workaround is introduced to leverage the vision capability in the Assistant API.

The Assistant API is described as a more specialized version of the models, with additional functionalities.

A no-code method is presented using the Assistant API to handle invoice data.

Zapier is used to create a workflow that triggers on a new file in a Google Drive folder.

The workflow extracts data from an image and feeds it into the Assistant API.

A middleware block is used to reformat data before passing it to the Assistant API.

The video demonstrates how to extract invoice data from an image and use it to generate an email summary.

The process can be translated into a code-like manner using HTTP callable functions or databases.

It is hoped that future updates will make the integration of vision capabilities more native within the Assistant API.

The video concludes with a call to action to like the video and subscribe for more content on leveraging AI and automation.

Transcripts

play00:00

open AI released an update to gbt 4

play00:01

Turbo more specifically now we have the

play00:04

ability to access the vision capability

play00:06

on this new endpoint gbt 4 Turbo 2024

play00:10

049 now a major question I've been

play00:12

getting asked in the comments is okay

play00:14

this is super cool but how the heck do

play00:16

we leverage it in the assistance API so

play00:18

in today's video I'm going to show you

play00:20

how now to be clear this is a workaround

play00:22

so I'm going to give you a code way of

play00:23

approaching this logic but also I'm

play00:25

going to give you a no code way of

play00:27

approaching this logic let's jump in

play00:28

earlier this week I did an entire video

play00:30

on this tweet we got from openingi

play00:32

developers talking about this endpoint

play00:33

where specifically I ran a little

play00:35

scenario of like what the cost would be

play00:36

associated with this kind of app where

play00:38

you take a picture you get the calories

play00:40

so you can check out that video right

play00:41

there to get more context on that but

play00:42

after that video I got a ton of comments

play00:45

asking me how the heck do we leverage

play00:46

this in an assistant API first off let's

play00:48

see if it's even possible within the

play00:50

current assistant UI that we have access

play00:52

to to see an image using this endpoint

play00:55

so I'm going to go and name this image

play00:56

test and you're just going to have a

play00:58

simple instructions here we're going to

play00:59

say tell me what is in the

play01:03

image period we're going to go to the

play01:05

specific model that was released

play01:07

recently which is gb4 Turbo 2024

play01:10

049 and this should be sufficient we're

play01:12

going to go ahead and go to the

play01:12

playground in this playground in theory

play01:15

based on what we've been told in the

play01:17

documentation this endpoint should have

play01:19

the ability to read images Let's test it

play01:22

though I'm going to go and attach an

play01:23

image the image I attached is going to

play01:25

be a fake invoice here and we're going

play01:27

to Simply ask this endpoint what is this

play01:29

inv voice what is this image I attached

play01:34

hit run but then we get that kind of

play01:36

response sorry I can't currently view

play01:37

images however I can help you understand

play01:39

analyze text or descriptions of the

play01:40

images you provide them this is no good

play01:42

though as a lot of people want to

play01:44

leverage this assistant API in the

play01:46

context of reading images so in this

play01:48

video Let's see the possible workaround

play01:50

also I want to point out if you want to

play01:51

see more in- depth videos on just how to

play01:53

leverage assistance without this new

play01:54

update check out this video right here I

play01:56

show you how to manage like over 50

play01:58

files within assistant if you don't know

play02:00

what an assistant API is think of it as

play02:02

like an even lasered in version of these

play02:05

models but you can provide like you know

play02:07

different files functions code

play02:09

interpreter there's a bunch of like

play02:10

extra layering of the cake we can do

play02:12

with these kind of assistance apis to

play02:14

make it more specialized for my business

play02:16

I've done like three or four videos on

play02:17

this topic so either look through my

play02:18

Channel or type in Corbin assistance API

play02:21

therefore in order to leverage this kind

play02:22

of Technology let's go and create a new

play02:24

assistant let say create new assistant

play02:26

here and this assistant let's just say

play02:28

is going to be our invoice hand hander

play02:30

invoice Handler first way I'm going to

play02:32

show you is going to be a no code way we

play02:34

going to say invoice Handler we will

play02:36

provide you with invoice data I'm going

play02:40

to go to Upper model to gbt for Turbo

play02:42

the modern one we're going to say We'll

play02:43

provide you with invoice data based on

play02:46

this data please write an email

play02:50

summarizing it this is going to be

play02:52

really simple because I just want to

play02:53

show you how to do it obviously there's

play02:55

a ton of other stuff like I referenced

play02:56

earlier so if you want to see the other

play02:57

videos of adding files making this more

play02:58

laser in for a business adding content T

play03:00

adding all this other stuff go ahead and

play03:02

watch that we have our current assistant

play03:04

it's going to be invoice Handler we have

play03:06

our directions we have a model let's go

play03:07

and jump over to zapier in zapier let's

play03:09

go ahead and say create new zap now we

play03:11

have our new zap here let's go ahead and

play03:12

do a trigger of Google drive folder this

play03:15

is going to how we're going to provide

play03:17

the data that's going to be read and do

play03:19

an event of new file and folder we're

play03:21

going to do an account okay that should

play03:23

be

play03:24

fine do test data here perfect and let's

play03:27

go ahead and jump over to drive in our

play03:29

drive here we're going to goad and drag

play03:31

over that image I had earlier I like

play03:32

showing Drive in these kind of cases

play03:34

because of the fact that it's a free

play03:35

service so you can get access to it

play03:36

right away and you can test this

play03:38

yourself without paying anything we got

play03:40

our invoice paid PNG in here let's go

play03:42

ahead and first off basically find the

play03:44

data that's relevant in this invoice

play03:46

assistant API is our name here we're

play03:49

going to go ahead and test this trigger

play03:50

and we should see the relevant piece of

play03:52

data here which is invoice paid so

play03:54

basically every time we want to trigger

play03:55

this we would just drag in the file in

play03:57

that specific folder and then it would

play03:58

proceed everything that about to do

play04:00

today there we go we got our file here

play04:02

invoice paid PNG this is the workaround

play04:04

y'all we're going to do a Chad gbt block

play04:06

CH Chad gbt we're going to do an event

play04:08

of analyze image content with vision

play04:10

because here's the idea you have a

play04:13

lasered in an assistance API your

play04:14

assistance API is trained on your data

play04:16

is trained on everything you want it to

play04:18

be trained on you have it where it's

play04:20

very lasered in for the specific task

play04:21

you want it to do you can reference

play04:23

previous conversations with it etc etc

play04:27

but here's the one thing it can do read

play04:28

images but but we can still provide the

play04:31

information that would be necessary that

play04:33

we would feed the assistant API already

play04:36

EG if we were already going to do the

play04:38

assistance API in the context of reading

play04:40

an image what are we doing we're

play04:42

internalizing specific data from that

play04:44

image to feed it into the assistance API

play04:47

so a great workaround is let's just get

play04:50

the data that we would want anyways and

play04:52

then feed it into the assistance API

play04:53

with a midterm block this is actually a

play04:56

pretty common workflow in the context of

play04:57

building out software where you want to

play04:59

reformat data and pass it down to

play05:01

another open AI block or AI provider

play05:04

block in this context we're doing it spe

play05:07

specifically to feed it into the

play05:08

assistant API but a lot of times in

play05:10

software and development you could do

play05:12

this in other context where maybe you

play05:13

have a ton of data that you want to

play05:15

compress with a 3.5 block and then use a

play05:17

chat GPT 4 block of that compressed data

play05:19

and provide another Value Point that way

play05:21

so this is normal and this is standard

play05:24

we're going to apply to assistance API

play05:26

therefore let's go ahead and say

play05:26

continue we're going to choose our

play05:28

account continue

play05:30

we're going to do the message here we're

play05:31

going to just say provide all the

play05:33

relevant information from this invoice

play05:38

we'll do things such as build to build

play05:42

to comma

play05:45

descriptions

play05:48

comma total

play05:50

amount Etc so if you want to get very

play05:52

specific here I have a bunch of other

play05:53

stuff on this topic but the idea here is

play05:55

that I just want to show you let's grab

play05:57

the data and then play with it so we're

play05:58

going to expect basically the build to

play06:00

descriptions and see what else they see

play06:01

is relevant in this context we're going

play06:02

to continue here and test this step also

play06:04

the image is going to be file exists but

play06:05

not shown it's not going to be the title

play06:07

is that's a string of data just text

play06:10

this is the actual image itself and

play06:11

here's the relevant information so we

play06:12

got the bill to we got the from got the

play06:15

date got the description oh this went

play06:17

basically overboard so I said Etc so

play06:20

basically that took it as let's just

play06:21

provide all the relevant data found

play06:23

within that image to gut check this real

play06:24

quick let's find out build to Olivia

play06:26

Wilson build to Olivia Wilson final

play06:29

price is around 2,100 then we were

play06:31

looking at 2,100 perfect in four

play06:35

different

play06:36

Services revision social media templates

play06:38

local design graphic design all right

play06:41

that has all the relevant information

play06:42

that we want therefore from extracting

play06:44

this data from this image we can feed

play06:45

this into an assistant block so I'm

play06:47

going to go ahead and click this little

play06:48

plus button here and do another chat gbt

play06:51

do chat gbt here we do an events and

play06:53

we're going to do conversation with

play06:55

assistant and they continue here

play06:57

continue again and with the message

play06:59

since we've already proctored within our

play07:01

assistance dashboard what to do we

play07:04

actually don't have to worry too much

play07:05

here we can actually just provide the

play07:06

invoice data like this invoice data

play07:09

semicolon parenthesis and we should

play07:11

expect a response an email relevant to

play07:14

this information let's chose our

play07:15

assistant we got our insistant right

play07:17

here invoice Handler and everything else

play07:19

should be standard here everything else

play07:21

is kind of already preset in that

play07:22

dashboard that we saw earlier but

play07:23

continue here and I should be expecting

play07:25

an email or like a pseudo email or a

play07:28

draft email here we go so this is the

play07:31

data we fed to it I like using invoice

play07:33

um I like referencing the Val variable

play07:35

point because of the fact that we said

play07:36

that earlier here we said the invoice

play07:38

data here so we're going to reference

play07:40

that with what we provide here then we

play07:41

do parentheses in order to basically

play07:43

compartmentalize the data that we're

play07:44

providing here and then we're going to

play07:45

come down here and we should see an

play07:47

email and boom subject line invoice

play07:49

summary invoice 200015 from warier in

play07:53

here l Wilson we confirmed that earlier

play07:55

and we should see all the relevant

play07:57

information in our invoice 2100 and all

play07:59

the relevant information in regards to

play08:01

that image therefore the logic you just

play08:03

saw here can be translated in a codel

play08:06

likee manner all you need to understand

play08:08

in that context is we're going to be

play08:09

passing data around within an HTTP

play08:11

callable function or whatever you know

play08:13

maybe you want to use a Firebase

play08:15

database flow it might be a little bit

play08:16

more C heavy but the idea there

play08:18

basically is that you call that endpoint

play08:20

in the first one the first function

play08:21

right here and you you know say your max

play08:22

token set your temperature have your

play08:24

prompt provide the relevant data point

play08:26

of the image asking for an output and

play08:29

then the second part of that logic would

play08:31

be like the output from that first

play08:32

function or from that first callable

play08:34

point you would send it to the

play08:36

assistance API in the code and then

play08:37

you'd get your relevant information

play08:39

slata that you care about this is pretty

play08:42

common in the context of software

play08:43

development and just workflow in general

play08:45

where you kind of want to maybe you know

play08:47

restructure the data a little bit before

play08:49

you send it to a further workflow or

play08:51

further Chad gbt block or anthropic

play08:53

block or whatever you're using in that

play08:55

context what I will say is that probably

play08:57

in the future let's hope at least

play08:59

when referencing the assistant API

play09:01

they'll make it more native and

play09:03

integrated so you don't have to

play09:04

necessarily do that little middle block

play09:06

between if you feel like you learned

play09:08

something today's video make sure to

play09:09

leave a like it's completely free we'll

play09:10

leave another assistance video I did a

play09:12

couple weeks ago and showing you how to

play09:13

leverage it for your business at the end

play09:15

here I'm also going to leave a playlist

play09:16

showing you how to leverage Ai and

play09:19

automation this is the playlist I was

play09:21

referring to when it comes to Ai and

play09:22

Automation and how to leverage on your

play09:23

business that is an assistance video

play09:25

more specifically an assistance API

play09:26

video you can start learning how to

play09:28

leverage this kind of tech easier here

play09:29

and that's my face daily content

play09:32

subscribe bye-bye

Rate This

5.0 / 5 (0 votes)

Related Tags
AIアップデート画像認識アシスタントAPIワークアラウンドコードレスZapierGoogleドライブインボイス解析自動化ビジネス
Do you need a summary in English?