Getting Started with GPT-4o API, Image Understanding, Function Calling and MORE

Prompt Engineering
14 May 202416:50

TLDRThis video tutorial introduces viewers to the GPT-4.0 API, contrasting it with the GPT-4.0 Turbo model. It covers the capabilities of both models, including text and image processing, and discusses the cost-effectiveness of GPT-4.0. The video demonstrates how to use the OpenAI Playground for experimenting with GPT-4.0 and how to integrate the API into Python projects using a Google Colab notebook. The host also explores the model's JSON response mode, image understanding features, and function calling abilities with practical examples. The tutorial concludes with a mention of upcoming features like voice input/output and a teaser for a future video comparing coding abilities of GPT-4.0 and GPT-4.0 Turbo.


  • 📈 **GPT-4.0 vs. GPT-4.0 Turbo Comparison**: GPT-4.0 can process text and images, while GPT-4.0 Turbo also supports voice input and output. GPT-4.0 is set to add voice features soon.
  • 💬 **Text and Image Inputs**: Both models accept text and image inputs, with GPT-4.0 Turbo having voice capabilities and GPT-4.0 planning to add them.
  • 💰 **Cost Consideration**: GPT-4.0 is priced at half the cost of GPT-4.0 Turbo.
  • 🔍 **Context Window**: Both models have a context window of 128,000 tokens.
  • 🚀 **OpenAI Playground Experimentation**: Users can experiment with GPT-4.0 in the OpenAI Playground and later implement similar functionalities in a Google Colab notebook.
  • ⏱️ **Processing Speed**: GPT-4.0 demonstrates faster processing and text generation compared to GPT-4.0 Turbo.
  • 📊 **Detailed Responses**: GPT-4.0 provides more detailed responses, especially with information regarding percentages.
  • 🛠️ **API Usage in Python**: The script outlines how to use the GPT-4.0 API within Python code, including creating a chat completion client and handling JSON responses.
  • 📈 **JSON Mode Capability**: GPT-4.0 can generate responses in JSON format, allowing for structured data output.
  • 📷 **Image Understanding**: GPT-4.0 has the ability to understand and process images, either by uploading or using image URLs.
  • 📚 **Function Calling**: The model can call functions to retrieve information when needed, such as getting NBA game scores.
  • 📝 **Markdown Responses**: GPT-4.0 can respond in Markdown, aiding in formatting responses for tasks like math homework or describing images.

Q & A

  • What are the key differences between GPT 4.0 and GPT 4.0 Turbo in terms of capabilities?

    -GPT 4.0 and GPT 4.0 Turbo both can process text and images as input, and currently, their output is only text. GPT 4.0 Turbo supports voice input and output, which GPT 4.0 will also support in the coming weeks. They both have a context window of 128,000 tokens, but GPT 4.0 costs half as much as GPT 4.0 Turbo.

  • How does the processing speed of GPT 4.0 compare to GPT 4.0 Turbo?

    -The processing speed of GPT 4.0 is significantly faster than that of GPT 4.0 Turbo. The latency numbers show that GPT 4.0 is almost half as fast as GPT 4.0 Turbo in generating responses.

  • What is the context window for both GPT 4.0 and GPT 4.0 Turbo models?

    -The context window for both GPT 4.0 and GPT 4.0 Turbo models is 128,000 tokens.

  • How can one experiment with GPT 4.0 in the OpenAI Playground?

    -In the OpenAI Playground, one can experiment with GPT 4.0 by selecting the model from the list of models and then setting the system prompt and other parameters like temperature and max tokens. Users can also add an image and ask GPT 4.0 to process that image.

  • How does GPT 4.0 handle image understanding tasks?

    -GPT 4.0 can process images either by uploading them directly or by providing an image URL. It can identify and describe the content of images, including recognizing objects, colors, and even emotions shown in pictures.

  • What is the process for using GPT 4.0 within Python code?

    -To use GPT 4.0 within Python code, one needs to install and upgrade the OpenAI packages, import necessary modules like JSON, and then import the OpenAI client. An API key is required, which can be set as an environment variable or stored in notebook secrets. A chat completion client is created, and the model can be used to generate responses to prompts.

  • How does GPT 4.0 respond to a JSON mode request?

    -When GPT 4.0 is set to JSON mode, it returns responses in a valid JSON format. For example, when asked to create a weekly workout routine, it returns a JSON object containing the routine.

  • What is the function calling ability in GPT 4.0?

    -Function calling in GPT 4.0 allows the model to use external tools or functions to retrieve information needed to answer a query. The model decides whether to use an external tool based on the user's query, calls the tool if necessary, and then generates a response using the tool's output and the original query.

  • How does GPT 4.0 handle voice input and output?

    -As of the time of the script, GPT 4.0 does not support voice input and output. However, this feature is planned to be added in the coming weeks.

  • Can GPT 4.0 process videos?

    -GPT 4.0 cannot process videos directly. However, videos can be converted into frames, and each frame or image can be processed in a sequence by the model.

  • What is the training cutoff date for GPT 4.0?

    -According to the model's self-response, its training data includes information up to September 2021. However, this information may not be accurate as the training of GPT 4.0 was likely completed sometime in 2022.

  • How can one subscribe to more content around GPT 4.0?

    -To receive more content about GPT 4.0, one can subscribe to the channel where the tutorial was found, ensuring they are notified of new content releases.



🚀 Introduction to GPT 4.0 API and Comparison with GPT 4.0 Turbo

The video begins with an introduction to the GPT 4.0 API and its comparison to the GPT 4.0 Turbo model. The presenter outlines the capabilities of both models, including text and image processing, and mentions that voice input/output is supported by GPT 4 Turbo and will be added to GPT 4.0 soon. The context window for both is 128,000 tokens, and GPT 4.0 is noted to be more cost-effective. The OpenAI Playground is introduced as a platform for experimentation, and the process of setting up a system prompt and adjusting parameters like temperature and token generation is demonstrated. The video also includes a live example of image processing using GPT 4.0.


📚 Using GPT 4.0 for Math and JSON Mode

The presenter explores using GPT 4.0 for simple math problems as a test of the API's functionality. They also inquire about the AI's identity and training data, receiving information that the model identifies as GPT 4 and was trained with data up to September 2021. The video then demonstrates how to use GPT 4.0 in JSON mode to generate a weekly workout routine. Additionally, the presenter discusses the model's image understanding capabilities using an example image of a triangle and shows how to process images using base64 encoding.


📈 Evaluating Image Understanding and Function Calling

The video continues with an evaluation of GPT 4.0's image understanding abilities, demonstrating how it can accurately interpret and describe various elements within an image, including bar charts and emotions. The presenter also tests the model's function calling capabilities using mock NBA game score data. They explain the process of function calling within the model, where the model decides whether to use an external tool based on the user query, and if necessary, calls the appropriate function, retrieves the results, and generates a response incorporating this information.


🏆 Final Thoughts on GPT 4.0 and Future Tutorials

The presenter concludes the tutorial by summarizing the key points covered in the video, including the use of GPT 4.0 for text and image processing, JSON responses, and function calling. They note that voice input/output and video input processing were not covered but express willingness to create additional tutorials on these topics if there is interest from the audience. The video ends with a call to action for viewers to subscribe for more content on GPT 4.0 and related technologies.



💡GPT 4.0 API

GPT 4.0 API refers to the application programming interface provided by OpenAI for developers to integrate the capabilities of the GPT 4.0 model into their own applications. In the video, it is used to demonstrate how to interact with the model using the OpenAI Playground and a Google Colab notebook, showcasing text generation and image understanding functionalities.

💡Image Understanding

Image Understanding is the ability of an AI model to process and interpret visual data, such as images. In the context of the video, GPT 4.0 is shown to analyze and describe the content of images, including recognizing objects, colors, and even emotions in a bar chart and a person's facial expression.

💡Function Calling

Function Calling is a feature that allows the AI model to execute predefined functions or tools to retrieve information or perform tasks. In the video, it is demonstrated by creating a function that retrieves NBA game scores based on a team's name mentioned in a user's query, showcasing the model's ability to interact with external tools to provide responses.

💡OpenAI Playground

OpenAI Playground is an interactive platform provided by OpenAI where users can experiment with their models, such as GPT 4.0, without writing any code. The video uses the Playground to illustrate how to input text and images, set parameters like temperature, and generate responses from the model.

💡Google Colab Notebook

A Google Colab Notebook is a cloud-based interactive computing environment that allows users to write and execute Python code. In the video, it is used to demonstrate how to use the GPT 4.0 API within a Python codebase, including setting up the API key, creating a chat completion client, and processing images.

💡JSON Mode

JSON Mode refers to the format in which the AI model's response is structured. JSON (JavaScript Object Notation) is a lightweight data interchange format that is easy for humans to read and write, and easy for machines to parse and generate. In the video, the model is instructed to respond in JSON format, which is then used to create a weekly workout routine.

💡Base 64 Processing

Base 64 Processing is a method of encoding binary data into text format using a specific set of 64 characters. In the context of the video, it is used to encode an image file into a format that can be processed by the GPT 4.0 model, allowing the model to understand and respond to the content of the image.

💡Context Window

The Context Window refers to the amount of context or information that an AI model can take into account when generating a response. In the video, it is mentioned that both GPT 4.0 and GPT 4.0 Turbo have a context window of 128,000 tokens, which is a significant amount that allows the model to process and understand extensive inputs.


In the context of AI models, Temperature is a parameter that controls the randomness of the model's output. A lower temperature results in more conservative, predictable responses, while a higher temperature allows for more varied and creative outputs. In the video, the system prompt for GPT 4.0 is set with a lower temperature to ensure more consistent results.

💡Max Number of Tokens

The Max Number of Tokens is a parameter that specifies the maximum length of the response that the AI model can generate. In the video, this parameter is adjusted to allow the model to generate longer responses when needed, such as when describing images or generating a detailed workout routine.

💡Stop Sequences

Stop Sequences are specific strings or tokens that, when encountered by the AI model, signal the end of the generated response. In the video, it is mentioned that no stop sequences are added, allowing the model to generate responses up to the maximum number of tokens without being prematurely interrupted.


Introduction to getting started with GPT 4.0 API and its comparison with GPT 4.0 Turbo.

Both models can process text and images as input, with GPT 4.0 Turbo supporting voice input and output.

GPT 4.0 has a context window of 128,000 tokens and is priced at half the cost of GPT 4.0 Turbo.

Demonstration of using the OpenAI Playground to experiment with GPT 4.0.

Setting up the system prompt and adjusting parameters for text generation.

Adding an image for GPT 4.0 to process and explaining the image.

Real-time processing speed of GPT 4.0 and its comparison with GPT 4.0 Turbo.

GPT 4.0's response is more detailed, providing more information regarding percentages.

Upcoming detailed comparison video between GPT 4.0 and GPT 4.0 Turbo focusing on coding abilities.

Integration of GPT 4.0 API within Python code using the OpenAI client.

Creating a chat completion client and testing the API with a simple math question.

Using JSON mode to generate a weekly workout routine.

Demonstrating the model's image understanding abilities with an example of a triangle image.

Base64 processing of images for input into GPT 4.0.

Using image URLs directly for processing in GPT 4.0.

Describing emotions shown in an image of a man with a beard, wearing a green shirt and smiling.

Exploring function calling abilities with mock data for NBA game scores.

Explanation of how function calling works in general with user queries and external tools.

Creating a function calling agent to retrieve NBA game scores based on the team mentioned in the user prompt.

Final response generation using function calling with the example of the Lakers game score.

Note on the current unavailability of voice input and output, and the inability to process videos directly.

Invitation to subscribe for more content around GPT 4.0 and viewer engagement for topic suggestions.