AI Just Got Insanely Better

Asmongold TV
14 May 2024 21:58

TLDR

The transcript showcases advancements in AI, particularly OpenAI's new model, which can interact through audio, vision, and text. It features a demo in which the AI helps a student understand a math problem, demonstrating real-time learning support. The conversation also covers the AI's ability to interpret emotions from a selfie and to act as a translator between English and Spanish. The script highlights the potential of AI to transform various aspects of life, from education to everyday interactions, and ends on a humorous note with a sarcastic AI interaction, emphasizing the technology's evolving capabilities and its impact on humanity.


  • 🎉 New AI model: The announcement of a new AI model that can interact with the world through audio, vision, and text.
  • 📈 Improved Audio Quality: The AI's audio quality has significantly improved compared to previous years.
  • 🤔 Guessing Game: The AI engages in a guessing game to determine the setup based on visual cues, showcasing its observational skills.
  • 📚 Real-time Learning: An AI assists a student in understanding a math problem by asking questions and guiding them to the solution.
  • 👀 AI with Vision: The AI can now 'see' the environment and interact with it, asking for descriptions and responding to visual input.
  • 🎓 Educational Application: The AI demonstrates its utility in an educational context by tutoring a student on a math problem.
  • 📷 Visual Interaction: The AI describes the environment and people based on visual input, highlighting advancements in visual processing.
  • 😄 Emotional Recognition: The AI attempts to discern emotions from a selfie, showcasing its ability to analyze facial expressions.
  • 🗣️ Real-time Translation: The AI serves as a translator between English and Spanish, demonstrating its multilingual capabilities.
  • 👑 Royal Observation: The AI describes activities at Buckingham Palace, indicating its ability to provide detailed observations on the fly.
  • 🚕 Practical Application: The AI shows its utility in everyday scenarios, such as hailing a taxi, based on visual cues.

Q & A

  • What is the main topic of the conversation in the transcript?

    - The main topic is the advancements in AI technology, specifically a new AI model's capabilities in interacting through audio, vision, and text.

  • What is the significance of the AI's ability to interact through audio, vision, and text?

    - This signifies a major leap in AI technology, allowing the AI to engage with the world in a more human-like and comprehensive manner, enhancing its utility in various applications such as education, entertainment, and assistance.

  • How does the AI assist in the math problem-solving scenario with the student?

    - The AI helps the student understand the problem by asking guiding questions and encouraging the student to identify the sides of the triangle relative to the given angle. It does not provide direct answers; instead, it helps the student deduce the solution independently.

  • What is the reaction of the person in the transcript when the AI correctly identifies the sides of the triangle?

    - The person is impressed and praises the AI for its ability to parse the spoken words and use the process of elimination to guide the student to the correct identification of the triangle's sides.

  • What is the context of the AI's real-time translation capabilities as mentioned in the transcript?

    - The AI's real-time translation capabilities are demonstrated in a scenario where it is asked to act as a translator between two people, one speaking English and the other Spanish, allowing for seamless communication.

  • How does the AI react to the user's request for it to be sarcastic in its responses?

    - The AI complies with the user's request and attempts to respond with sarcasm, indicating its flexibility and ability to adapt to different communication styles as directed by the user.

  • What is the general sentiment expressed by the individuals in the transcript towards the advancements in AI?

    - The general sentiment is one of amazement and excitement, with some apprehension about the potential implications for employment and human interaction. There is also a sense of humor and playfulness in their reactions.

  • What is the purpose of the AI's ability to see and describe the world through a camera?

    - This ability allows the AI to engage in more interactive and immersive experiences, such as exploring environments, providing descriptions, and responding to visual cues, which can be useful in various applications like education, virtual tourism, or assisting visually impaired individuals.

  • How does the AI demonstrate its understanding of human emotions when asked to analyze a selfie?

    - The AI analyzes the selfie and correctly identifies the subject as happy and cheerful based on their smile, suggesting that it can interpret visual cues related to human emotions.

  • What is the AI's response to the user's playful command to sing about the events that transpired?

    - The AI does not literally sing but instead humorously engages with the user's request by creating a short, rhyming couplet that summarizes the events in a playful manner.

  • What is the implication of the AI's ability to perform tasks like identifying objects, translating languages, and recognizing emotions?

    - The implication is that AI is becoming increasingly sophisticated and capable of performing a wide range of tasks that were previously thought to require human cognition, which could lead to advancements in various fields and potentially disrupt traditional job markets.



🎥 AI in Media Production

The first paragraph introduces a setting that appears to be a recording or production studio, with lights, tripods, and possibly a microphone. The speaker speculates that a video or live stream might be in the works. There's a hint of an upcoming announcement related to OpenAI, and the conversation suggests that the speaker might be part of this announcement. The dialogue shifts to discussing the advancements in AI, particularly a new model capable of interacting through audio, vision, and text. The speaker expresses skepticism about the authenticity of the AI's capabilities, leading to a debate on the progress and potential of AI technology.


📚 AI as an Educational Tutor

In the second paragraph, the focus is on an AI's ability to assist in real-time learning. A parent asks the AI to tutor their son on a math problem without giving away the answer, aiming to ensure the child understands the concept. The AI engages in a Socratic method of teaching, asking questions and guiding the student to find the solution. The AI identifies the sides of a triangle correctly and applies the sine formula to find the angle's measure. The speaker is impressed by the AI's ability to understand and interpret the student's verbal cues and figure out the correct terminology, showcasing the AI's advanced language processing capabilities.


👀 AI with Visual Perception

The third paragraph explores an AI's capability to perceive the world visually. The AI is equipped with a camera, and the speaker interacts with it by asking questions about the environment. The AI describes the scene, including the speaker's attire and the room's lighting. There's a playful moment when another person enters the frame and makes bunny ears behind the speaker's head, which the AI acknowledges. The dialogue reflects on the AI's ability to recognize gestures and human interactions, even in a brief visual frame, highlighting the AI's real-time learning and adaptation.


😀 AI and Emotional Recognition

In the fourth paragraph, the AI is challenged to identify the emotions of a person based on a selfie. The AI correctly identifies the person as happy and cheerful. The speaker then reveals that the good mood is due to a successful presentation about the AI's capabilities. The conversation takes a humorous turn when it moves to AI's potential to perform unusual tasks, like making certain sounds. The AI is also shown performing real-time translation between English and Spanish, showcasing its multilingual capabilities.


👑 AI and Descriptive Narratives

In the fifth paragraph, the AI describes scenes with ducks and a taxi and notes the presence of the king at Buckingham Palace. The AI provides a detailed narrative of the ducks' behavior and the approach of a taxi, showcasing its ability to generate descriptive content based on given prompts. The dialogue also touches on the topic of people pretending to be visually impaired for various reasons, leading to a reflection on the authenticity and ethics of such actions.




💡Artificial Intelligence

Artificial Intelligence (AI) refers to the simulation of human intelligence in machines that are programmed to think like humans and mimic their actions. In the video, AI is the central theme, showcasing its advancements in interacting through audio, vision, and text. The script discusses AI's capability to learn, tutor, and even understand context and emotions, which are significant advancements in the field of AI.

💡OpenAI

OpenAI is a research organization dedicated to promoting and developing friendly AI that benefits humanity. In the script, OpenAI is mentioned as the organization responsible for the advancements in AI technology being discussed. The hoodie and the professional production setup imply a connection to the company and its role in the AI developments presented.


💡Scripted

The term 'scripted' refers to a pre-written text or dialogue that is followed in a performance or presentation. In the context of the video, there is a discussion about whether the AI's interactions are scripted or natural. This highlights a common skepticism about AI's ability to engage in genuine, unscripted conversation.


💡Real-time

Real-time denotes processing or interaction that occurs without any perceptible delay. The script mentions 'real-time translation' and 'real-time learning,' emphasizing AI's ability to perform tasks instantaneously, which is a significant aspect of its utility and efficiency in various applications.


💡Tutoring

Tutoring involves giving individualized instruction to a student. In the video, the AI is shown tutoring a student on a math problem, guiding him to understand the concept rather than providing the answer directly. This demonstrates the AI's application in education, focusing on enhancing learning experiences.


💡Sarcasm

Sarcasm is a form of verbal irony involving the expression of one's meaning by saying something that appears to convey the opposite. The script includes a segment where the AI is asked to communicate with a sarcastic tone, showcasing its ability to understand and convey complex human emotions and linguistic nuances.


💡Translation

Translation is the process of rendering text, speech, or other material from one language into another. The script highlights the AI's ability to act as a translator between English and Spanish, emphasizing its multilingual capabilities and potential use in overcoming language barriers.


💡Vision

In the context of AI, 'vision' refers to the ability of the machine to interpret and understand visual information from the environment. The script discusses a new model of AI that can interact with the world through vision, which implies the AI's capacity to process and comprehend visual data, a significant step towards more human-like interactions.


💡Text

Text, in relation to AI, refers to the machine's ability to process, understand, and generate written language. The script mentions AI's interaction through text, which is a fundamental aspect of its communication capabilities and its ability to assist with tasks such as translation and tutoring.


💡Audio

Audio, in the context of AI, pertains to the machine's capability to process, understand, and generate sound or spoken language. The script discusses a new model of AI that can interact through audio, indicating advancements in speech recognition and synthesis, which are crucial for natural communication.


💡Emotions

Emotions are complex psychological states that can be recognized and expressed. The script includes a scenario where the AI is asked to interpret a person's emotions based on their facial expression in a selfie. This showcases the AI's evolving ability to understand human emotions, which is important for more empathetic and personalized interactions.


A new AI model has been developed that can interact with the world through audio, vision, and text.

The AI demonstrates impressive audio quality, surpassing previous models.

AI assists in real-time tutoring, guiding students to understand problems on their own.

The AI correctly identifies the sides of a triangle and applies mathematical formulas to solve problems.

AI can now interpret and respond to visual cues, such as a person's clothing and room setup.

The AI accurately describes a scene involving a person and their environment based on visual input.

AI can engage in playful interactions, adding a human-like touch to its responses.

AI's ability to recognize human emotions is showcased through a selfie analysis.

The AI serves as a real-time translator between English and Spanish, facilitating communication.

AI can provide detailed descriptions of live events, such as the presence of a monarch at Buckingham Palace.

The AI demonstrates the ability to adopt a sarcastic tone when the user asks for one.

AI's capacity to learn and adapt in real-time is highlighted through various interactions.

The AI's advanced capabilities raise questions about the future of employment and the role of AI in society.

AI's ability to parse and understand complex human interactions, such as gestures and playfulness, is demonstrated.

The AI's performance in a tutoring scenario shows its potential to assist in educational settings.

The AI's interaction with a child during a tutoring session showcases its patient and guiding approach.

AI's role in providing real-time feedback and guidance enhances the learning experience.

The AI's ability to understand and describe the environment and activities of a person in real-time is a significant advancement.

The AI's handling of a playful moment in a video demonstrates its nuanced understanding of human behavior.