What are Generative AI models?

IBM Technology
22 Mar 2023 · 08:47

Summary

TL;DR: In this video, Kate Soule, a senior manager of business strategy at IBM Research, explains the emergence and potential of large language models (LLMs) and foundation models in AI. She discusses how these models, trained on vast amounts of data, can perform a wide range of tasks with high performance and productivity gains. Despite their advantages, challenges like high compute costs and trustworthiness issues persist. IBM is innovating to improve these models' efficiency and reliability across various domains, including language, vision, code, chemistry, and climate change. Soule encourages viewers to explore IBM's efforts to enhance AI technologies.

Takeaways

  • Large language models (LLMs) like ChatGPT are revolutionizing AI capabilities, demonstrating significant advancements in performance and enterprise value.
  • LLMs are part of a broader category known as foundation models, which represent a paradigm shift in AI from task-specific models to more versatile foundational capabilities.
  • The term 'foundation models' was introduced by Stanford researchers to describe AI models that can be applied to a wide range of tasks beyond their initial training.
  • Foundation models are trained on vast amounts of unstructured data, enabling them to predict and generate text based on previous inputs, a capability that places them under the umbrella of generative AI.
  • These models can be adapted to specific tasks by introducing a small amount of labeled data, a process known as tuning, which enhances their versatility.
  • Foundation models can also be used in low-labeled-data scenarios through prompting or prompt engineering, where they infer tasks from given cues or questions.
  • The primary advantage of foundation models is their superior performance, stemming from their extensive training on terabytes of data, which allows them to outperform models trained on limited data.
  • Foundation models offer significant productivity gains by reducing the need for extensive labeled data, leveraging their pre-training on unlabeled data for various tasks.
  • A major disadvantage is the high computational cost associated with training and running these models, which can be prohibitive for smaller enterprises.
  • Trustworthiness is another concern, as these models are often trained on unvetted internet data, potentially introducing biases and inappropriate content.
  • IBM is actively innovating to improve the efficiency and trustworthiness of foundation models, aiming to make them more viable for business applications.
  • Foundation models are not limited to language; they are also being developed for vision (e.g., DALL-E 2), code (e.g., Copilot), and other domains like chemistry and climate science.

Q & A

  • What is the significance of large language models (LLMs) in the context of AI advancements?

    -Large language models (LLMs) have revolutionized AI by demonstrating a significant leap in performance across various tasks such as writing poetry and planning vacations. They represent a shift in AI capabilities, potentially driving substantial enterprise value.

  • What are foundation models and how do they differ from traditional AI models?

    -Foundation models are a new class of AI models that serve as a foundational capability for multiple applications. Unlike traditional AI models that are trained for specific tasks with task-specific data, foundation models are trained on a vast amount of unstructured data, enabling them to be transferred to various tasks.

  • How did the term 'foundation models' originate?

    -The term 'foundation models' was first coined by a team from Stanford University. They observed a paradigm shift in AI where the field was converging towards models that could serve as a base for various applications.

  • What is the process of training a foundation model?

    -Foundation models are trained in an unsupervised manner on unstructured data. For instance, in the language domain, they are fed with terabytes of data, learning to predict the next word in a sentence based on the preceding words.

  • Why are foundation models considered part of generative AI?

    -Foundation models are part of generative AI because they have the capability to generate something new, such as the next word in a sentence, based on the words they have seen before.

  • How can foundation models be tuned to perform specific NLP tasks?

    -Foundation models can be tuned by introducing a small amount of labeled data. This process adjusts the model's parameters, allowing it to perform specific natural language tasks like classification or named-entity recognition.

  • What is the concept of prompting or prompt engineering in the context of foundation models?

    -Prompting or prompt engineering is a method where foundation models are applied to tasks by providing them with a prompt or a question. The model then generates a response based on the prompt, which can be used for tasks like sentiment analysis.

  • What are the advantages of using foundation models in business settings?

    -The advantages of foundation models include superior performance due to extensive data exposure and significant productivity gains. They require less labeled data for task-specific training compared to models trained from scratch.

  • What are some of the disadvantages associated with foundation models?

    -Disadvantages of foundation models include high compute costs due to their extensive training and inference requirements, and issues with trustworthiness, as they are trained on unstructured data that may contain biases or toxic information.

  • How is IBM contributing to the development and improvement of foundation models?

    -IBM is working on innovations to improve the efficiency and trustworthiness of foundation models, making them more relevant for business applications. They are also exploring the application of foundation models in various domains beyond language, such as vision, code, chemistry, and climate change.

  • What are some examples of foundation models being applied in different domains?

    -Examples include DALL-E 2 for vision, which generates custom images from text, Copilot for code completion, and IBM's MoLFormer for molecule discovery. IBM is also building Earth Science foundation models from geospatial data for climate research.

Outlines

00:00

๐Ÿค– Introduction to Foundation Models in AI

Kate Soule, a senior manager of business strategy at IBM Research, introduces the concept of large language models (LLMs) and their impact on tasks such as writing poetry or planning vacations. She explains that LLMs are part of a broader category called foundation models, which represent a new paradigm in AI. Foundation models are trained on vast amounts of unstructured data and can be adapted to multiple tasks through a process called tuning, which involves introducing a small amount of labeled data to specialize the model. The generative capability of these models, which allows them to predict and generate the next word in a sentence, is the key to their versatility. Despite their advantages, foundation models also have drawbacks, such as high computational costs and potential trustworthiness issues stemming from the unvetted data they are trained on.

05:05

๐Ÿ’ป Challenges and Innovations in Foundation Models

The second segment covers the challenges associated with foundation models, such as the high computational cost of training and running them, which can be prohibitive for smaller enterprises. Additionally, the vast amount of unstructured data used in training can introduce bias, hate speech, and other toxic content. Kate highlights IBM's efforts to address these challenges by improving the efficiency and trustworthiness of these models. She also discusses the application of foundation models beyond language, including vision models like DALL-E 2, code models like Copilot, and models for chemistry and climate change. IBM is actively innovating in these areas, integrating foundation models into products like Watson Assistant, Watson Discovery, and Maximo Visual Inspection, and collaborating with Red Hat on Project Wisdom. The segment concludes with an invitation to learn more about IBM's work in enhancing foundation models.

Keywords

Large Language Models (LLMs)

Large Language Models, or LLMs, refer to AI systems that are trained on vast amounts of text data and can perform complex language tasks. In the video, LLMs like ChatGPT are highlighted for their ability to revolutionize various applications, from writing poetry to planning vacations. They are part of a broader class of models called foundation models, which are trained on unstructured data and can be adapted to perform multiple tasks.

Foundation Models

Foundation models are a class of AI models that serve as a foundational capability for various applications. The term was first coined by Stanford researchers to describe a shift in AI paradigms. Unlike task-specific AI models, foundation models are trained on large datasets and can be tuned or prompted to perform different tasks. In the video, the speaker explains that these models can drive enterprise value by being adaptable and efficient in handling multiple tasks.

Generative AI

Generative AI is a field of AI that involves creating new content, such as text, images, or music, based on existing data. In the context of the video, foundation models are part of generative AI because they can predict and generate the next word in a sentence. This capability is crucial for tasks like language translation, text completion, and even content creation.
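
To make the next-word behavior concrete, here is a minimal sketch using the open source Hugging Face transformers library and the public "gpt2" checkpoint; the video does not prescribe any particular library or model, so the names below are illustrative assumptions.

```python
# Minimal sketch of next-word prediction with a small pretrained causal language
# model. GPT-2 and the `transformers` library are illustrative choices, not the
# specific models discussed in the video.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "No use crying over spilled"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits           # shape: (1, sequence_length, vocab_size)

next_token_id = int(logits[0, -1].argmax())   # most probable next token
print(tokenizer.decode([next_token_id]))      # a plausible completion, e.g. " milk"
```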

Unsupervised Learning

Unsupervised learning is a type of machine learning where the model learns from unlabeled data. In the video, it is mentioned that foundation models are trained in an unsupervised manner on unstructured data, such as terabytes of text. This training allows the models to develop a deep understanding of language patterns and relationships, which they can then apply to various tasks.
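
To show why no human annotation is needed, here is a toy sketch of the next-token objective: the training target for each position is simply the token that follows it in the raw text. The tiny vocabulary and two-layer model are made-up assumptions for illustration, not the architecture of any real foundation model.

```python
# Toy sketch of the self-supervised next-token objective: the "label" for each
# position is the token that follows it, so raw unlabeled text is enough.
import torch
import torch.nn as nn

vocab_size, embed_dim = 100, 32                        # made-up toy sizes
model = nn.Sequential(nn.Embedding(vocab_size, embed_dim),
                      nn.Linear(embed_dim, vocab_size))

tokens = torch.randint(0, vocab_size, (1, 16))         # stand-in for tokenized text
logits = model(tokens[:, :-1])                         # predict from preceding tokens
loss = nn.functional.cross_entropy(                    # compare to the actual next tokens
    logits.reshape(-1, vocab_size), tokens[:, 1:].reshape(-1))
loss.backward()                                        # gradients flow with no human labels
```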

Tuning

Tuning in the context of AI refers to the process of adjusting a model's parameters to better perform a specific task. The video explains that by introducing a small amount of labeled data, foundation models can be tuned to perform traditional NLP tasks like classification or named-entity recognition. This process allows the models to adapt to specific tasks without extensive retraining.
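
A minimal sketch of what tuning can look like in practice, assuming the Hugging Face transformers and datasets libraries, a small pretrained checkpoint, and a two-example labeled set; none of these specifics come from the video.

```python
# Minimal sketch of tuning a pretrained model on a small labeled dataset.
# Checkpoint, labels, and data are illustrative assumptions.
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

texts, labels = ["great product", "terrible service"], [1, 0]   # tiny labeled sample

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2)

ds = Dataset.from_dict({"text": texts, "label": labels})
ds = ds.map(lambda batch: tokenizer(batch["text"], truncation=True,
                                    padding="max_length", max_length=32),
            batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="tuned-model", num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=ds,
)
trainer.train()    # updates the pretrained parameters for the classification task
```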

Prompt Engineering

Prompt engineering is a technique used in AI where the model is given a prompt or a question to guide its response. In the video, the speaker uses the example of asking a model to classify the sentiment of a sentence by prompting it with a question. This method allows foundation models to perform tasks in low-labeled data domains, leveraging their generative capabilities.
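
Below is a minimal sketch of the sentiment prompt described above, again assuming the Hugging Face transformers library; in practice an instruction-tuned model follows this kind of prompt far more reliably than the small base model used here for illustration.

```python
# Minimal sketch of prompt-based classification: no task-specific training,
# just a question appended to the input. The model choice is an assumption.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

sentence = "The hotel was clean and the staff were wonderful."
prompt = (f'Sentence: "{sentence}"\n'
          "Question: Is the sentiment of this sentence positive or negative?\n"
          "Answer:")

output = generator(prompt, max_new_tokens=3, do_sample=False)[0]["generated_text"]
print(output[len(prompt):].strip())   # the model's continuation, ideally "positive"
```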

Performance

In the video, performance refers to the effectiveness and efficiency of AI models, particularly foundation models. The speaker highlights that these models, due to their extensive training on large datasets, can drastically outperform models trained on limited data. This superior performance is a key advantage of foundation models in driving enterprise value.

Productivity Gains

Productivity gains in the context of AI models refer to the efficiency improvements achieved by using these models. The video discusses how foundation models, through tuning or prompting, require less labeled data to perform specific tasks compared to starting from scratch. This reduces the time and resources needed for training, thereby increasing productivity.

Compute Cost

Compute cost refers to the expenses associated with training and running AI models, particularly large and complex ones like foundation models. The video mentions that these models are expensive to train due to the vast amounts of data they process and require significant computational resources, such as multiple GPUs, for inference. This can be a barrier for smaller enterprises.
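
A rough back-of-envelope calculation shows why inference alone can demand multiple GPUs; the parameter count and precision below are assumptions for the sake of the arithmetic, not figures from the video.

```python
# Back-of-envelope memory estimate for serving a large model (illustrative numbers).
params = 20e9              # assume a 20-billion-parameter model
bytes_per_param = 2        # 16-bit (fp16/bf16) weights
weights_gb = params * bytes_per_param / 1e9
print(f"~{weights_gb:.0f} GB just to hold the weights")   # ~40 GB, before activations
# A single 24 GB (or even 40 GB) GPU cannot host this comfortably, hence multiple GPUs.
```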

Trustworthiness

Trustworthiness in AI models pertains to their reliability and the absence of biases or harmful content. The video discusses the challenges of ensuring that foundation models, trained on unstructured and potentially biased internet data, are trustworthy. The speaker mentions that the vast amount of data used in training can include toxic information, which needs to be addressed to make these models suitable for business settings.

Domain Applications

Domain applications refer to the specific areas or fields where AI models can be applied. The video mentions various domains, such as language, vision, code, chemistry, and climate change, where foundation models can be used. Examples include language models in Watson Assistant, vision models in Maximo Visual Inspection, and chemistry models like MoLFormer for molecule discovery.

Highlights

Large language models, or LLMs, like ChatGPT, have significantly advanced AI's capabilities, offering uses from writing poetry to planning vacations.

Kate Soule, a senior manager of business strategy at IBM Research, provides an overview of AI's potential to drive enterprise value.

Foundation models, a class of models that includes LLMs, represent a new AI paradigm, moving away from task-specific models to versatile models that can handle multiple tasks.

The term 'foundation models' was coined by a team from Stanford, highlighting a shift towards models trained on vast amounts of unstructured data.

These models predict the next word in a sentence by being trained on terabytes of data, showcasing their generative capabilities.

Foundation models, part of generative AI, can be fine-tuned with a small amount of labeled data to perform specific natural language tasks like classification and named-entity recognition.

Prompt engineering allows foundation models to perform tasks with minimal labeled data by prompting them with specific questions or tasks.

Advantages of foundation models include high performance due to extensive data exposure and significant productivity gains from requiring less labeled data for task-specific models.

Disadvantages include high compute costs for training and running inference, making them less accessible for smaller enterprises.

Trustworthiness issues arise from training on vast amounts of unstructured data, which may contain bias or toxic information.

IBM is working on innovations to improve the efficiency and trustworthiness of foundation models, making them more suitable for business applications.

Foundation models are not limited to language; they also include vision models like DALL-E 2, code models like Copilot, and models for other domains like chemistry and climate research.

IBM integrates language models into products like Watson Assistant and Watson Discovery, vision models into Maximo Visual Inspection, and code models into Project Wisdom with Red Hat.

IBM's MoLFormer, a foundation model for molecule discovery, showcases applications in targeted therapeutics and chemistry.

IBM is developing Earth Science Foundation models using geospatial data to advance climate research.

Transcripts

00:00  Over the past couple of months, large language models, or LLMs, such as ChatGPT, have taken the world by storm.

00:08  Whether it's writing poetry or helping plan your upcoming vacation, we are seeing a step change in the performance of AI and its potential to drive enterprise value.

00:19  My name is Kate Soule.

00:21  I'm a senior manager of business strategy at IBM Research,

00:24  and today I'm going to give a brief overview of this new field of AI that's emerging and how it can be used in a business setting to drive value.

00:32  Now, large language models are actually a part of a different class of models called foundation models.

00:42  Now, the term "foundation models" was actually first coined by a team from Stanford when they saw that the field of AI was converging to a new paradigm.

00:52  Where before, AI applications were being built by training

00:56  maybe a library of different AI models, where each AI model was trained on very task-specific data to perform a very specific task.

01:08  They predicted that we were going to start moving to a new paradigm,

01:14  where we would have a foundational capability, or a foundation model, that would drive all of these same use cases and applications.

01:24  So the same exact applications that we were envisioning before with conventional AI, and the same model could drive any number of additional applications.

01:33  The point is that this model could be transferred to any number of tasks.

01:38  What gives this model the superpower to be able to transfer to multiple different tasks and perform multiple different functions

01:44  is that it's been trained on a huge amount of unstructured data, in an unsupervised manner.

01:57  And what that means, in the language domain, is basically I'll feed a bunch of sentences -- and I'm talking terabytes of data here -- to train this model.

02:07  And the start of my sentence might be "no use crying over spilled" and the end of my sentence might be "milk".

02:15  And I'm trying to get my model to predict the last word of the sentence based off of the words that it saw before.

02:22  And it's this generative capability of the model -- predicting and generating the next word based off of previous words that it's seen beforehand --

02:29  that is why foundation models are actually a part of the field of AI called generative AI,

02:40  because we're generating something new -- in this case, the next word in a sentence.

02:46  And even though these models are trained to perform, at their core, a generation task -- predicting the next word in the sentence -- we actually can take these models,

02:55  and if you introduce a small amount of labeled data to the equation, you can tune them to perform traditional NLP tasks -- things like classification or

03:06  named-entity recognition -- things that you don't normally associate with a generative-based model or capability.

03:13  And this process is called tuning.

03:16  Where you can tune your foundation model by introducing a small amount of data,

03:19  you update the parameters of your model, and it can now perform a very specific natural language task.

03:25  If you don't have data, or have only very few data points, you can still take these foundation models, and they actually work very well in low-labeled data domains.

03:39  And in a process called prompting or prompt engineering, you can apply these models to some of those same exact tasks.

03:50  So an example of prompting a model to perform a classification task

03:54  might be: you could give a model a sentence and then ask it a question -- does this sentence have a positive sentiment or negative sentiment?

04:03  The model's going to try and finish generating words in that sentence, and the next natural word in that sentence would be the answer to your classification problem,

04:10  which would respond either positive or negative, depending on where it estimated the sentiment of the sentence would be.

04:17  And these models work surprisingly well when applied to these new settings and domains.

04:23  Now, this is a lot of where the advantages of foundation models come into play.

04:29  So if we talk about the advantages, the chief advantage is the performance.

04:40  These models have seen so much data.

04:43  Again, data with a capital D -- terabytes of data -- that by the time that they're applied to small tasks,

04:49  they can drastically outperform a model that was only trained on just a few data points.

04:54  The second advantage of these models is the productivity gains.

05:04  So just like I said earlier, through prompting or tuning, you need far less labeled data to get to a task-specific model

05:13  than if you had to start from scratch, because your model is taking advantage of all the unlabeled data that it saw in its pre-training when we created this generative task.

05:23  With these advantages, there are also some disadvantages that are important to keep in mind.

05:34  And the first of those is the compute cost.

05:40  So the penalty for having these models see so much data is that they're very expensive to train,

05:47  making it difficult for smaller enterprises to train a foundation model on their own.

05:53  They're also expensive -- by the time they get to a huge size, a couple billion parameters -- they're very expensive to run inference.

06:01  You might require multiple GPUs at a time just to host these models and run inference, making them a more costly method than traditional approaches.

06:10  The second disadvantage of these models is on the trustworthiness side.

06:14  So just like data is a huge advantage for these models -- they've seen so much unstructured data -- it also comes at a cost, especially in a domain like language.

06:22  A lot of these models are trained basically off of language data that's been scraped from the Internet.

06:28  And there's so much data that these models have been trained on.

06:31  Even if you had a whole team of human annotators, you wouldn't be able to go through

06:35  and actually vet every single data point to make sure that it wasn't biased and didn't contain hate speech or other toxic information.

06:42  And that's just assuming you actually know what the data is.

06:45  Often we don't even know -- for a lot of these open source models that have been posted --

06:48  what the exact datasets are that these models have been trained on, leading to trustworthiness issues.

06:55  So IBM recognizes the huge potential of these technologies.

06:59  But my partners in IBM Research are working on multiple different innovations to try

07:03  and improve the efficiency of these models, as well as their trustworthiness and reliability, to make them more relevant in a business setting.

07:12  All of these examples that I've talked through so far have just been on the language side.

07:16  But the reality is, there are a lot of other domains that foundation models can be applied towards.

07:21  Famously, we've seen foundation models for vision -- looking at models such as DALL-E 2, which takes text data that's then used to generate a custom image.

07:33  We've seen models for code with products like Copilot that can help complete code as it's being authored.

07:40  And IBM's innovating across all of these domains.

07:43  So whether it's language models that we're building into products like Watson Assistant and Watson Discovery,

07:48  vision models that we're building into products like Maximo Visual Inspection,

07:53  or Ansible code models that we're building with our partners at Red Hat under Project Wisdom,

07:58  we're innovating across all of these domains and more.

08:02  We're working on chemistry.

08:06  So, for example, we just published and released MoLFormer, which is a foundation model to promote molecule discovery for different targeted therapeutics.

08:16  And we're working on models for climate change, building Earth Science foundation models using geospatial data to improve climate research.

08:26  I hope you found this video both informative and helpful.

08:29  If you're interested in learning more, particularly how IBM is working to improve some of these disadvantages,

08:35  making foundation models more trustworthy and more efficient, please take a look at the links below.

08:39  Thank you.


Related Tags
AI, Foundation Models, Business Strategy, IBM Research, Generative AI, Machine Learning, Enterprise Value, AI Applications, Data Efficiency, Trustworthiness