The moment we stopped understanding AI [AlexNet]

Welch Labs
1 Jul 2024 · 17:38

Summary

TL;DR: This video explores the inner workings of AI models like ChatGPT and AlexNet, revealing how simple compute blocks, when scaled massively with data, can perform complex tasks. It delves into embedding spaces, where high-dimensional data is organized, and how models like AlexNet learn to recognize patterns without explicit instructions. The video also highlights the power of deep learning and the challenges in understanding these models, and ends with a discussion of the future of AI and its potential breakthroughs.

Takeaways

  • 🧠 The script discusses the inner workings of AI models like AlexNet and ChatGPT, emphasizing the high-dimensional spaces they use to understand the world.
  • 📈 AlexNet, introduced in 2012, was a breakthrough in AI, demonstrating the power of scaling up neural networks for computer vision tasks.
  • 🔍 AlexNet's success hinged on convolutional blocks, a type of compute block that can detect patterns in images.
  • 🤖 ChatGPT operates on a similar principle but for language, using transformers to process and generate human-like text from input matrices.
  • 📚 The script highlights the importance of vast amounts of data for training AI models, which allows them to learn complex patterns and behaviors.
  • 🔑 The intelligence of models like ChatGPT is not inherent but emerges from the combination of simple operations trained on large datasets.
  • 👀 AlexNet's first layer learns to detect edges and color blobs, which are foundational for recognizing more complex visual concepts.
  • 🔮 Deep learning models map inputs to high-dimensional spaces where similar concepts are close together, forming a kind of 'activation atlas'.
  • 🌐 The script mentions 'feature visualization', a technique that generates images designed to maximize specific neural activations, revealing what the model has learned.
  • 🎯 AlexNet's performance in the ImageNet competition marked a shift toward data-driven AI and away from expert-crafted algorithms.
  • 🚀 The scale of data and compute power is a defining characteristic of modern AI, with models like ChatGPT having over a trillion parameters.

Q & A

  • What was the significance of the 2012 AlexNet paper in the field of computer vision?

    - The AlexNet paper was significant because it demonstrated the effectiveness of deep learning in computer vision. It shocked the community by showing that an old AI idea, when scaled up, could perform exceptionally well. It marked the beginning of a new era in AI, where deep neural networks became the dominant approach.

  • What is the basic function of a Transformer block in AI models like ChatGPT?

    - A Transformer block in models like ChatGPT performs a set of fixed matrix operations on an input matrix of data and typically returns an output matrix of the same size. These blocks are fundamental to the model's ability to process input and generate responses.

  • How does ChatGPT formulate a response to a user's query?

    - ChatGPT breaks the query down into words and word fragments, maps each to a vector, and stacks these vectors into a matrix. This matrix is then processed through multiple Transformer blocks. The model predicts the next word or word fragment from the final output matrix, appends it to the running text, and feeds this slightly longer text back into the model, repeating until a stop word fragment is produced (a code sketch of this loop follows this Q&A list).

  • What is the role of the final output matrix's last column in ChatGPT's response generation?

    - The last column of ChatGPT's final output matrix is mapped from a vector back to text to produce the next word or word fragment of the response. This process is repeated, with each new word fragment added to the input matrix, until a stop word fragment is returned.

  • How does the training of AlexNet differ from that of ChatGPT in terms of the task they are designed to perform?

    - AlexNet is trained to predict a label given an image, whereas ChatGPT is trained to predict the next word fragment given some text. Both models learn from large datasets, but the nature of the task and the type of data they process are different.

  • What is the purpose of the convolutional blocks in the first layers of AlexNet?

    - The convolutional blocks in the first layers of AlexNet are used to detect basic visual patterns like edges and color blobs in the input image. These blocks transform the image by sliding smaller tensors, or kernels, across the image and computing the dot product at each location, which serves as a similarity score.

  • How does the visualization of AlexNet's first layer kernels help us understand what the model has learned?

    - The visualization of AlexNet's first layer kernels as RGB images provides insight into the basic visual patterns the model has learned to detect, such as edges and color blobs. This helps us understand how the model begins to interpret the input image at a fundamental level.

  • What is an 'activation atlas' and how does it help visualize the embedding spaces of deep neural networks?

    - An activation atlas is a visualization technique that shows how deep neural networks organize the visual world or concepts in high-dimensional embedding spaces. It provides a way to see smooth visual transitions between related concepts and understand how the model represents different ideas in its internal space.

  • How do the synthetic images generated by feature visualization help in understanding a model's learned representations?

    - Synthetic images generated by feature visualization are optimized to maximize a given activation. These images provide a visual representation of what a specific activation layer is looking for, offering another way to see the learned representations within the model.

  • What was the key difference in 2012 that allowed AlexNet to achieve unprecedented success in the ImageNet competition?

    - The key difference in 2012 was the scale of data and compute power available. The ImageNet dataset provided a large labeled dataset, and the use of Nvidia GPUs provided significant computational power, allowing AlexNet to learn from vast amounts of data with its deep neural network architecture.

  • How does the scale of parameters in AI models like AlexNet and ChatGPT contribute to their performance and complexity?

    - The scale of parameters directly contributes to performance by allowing models to learn more complex patterns and representations. It also increases the complexity and the difficulty of understanding how these models work, as seen in the four-orders-of-magnitude growth from AlexNet's 60 million parameters to ChatGPT's more than a trillion.
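The generation loop described in the answers above is compact enough to sketch directly. Below is a minimal Python sketch of that loop; `VOCAB`, `embed`, `transformer_block`, `unembed`, and `STOP_ID` are toy stand-ins invented for illustration (a real model has a learned tokenizer, learned weights, and roughly a hundred blocks), but the control flow mirrors the description above.

```python
import numpy as np

# Toy stand-ins, for illustration only: real models use a learned
# tokenizer, a learned embedding table, and trained transformer weights.
VOCAB = ["<stop>", "hello", "world", "the", "answer"]
STOP_ID = 0        # hypothetical special stop fragment
D_MODEL = 8        # real models use thousands of dimensions

rng = np.random.default_rng(0)
E = rng.normal(size=(len(VOCAB), D_MODEL))  # token -> vector table

def embed(token_ids):
    # Stack one vector per token into a (sequence_length, d_model) matrix.
    return E[token_ids]

def transformer_block(X):
    # Placeholder for one block's fixed matrix operations:
    # a matrix goes in, a same-shape matrix comes out.
    W = rng.normal(size=(D_MODEL, D_MODEL)) / np.sqrt(D_MODEL)
    return np.tanh(X @ W)

def unembed(vector):
    # Map the vector for the most recent position back to a token.
    return int(np.argmax(E @ vector))

def generate(prompt_ids, n_blocks=96, max_new_tokens=10):
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        X = embed(ids)                # tokens -> matrix
        for _ in range(n_blocks):     # 96 blocks in ChatGPT 3.5
            X = transformer_block(X)  # same-size matrix out
        next_id = unembed(X[-1])      # last row/column -> next token
        if next_id == STOP_ID:        # stop fragment ends the response
            break
        ids.append(next_id)           # append and feed back in
    return ids

print(generate([3, 4]))  # start from the toy prompt ["the", "answer"]
```

With trained weights, this same loop is what turns a prompt into a response, one word fragment at a time.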

Outlines

00:00

🧠 The Emergence of AI Intelligence Through Scale

This paragraph introduces AI models like AlexNet and ChatGPT, which use high-dimensional spaces to represent data. AlexNet, introduced in 2012, demonstrated the power of scaling up an old AI idea in a simple 8-page paper. It laid the groundwork for models like ChatGPT, which uses transformers to process input data through a series of matrix operations. The paragraph emphasizes the non-intuitive nature of these models, which show no obvious signs of intelligence yet perform complex tasks through repeated matrix manipulations. It also touches on how training on vast datasets gives these models the ability to write essays and solve math problems.

05:01

๐Ÿ” Deep Dive into Neural Network Layers and Feature Visualization

This paragraph delves into the inner workings of neural networks, specifically AlexNet, and how they learn to recognize patterns. It explains the role of convolutional blocks and how they transform input images into activation maps that highlight areas of the image that match learned kernels. The paragraph discusses the progression from simple feature detection, such as edges and color blobs, to complex concepts like faces, which are learned without explicit instruction. It also introduces feature visualization techniques that generate synthetic images to maximize specific activations, providing insight into what the network has learned.

10:02

🎨 Activation Atlases: Visualizing Neural Network Embedding Spaces

The paragraph discusses the creation of activation atlases, which are visualizations that represent the high-dimensional embedding spaces of neural networks. These atlases show how models like AlexNet organize visual concepts, with similar concepts being close to each other in the embedding space. The paragraph describes how synthetic images are used to create a two-dimensional projection of these spaces, allowing for a visual walk through the model's understanding of concepts. It also touches on the semantic meaningfulness of the embedding space directions and how they can be manipulated to shift attributes like age or gender in images.

15:04

🚀 The Evolution and Scale of AI: From AlexNet to ChatGPT

This paragraph reflects on the historical context and evolution of AI, highlighting the significance of AlexNet's victory in the ImageNet competition and the shift in AI approaches that followed. It discusses the scalability of neural networks and the computational advances that have allowed models like ChatGPT to grow to over a trillion parameters. The paragraph also contemplates the future of AI, considering the possibility of new breakthroughs emerging from scaling up existing models or from the resurgence of older AI methods.

🤖 The Complexity and Unpredictability of AI Development

The final paragraph addresses the complexity and unpredictability inherent in AI development. It acknowledges the difficulty in understanding the inner workings of models with vast numbers of parameters and the challenge of visualizing high-dimensional spaces. The paragraph also reflects on the historical underestimation of the potential of neural networks and the surprising resurgence of older AI techniques, such as those used in AlexNet. It concludes with a nod to the ongoing exploration and discovery in the field of AI, emphasizing the importance of continued research and development.

Keywords

💡Activation Atlas

An Activation Atlas is a visualization tool used to represent high-dimensional embedding spaces in which AI models organize data. It helps to understand how models like AlexNet and GPT perceive and process information. In the video, the Activation Atlas is mentioned as a way to visualize the complex, high-dimensional spaces that modern AI models use, providing insight into the organization of visual and linguistic data.

💡AlexNet

AlexNet is a convolutional neural network that was influential in the field of computer vision. It was introduced in 2012 and demonstrated the effectiveness of deep learning for image recognition tasks. The video discusses AlexNet as the first model to show high performance in computer vision using deep learning, marking a significant shift towards scalability and away from explainability in AI.

💡Transformers

Transformers are a type of neural network architecture foundational to models like GPT. They process input data through a series of compute blocks that perform fixed matrix operations. The video explains that in models like ChatGPT, these transformer blocks are stacked in layers, each passing an output matrix of the same size to the next, which underlies the model's ability to generate text.
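As a rough illustration of "fixed matrix operations in, same-size matrix out", here is a single, heavily simplified transformer block in NumPy: one attention head, no layer norm, random weights. It is a sketch of the general shape of the computation, not the actual GPT architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 5, 16  # toy sizes: 5 tokens, 16-dimensional vectors
Wq, Wk, Wv = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))
W1 = rng.normal(size=(d, 4 * d)) / np.sqrt(d)
W2 = rng.normal(size=(4 * d, d)) / np.sqrt(4 * d)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def transformer_block(X):
    # Self-attention: every row (token vector) mixes with every other row.
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    A = softmax(Q @ K.T / np.sqrt(d))   # (n, n) attention weights
    X = X + A @ V                       # residual connection
    # Position-wise MLP: the "multi-layer perceptron" half of the block.
    X = X + np.maximum(X @ W1, 0) @ W2  # residual again
    return X

X = rng.normal(size=(n, d))   # input matrix: one row per token
Y = transformer_block(X)
print(X.shape, Y.shape)       # both (5, 16): same-size output
```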

💡Convolutional Blocks

Convolutional blocks are a component of neural networks used for image processing. They involve the use of kernels that slide over the input image to compute dot products, which serve as similarity scores. In the video, convolutional blocks in AlexNet are highlighted as the first step in transforming an image into a series of activation maps that the network uses to detect features like edges and color blobs.
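The sliding dot product itself is only a few lines. A minimal sketch in plain NumPy, with toy sizes standing in for AlexNet's 96 kernels, and with no stride or padding handling:

```python
import numpy as np

def convolve(image, kernel):
    # Slide `kernel` over `image`, taking a dot product at each location.
    # image:  (H, W, C) intensities; kernel: (kH, kW, C) learned weights.
    # Returns an activation map of shape (H - kH + 1, W - kW + 1).
    H, W, C = image.shape
    kH, kW, _ = kernel.shape
    out = np.zeros((H - kH + 1, W - kW + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = image[i:i + kH, j:j + kW, :]
            # Dot product as a similarity score: large when the patch
            # resembles the kernel, small otherwise.
            out[i, j] = np.sum(patch * kernel)
    return out

rng = np.random.default_rng(0)
image = rng.random((32, 32, 3))    # toy RGB image
kernel = rng.random((11, 11, 3))   # one AlexNet-sized first-layer kernel
activation_map = convolve(image, kernel)
print(activation_map.shape)        # (22, 22)
```

Stacking the maps from 96 such kernels gives the 96-channel tensor that the next layer consumes.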

💡Embedding Space

Embedding space is a high-dimensional space where data points, such as images or words, are represented as vectors. The video script discusses how similar concepts in image or language data are mapped close to each other in this space, and how directionality can be semantically meaningful, allowing for operations like age or gender shifting in faces.
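The "meaningful direction" idea reduces to vector arithmetic. A minimal sketch, assuming a trained encoder/decoder pair exists; here random vectors stand in for real embeddings, and the decode step is left as a comment:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4096  # e.g. the size of AlexNet's penultimate layer

# Hypothetical stand-ins: in practice these would be embeddings
# produced by a trained encoder from labeled face images.
old_faces = rng.normal(size=(100, d))
young_faces = rng.normal(size=(100, d))

# Estimate a semantic "age" direction from labeled examples.
age_direction = old_faces.mean(axis=0) - young_faces.mean(axis=0)

z = rng.normal(size=d)             # embedding of some input face
z_older = z + 1.5 * age_direction  # move the point along "age"
# decode(z_older) would map the shifted vector back to an image.
print(np.linalg.norm(z_older - z))
```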

💡Feature Visualization

Feature visualization is a technique used to understand what features a neural network has learned to recognize. It involves generating synthetic images that maximize the activation of a particular set of neurons. The video describes how feature visualization provides insight into what specific layers of AlexNet are looking for, such as faces, by creating images that strongly activate those features.
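Under the hood, feature visualization is gradient ascent on the input pixels. A minimal sketch, assuming a differentiable activation; here a toy dot-product "neuron" whose gradient is known in closed form, whereas a real implementation would backpropagate through the trained network and add regularizers to keep the image natural-looking:

```python
import numpy as np

def activation(image, target):
    # Toy stand-in for a neuron in a trained network: just similarity
    # to a fixed pattern, so this demo is self-contained and runnable.
    return np.sum(image * target)

def activation_grad(image, target):
    # Gradient of the toy activation with respect to the input pixels.
    # For a real network, this comes from backpropagation/autodiff.
    return target

rng = np.random.default_rng(0)
target = rng.normal(size=(11, 11, 3))       # "what the neuron wants"
image = 0.1 * rng.normal(size=(11, 11, 3))  # start from faint noise

for step in range(200):
    image += 0.01 * activation_grad(image, target)  # gradient ascent
    image = np.clip(image, -1.0, 1.0)               # keep pixels bounded

# The activation rises as the image comes to resemble the pattern.
print(activation(image, target))
```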

💡Nearest Neighbors

In the context of the video, nearest neighbors refer to images in the dataset that are closest in the high-dimensional embedding space to a given test image. The video explains how Hinton's team found that the nearest neighbors in this space showed highly similar concepts to the test images, indicating that AlexNet learned to represent similar concepts close together.
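Hinton's team's experiment boils down to a distance computation in the 4,096-dimensional embedding space. A minimal sketch with random stand-in vectors; real ones would come from the model's second-to-last layer:

```python
import numpy as np

rng = np.random.default_rng(0)
dataset_vecs = rng.normal(size=(10_000, 4096))  # one vector per image
test_vec = rng.normal(size=4096)                # vector of the test image

# Euclidean distance from the test image to every dataset image.
dists = np.linalg.norm(dataset_vecs - test_vec, axis=1)

# Indices of the five nearest neighbors in embedding space.
nearest = np.argsort(dists)[:5]
print(nearest, dists[nearest])
```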

💡Language Models

Language models are AI models designed to understand and generate human-like text. The video discusses large language models, such as GPT, which use transformers to map words and word fragments to vectors in an embedding space, where semantically similar words are located close to each other, allowing the model to generate coherent text.

💡Backpropagation

Backpropagation is a multivariate calculus technique used to train multi-layer neural networks by adjusting weights to minimize the difference between predicted and actual outputs. The video mentions that backpropagation was key to training deep models like AlexNet, enabling them to learn complex patterns from data.
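For concreteness, here is a two-layer perceptron trained with backpropagation on toy data: the backward pass applies the chain rule layer by layer. A minimal sketch in plain NumPy, illustrative rather than AlexNet's actual training code:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 2))                         # toy inputs
y = (X[:, 0] * X[:, 1] > 0).astype(float)[:, None]   # XOR-like labels

W1, b1 = 0.5 * rng.normal(size=(2, 8)), np.zeros(8)
W2, b2 = 0.5 * rng.normal(size=(8, 1)), np.zeros(1)

for step in range(2000):
    # Forward pass.
    h = np.tanh(X @ W1 + b1)
    p = 1 / (1 + np.exp(-(h @ W2 + b2)))  # predicted probability
    # Backward pass: chain rule, layer by layer (backpropagation).
    dp = (p - y) / len(X)                 # grad of sigmoid cross-entropy
    dW2, db2 = h.T @ dp, dp.sum(axis=0)
    dh = dp @ W2.T * (1 - h ** 2)         # back through the tanh
    dW1, db1 = X.T @ dh, dh.sum(axis=0)
    # Gradient descent: nudge weights to reduce prediction error.
    for param, grad in ((W1, dW1), (b1, db1), (W2, dW2), (b2, db2)):
        param -= 0.5 * grad

print(f"training accuracy: {((p > 0.5) == y).mean():.2f}")
```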

💡Neural Networks

Neural networks are a set of algorithms modeled loosely after the human brain that are designed to recognize patterns. They consist of interconnected nodes or 'neurons' that process information. The video script traces the history of neural networks from their origins in the 1940s to the development of AlexNet, which demonstrated the power of scaling up these networks.

💡High-Dimensional Representation

High-dimensional representation refers to the way complex data, such as images, are encoded into vectors in a space with many dimensions. The video explains how AlexNet maps images to points in a 4,096-dimensional space, where similar concepts are physically close, allowing for tasks like finding nearest neighbors in the dataset.

Highlights

Introduction of the Activation Atlas, a tool to visualize high-dimensional spaces used by AI models.

AlexNet's groundbreaking impact on computer vision in 2012 by demonstrating the effectiveness of scaled AI models.

The role of Ilya Sutskever in co-founding OpenAI and the development of models like ChatGPT.

The inner workings of ChatGPT, emphasizing the use of transformer blocks and matrix operations.

How ChatGPT formulates responses through a series of matrix transformations and vector mappings.

The surprising simplicity of GPT's response generation from the final output matrix's last column.

The importance of data in training models like AlexNet and ChatGPT for high performance.

AlexNet's training to predict image labels and its comparison to ChatGPT's text prediction.

Visualization of the first convolutional layer in AlexNet and its learned patterns.

The transformation of images through convolutional blocks and the creation of activation maps.

How AlexNet's deeper layers respond to higher-level concepts without explicit instructions.

Feature visualization technique to understand what specific activation layers are detecting.

The concept of high-dimensional embedding spaces in AI models like AlexNet.

The nearest neighbor experiment showing similar concepts in high-dimensional space.

The significance of the perceptron and backpropagation in training deep neural networks.

The scale of data and compute power as the key to the success of AI models like AlexNet and ChatGPT.

The comparison between the computational cost of older AI approaches and the efficiency of modern models.

The unpredictability of AI breakthroughs and the potential for future advancements.

The role of activation atlases in visualizing and understanding the organization of concepts in AI models.

Sponsorship mention of KiwiCo and its focus on educational products for children.

Transcripts

00:00
This is an activation atlas. It gives us a glimpse into the high-dimensional embedding spaces modern AI models use to organize and make sense of the world. The first model to really see the world like this, AlexNet, was published in 2012 in an 8-page paper that shocked the computer vision community by showing that an old AI idea would work unbelievably well when scaled. The paper's second author, Ilya Sutskever, would go on to co-found OpenAI, where he and the OpenAI team would massively scale up this idea again to create ChatGPT. This video is sponsored by KiwiCo; more on them later.

00:37
If you look under the hood of ChatGPT, you won't find any obvious signs of intelligence. Instead you'll find layer after layer of compute blocks called transformers; this is what the T in GPT stands for. Each transformer performs a set of fixed matrix operations on an input matrix of data, and typically returns an output matrix of the same size. To figure out what it's going to say next, ChatGPT breaks apart what you ask it into words and word fragments, maps each of these to a vector, and stacks all of these vectors together into a matrix. This matrix is then passed into the first transformer block, which returns a new matrix of the same size. This operation is then repeated again and again: 96 times in ChatGPT 3.5, and reportedly 120 times in ChatGPT 4.

01:24
Now here's the absurd part: with a few caveats, the next word or word fragment that ChatGPT says back to you is literally just the last column of its final output matrix, mapped from a vector back to text. To formulate a full response, this new word or word fragment is appended to the end, and this new, slightly longer text is fed back into the input of ChatGPT. This process is repeated again and again, with one new column added to the input matrix each time, until the model's output returns a special stop word fragment. And that is it: one matrix multiply after another, GPT slowly morphs the input you give it into the output it returns.

02:06
Where is the intelligence? How is it that these 100 or so blocks of dumb compute are able to write essays, translate language, summarize books, solve math problems, explain complex concepts, or even predict the next line of this script? "The answer lies in the vast amounts of data these models are trained on." Okay, pretty good, but not quite what I wanted to say next.

02:26
The AlexNet paper is significant because it marks the first time we really see layers of compute blocks like this learning to do unbelievable things: an AI tipping point, towards high performance and scale and away from explainability. While ChatGPT is trained to predict the next word fragment given some text, AlexNet is trained to predict a label given an image. The input image to AlexNet is represented as a three-dimensional matrix, or tensor, of RGB intensity values, and the output is a single vector of length 1,000, where each entry corresponds to AlexNet's predicted probability that the input image belongs to one of the 1,000 classes in the ImageNet dataset: things like tabby cats, German shepherds, hot dogs, toasters, and aircraft carriers.

03:08
Just like ChatGPT today, AlexNet was somehow magically able to map the inputs we give it into the outputs we wanted, using layer after layer of compute block, after training on a large dataset. One nice thing about vision models, however, is that it's easier to poke around under the hood and get some idea of what the model has learned. One of the first under-the-hood insights that Krizhevsky, Sutskever, and Hinton show in the AlexNet paper is that the model has learned some very interesting visual patterns in its first layer.

03:39
The first five layers of AlexNet are all convolutional blocks, first developed in the late 1980s to classify handwritten digits, and can be understood as a special case of the transformer blocks in ChatGPT and other large language models. In convolutional blocks, the input image tensor is transformed by sliding a much smaller tensor of learned weight values, called a kernel, across the image, and computing the dot product between the image and kernel at each location. Here it's helpful to think of the dot product as a similarity score: the more similar a given patch of the image and the kernel are, the higher the resulting dot product will be. AlexNet uses 96 individual kernels in its first layer, each of dimension 11 by 11 by 3, so conveniently we can visualize them as little RGB images.

04:22
These images give us a nice idea of how the first layer of AlexNet sees the image. The upper kernels in this figure show where AlexNet has clearly learned to detect edges, or rapid changes from light to dark, at various angles; images with similar patterns will generate high dot products with these kernels. Below, we see where AlexNet has learned to detect blobs of various colors. These kernels are all initialized as random numbers, and the patterns we're looking at are completely learned from data.

04:50
Sliding each of our 96 kernels over the input image and computing the dot product at each location produces a new set of 96 matrices, sometimes called activation maps. Conveniently, we can view these as images as well. The activation maps show us which parts of an image, if any, match a given kernel well. If I hold up something visually similar to a given kernel, we see high activation in that part of the activation map. Notice that it goes away when I rotate the pattern by 90 degrees: the image and kernel are no longer aligned. You can also see various activation maps picking up edges and other low-level features in our image.

05:30
Of course, finding edges and color blobs in images is still hugely removed from recognizing complex concepts like German shepherds or aircraft carriers. What's astounding about deep neural networks like AlexNet and ChatGPT is that from here, all we do is repeat the same operation again, just with a different set of learned weights. For AlexNet, this means that these 96 activation maps are stacked together into a tensor that becomes the input to the exact same type of convolutional compute block: the second overall layer in the model. We can make our activations easier to see by removing the values close to zero.

06:06
Unfortunately, in our second layer we can't learn much by simply visualizing the weight values in the kernels themselves. The first issue is that we just can't see enough colors: the depth of the kernel has to match the depth of the incoming data. In the first layer of AlexNet, the depth of the incoming data is just three, because the model takes in color images with red, green, and blue color channels. However, since the first layer computes 96 separate activation maps, the computation in the second layer of AlexNet is like processing images with 96 separate color channels. The second factor that makes what's happening in the second layer of AlexNet more difficult to visualize is that the dot products are really taking weighted combinations of the computations in the first layer; we need some way to visualize how the layers are working together.

06:52
A simple way to see what's going on is to try to find parts of various images that strongly activate the outputs of the second layer. For example, this activation map appears to be putting together edge detectors to form basic corners. Remarkably, as we move deeper into AlexNet, strong activations correspond to higher- and higher-level concepts. By the time we reach the fifth layer, we have activation maps that respond very strongly to faces and other high-level concepts. And what's incredible here is that no one explicitly told AlexNet what a face is. All AlexNet had to learn from were the images and labels in the ImageNet dataset, which does not contain a person or a face class. AlexNet was able to learn, completely on its own, both that faces are important and how to recognize them.

07:39
To better understand what a given kernel in AlexNet has learned, we can also look at the examples in the training dataset that give the highest activation values for that kernel. For our face kernel, not surprisingly, we find examples that contain people. Finally, there's this really interesting technique called feature visualization, where we can generate synthetic images that are optimized to maximize a given activation. These synthetic images give us another way to see what a specific activation layer is looking for.

08:05
By the time we reach the final layer of AlexNet, our image has been processed into a vector of length 4,096. The final layer performs one last matrix computation on this vector to create a final output vector of length 1,000, with one entry for each of the classes in the ImageNet dataset. Krizhevsky, Sutskever, and Hinton noticed that the second-to-last layer vector demonstrated some very interesting properties. One way to think about this vector is as a point in 4,096-dimensional space: each image we pass into the model is effectively mapped to a point in this space; all we have to do is stop one layer early and grab this vector. Just as we can measure the distance between two points in 2D space, we can also measure the distance between points, or images, in this high-dimensional space.

08:52
Hinton's team ran a simple experiment where they took a test image from the ImageNet dataset, computed its corresponding vector, and then searched for the other images in the ImageNet dataset that were closest, the nearest neighbors, to the test image in this high-dimensional space. Remarkably, the nearest-neighbor images showed highly similar concepts to the test images. In figure four from the AlexNet paper, we see an example where an elephant test image yields nearest neighbors that are all elephants. What's interesting here, too, is that the pixel values themselves between these images are very different: AlexNet really has learned high-dimensional representations of data where similar concepts are physically close. This high-dimensional space is often called a latent or embedding space.

09:36
In the years following the AlexNet paper, it was shown that not only distance but directionality in some of these embedding spaces is meaningful. The demos you see where faces are age- or gender-shifted often work by first mapping an image to a vector in an embedding space, then literally moving this point in the age or gender direction in that embedding space, and then mapping the modified vector back to an image.

10:00
Before we get into activation atlases, which give us an amazing way to visualize these embedding spaces, please take a moment to consider if this video's sponsor is something that you, or someone in your life, would enjoy. I was genuinely really excited to work with this company: they make incredibly thoughtful educational products, and by using the link in the description below you're really helping me make more of these videos. This video's sponsor is KiwiCo. They make these fun and super well-designed educational crates for kids of all ages. They have nine different monthly subscription lines to choose from, focused on different areas of STEAM, and you can also buy individual crates, which are great for trying out KiwiCo and make amazing gifts. Growing up, I was constantly building; here I am building a tower outside my house to my second-story bedroom. I was obsessed with electronics, and would have absolutely loved projects like this pencil sharpener from the Eureka crate line, which is focused on science and engineering. I really believe that this type of hands-on, self-driven learning is magical. When I really think about my own education, it's the times that I've been fully absorbed in projects like this that I learned the most, and now that I'm a dad, I really want my kids to have the same kind of experiences. KiwiCo really does an amazing job boxing up start-to-finish projects like this. My daughter just got the Panda crate for fine motor skills; it includes these special crayons specifically designed to help her learn different ways of grasping. You can see her here insisting that she gets to bring them in the car with us. Huge thanks to KiwiCo for sponsoring this video. Use the discount code WELCHLABS for 50% off your first month of a subscription.

11:36
Now, back to AlexNet. There's some really amazing work that combines the synthetic images that maximize a given set of activations with a two-dimensional projection, or flattening-out, of the embedding space, to make these incredible visualizations called activation atlases. Neighbors on the activation atlas are generally close in the embedding space and show similar concepts the model has learned. We're getting a peek into how deep neural networks organize the visual world. Looking at the synthetic images that most activate neighborhoods of neurons, we can visually walk through the embedding space of the model, seeing it make smooth visual transitions from concepts like zebras to tigers to leopards to rabbits. Moving to the middle layers of the model, we can see less fully formed but still meaningful concepts: moving along this path, amazingly, correlates with the number and size of pieces of fruit in an image.

12:28
The same principle applies in large language models: words and word fragments are mapped to vectors in an embedding space where words with similar meanings are close to each other, and the directions in the embedding space are sometimes semantically meaningful. There's some incredible, very recent work from the team at Anthropic that shows how sets of activations can be mapped to concepts in language. These results can help us better understand how LLMs work, and can be used to modify model behavior: after clamping a set of activations that correspond to the concept "Golden Gate Bridge" to a high value, the LLM the team was experimenting with began to identify itself as the Golden Gate Bridge.

13:08
AlexNet won the ImageNet Large Scale Visual Recognition Challenge by a wide margin in 2012, the third year the challenge was run. In prior years, the winning teams used approaches that, under the hood, look much more like what you might expect to find in an intelligent system. The 2011 winner used a complex set of very different algorithms, starting with an algorithm called SIFT, which is composed of specialized image analysis techniques developed by experts over many years of research. AlexNet, in contrast, is an implementation of a much older AI idea: an artificial neural network, where the behavior of the algorithm is almost entirely learned from data.

13:46
The dot product operation between the data and a set of weights was originally proposed by McCulloch and Pitts in the 1940s as a dramatically oversimplified model of the neurons in our brain. In the second half of each transformer block in ChatGPT, and at the end of AlexNet, you'll find a multi-layer perceptron. The perceptron is a learning algorithm, and physical machine, from the 1950s that uses McCulloch-Pitts neurons and can learn to perform basic shape recognition tasks. Back in the 1980s, a younger Geoff Hinton and his collaborators at Carnegie Mellon showed how to train multiple layers of these perceptrons using a multivariate calculus technique called backpropagation. These models, a couple of layers deep, were remarkably pretty good at driving cars. In the 1990s, Yann LeCun, now chief AI scientist at Meta, was able to train five-layer-deep models to recognize handwritten digits.

14:43
Despite the intermittent successes of artificial neural networks over the years, this approach was hardly the accepted way to do AI, right up until the publication of AlexNet. If this was obviously the way to build intelligent systems, we would have done it decades earlier. As Ian Goodfellow writes in his excellent deep learning book, at this point deep networks were generally believed to be very difficult to train. We now know that algorithms that have existed since the 1980s work quite well, but this was not apparent circa 2006. The issue is perhaps simply that these algorithms were too computationally costly to allow much experimentation with the hardware available at the time.

15:18
The key difference in 2012 was simply scale: scale of data and scale of compute. The ImageNet dataset was the largest labeled dataset of its kind to date, with over 1.3 million images, and thanks to Nvidia GPUs, in 2012 Hinton's team had access to roughly 10,000 times more compute power than Yann LeCun had 15 years before. LeCun's LeNet-5 model had around 60,000 learnable parameters; AlexNet increased this a thousandfold, to around 60 million parameters. Today, ChatGPT has well over a trillion parameters, making it over 10,000 times larger than AlexNet. This mind-boggling scale is the hallmark of this third wave of AI we find ourselves in today, driving both these models' performance and the fundamental difficulty of understanding how they are able to do what they do.

16:12
It's amazing that we can figure out that AlexNet learns representations of faces, and that large language models learn representations of concepts like the Golden Gate Bridge, but there are many, many more concepts these models learn that we don't even have words for. Activation atlases are beautiful and fascinating, but they are very low-dimensional projections of very high-dimensional spaces, where our spatial reasoning abilities often fall apart. It's notoriously difficult to predict where AI will go next. Almost no one expected the neural networks of the 80s and 90s, scaled up by three or four orders of magnitude, to yield AlexNet, and it was almost impossible to predict that a generalization of the compute blocks in AlexNet, scaled up by four orders of magnitude, would yield ChatGPT. Maybe the next AI breakthrough is just another three to four orders of magnitude of scale away, or maybe some mostly forgotten approach to AI will resurface, as AlexNet did in 2012. We'll have to wait and see.

17:10
Are you mad that I called the blocks of compute dumb?

[Music]

"Not at all. Describing the compute blocks as dumb highlights the impressive nature of how simple operations can combine to produce intelligent behavior. It's a great way to emphasize the power of the underlying algorithms and training data."


Related Tags
Artificial Intelligence · Deep Learning · Neural Networks · AlexNet · GPT · Computer Vision · Language Models · Data Science · Machine Learning · AI Evolution · Embedding Spaces