AI Hype is completely out of control - especially since ChatGPT-4o

Internet of Bugs
10 Jun 2024 · 22:09

Summary

TLDR: The video script discusses skepticism around large-language model AIs' impact on software development. It critiques the hype following the release of ChatGPT-4o, questioning job market disruption and AI's actual capabilities. The speaker, Carl, uses evidence-based reasoning to argue that despite speed improvements, there's mixed evidence on AI's accuracy. He highlights the history of companies overstating AI capabilities and the psychological tendency of humans to anthropomorphize AI, suggesting a 'toxic culture of lying' around AI demonstrations.

Takeaways

  • 🤖 The video discusses the impact of large-language model AIs on the software development industry and the mixed reactions to the release of ChatGPT-4o.
  • 📈 There is a debate on whether AIs will replace programmers significantly or if the current AI advancements are a passing trend.
  • 🔍 Companies like BP are reportedly using fewer programmers due to AI, but the overall industry impact is still unclear.
  • 👨‍💼 The speaker, Carl, emphasizes the importance of evidence-based analysis in understanding AI's role in the job market and its potential future.
  • 🔬 Carl's background in physics has shaped his approach to evaluating technology trends through experimentation and evidence gathering.
  • 📊 The script mentions various benchmarks that show mixed improvements in AI capabilities, suggesting that AI's ability to perform tasks correctly is not consistently better.
  • 🗣️ Voice interfaces are highlighted as a feature of ChatGPT-4o, but Carl argues that this is not a new advancement and does not significantly impact AI's capabilities.
  • 🤔 The video raises questions about the trustworthiness of AI demonstrations and the history of companies overstating AI capabilities.
  • 🕊️ The 'Eliza Effect' and 'dark patterns' in AI chatbots are discussed as psychological tricks that make humans more likely to believe in AI sentience.
  • 📉 Carl points out a trend of companies being caught lying about AI capabilities, which undermines confidence in current and future AI advancements.
  • 🧐 The video concludes by urging viewers to critically assess the evidence and be wary of narratives promoted by those with a history of dishonesty.

Q & A

  • What is the main topic of discussion in the video script?

    -The main topic of the video script is the impact of large-language model AIs, particularly ChatGPT-4o, on the software development industry and the validity of the hype surrounding AI capabilities.

  • What is the current trend in the job market regarding AI and programmers?

    -The current trend indicates that AI is causing some job disruptions, with companies like BP reporting a significant reduction in the number of programmers needed, possibly due to AI advancements.

  • What does the speaker suggest about the hype around AI and its potential impact on society?

    -The speaker suggests that the hype around AI might be exaggerated and that the truth likely lies somewhere in between the extreme views of AI replacing human jobs entirely or being as short-lived as NFTs.

  • What evidence does the speaker consider reliable for evaluating AI capabilities?

    -The speaker considers peer-reviewed papers, benchmarks, firsthand observations from unbiased sources, and trends under similar circumstances as reliable evidence for evaluating AI capabilities.

  • What is the 'Eliza Effect' mentioned in the script?

    -The 'Eliza Effect' refers to the phenomenon where humans are predisposed to believe that AI chatbots have thoughts and feelings, leading to 'powerful delusional thinking' akin to a 'slow-acting poison'.

  • What is the speaker's opinion on the voice interface feature of ChatGPT-4o?

    -The speaker is not impressed by the voice interface feature of ChatGPT-4o, stating that it is not new and has been available for some time, and that it does not necessarily represent an advancement in AI.

  • What is the term used to describe user interfaces that trick people into certain behaviors?

    -The term used to describe such user interfaces is 'dark patterns'.

  • What are some examples of companies that have been caught exaggerating or lying about their AI capabilities?

    -Examples include Tesla with its self-driving demo, Google with Duplex and Gemini AI demos, and OpenAI with the GPT-4 bar exam performance claims.

  • What is the speaker's stance on the future of AI and its potential to achieve human-level intelligence?

    -The speaker is skeptical about the near-future prospects of achieving human-level AI, citing a lack of clear evidence and a history of companies exaggerating AI capabilities.

  • What advice does the speaker give to those trying to understand the impact of AI on their careers or industries?

    -The speaker advises individuals to make up their own minds, follow the evidence, and be cautious of narratives promoted by those with a history of dishonesty.

Outlines

00:00

🤖 AI Hype and Its Impact on the Job Market

The script discusses the mixed reactions to the impact of large-language model AIs like ChatGPT-4o on the software development industry. It highlights the uncertainty in the job market, with companies like BP reportedly needing fewer programmers due to AI, while the hype suggests a future with human-like AI at a luxury car's price. The author emphasizes the need to look beyond the current hype and consider the fundamentals to predict the future of AI's role in society and its potential for disruption. The paragraph also touches on the importance of evidence-based analysis in understanding technological trends.

05:01

🔍 Evidence-Based Analysis of AI's Progress

The speaker advocates for an evidence-based approach to evaluating AI's capabilities and progress, critiquing the hype and speculation that often cloud discussions about AI. They define different levels of evidence, from peer-reviewed papers to firsthand observations, and stress the importance of benchmarks in assessing AI performance. The paragraph also addresses the mixed results from benchmarks on ChatGPT-4o, indicating that while it may be faster, its accuracy varies across different tasks, and there is no clear consensus on its overall improvement from previous models.

10:02

🧐 The Human Tendency to Anthropomorphize AI

This section delves into the psychological and ethical aspects of AI, discussing how humans are predisposed to attribute sentience to AI based on their interactions. It mentions the 'Eliza Effect' and the concept of 'dark patterns' in AI chatbots, which are designed to manipulate users into perceiving intelligence. The paragraph also points out that companies have been known to exaggerate or even fabricate AI capabilities in demos, leading to a culture of dishonesty that affects public perception and trust in AI technology.

15:04

📉 The Reality of AI Demos and Company Honesty

The script provides numerous examples of companies that have been caught lying about the capabilities of their AI systems in demos. It discusses the SEC's crackdown on 'AI washing' and instances where companies have presented human-operated services as AI-driven. The paragraph calls into question the transparency and honesty of companies and journalists in presenting AI advancements, suggesting a pervasive issue with exaggerated claims and dishonesty in the industry.

20:06

🚧 The Future of AI: Speculation vs. Evidence

In the concluding paragraph, the author reflects on the uncertainty surrounding the future of AI and the importance of relying on evidence rather than speculation or the narratives pushed by those with a vested interest in promoting AI. They express skepticism about the imminent arrival of human-level AI and criticize the dishonesty that has been prevalent in the industry. The speaker encourages individuals to make informed decisions based on evidence and to be wary of the potential for manipulation and misinformation in the discourse surrounding AI.


Keywords

💡Large-language model AIs

Large-language model AIs, or LLMs, refer to artificial intelligence systems designed to process and generate human-like text based on vast amounts of data. They are central to the video's theme, as the speaker discusses their potential impact on the software development industry. The script mentions 'ChatGPT-4o' as an example of such a model, indicating ongoing hype and debate about their capabilities and future implications.

💡Software development industry

The software development industry encompasses the creation, maintenance, and publishing of software applications. The video discusses the potential disruption caused by AI in this industry, including the reduction in the need for human programmers due to AI's increasing capabilities, as illustrated by the example of BP reportedly needing 70% fewer coders.

💡Hype

Hype in the context of the video refers to the inflated expectations and excitement surrounding new technologies, particularly AI advancements. The speaker contrasts the hype with the reality, suggesting that the true impact of AI on jobs and society is often misrepresented and that the audience should look beyond the hype to understand the fundamentals.

💡Disruption

Disruption, in the video, describes the significant changes and potential displacement in the job market due to AI advancements. The term is used to explore how AI might alter traditional job roles, particularly in software development, and the uncertainty surrounding these changes.

💡AI washing

AI washing is a term used in the script to describe the practice where companies claim to use AI in their products or services when, in reality, there is minimal or no AI involvement. The speaker criticizes this trend, citing examples of companies being caught in 'AI washing' and misleading the public about the capabilities of their AI systems.

💡Benchmarks

Benchmarks are standardized tests used to evaluate the performance of AI systems. In the video, the speaker uses benchmarks as evidence to discuss the capabilities of AI models, noting that while some benchmarks show improvements, others indicate mixed or even negative results, thus providing a nuanced view of AI progress.

💡Dark patterns

Dark patterns are manipulative design techniques used in interfaces to influence user behavior in ways that may not be in the user's best interest. The video mentions 'dark patterns' in the context of AI chatbots, suggesting that certain features, like synthetic voices and cute affects, are designed to trick users into believing the AI has sentience.

💡Ethics

Ethics in the video relates to the moral principles that should guide the development and use of AI technologies. The speaker points out the ethical implications of 'dark patterns' and the deceptive practices by companies, highlighting the need for ethical considerations in AI development.

💡Accuracy

Accuracy, in the context of the video, refers to the correctness and reliability of AI systems in performing tasks. The speaker emphasizes the importance of accuracy in determining the true capabilities of AI, noting that mixed evidence from benchmarks does not clearly indicate an improvement in AI's ability to perform tasks correctly.

💡Human-level artificial intelligence

Human-level artificial intelligence, or AGI, refers to AI systems that possess the ability to perform any intellectual task that a human being can. The video challenges the hype around achieving AGI soon, stating that there is no clear evidence supporting this claim and that many claims about AI capabilities have been exaggerated or false.

💡Evidence-based approach

An evidence-based approach is a method of decision-making that relies on empirical data and evidence. The speaker advocates for this approach in evaluating AI technologies, rejecting narratives promoted by those with a 'toxic culture of lying' and instead choosing to follow the evidence, as exemplified by the various benchmarks and research studies mentioned in the script.

Highlights

Discussion on the impact of large-language model AIs on the software development industry in the coming years.

Release of ChatGPT-4o causing increased hype and divided opinions on its potential and impact.

Speculations about the future of AI, suggesting human-like robots at the price of luxury cars.

Debates on whether LLMs are a passing trend or a transformative technology.

Evidence of AI's disruption in the job market, with companies like BP reportedly needing fewer programmers.

Debate on the actual impact of AI on programmer hiring and job security.

The unpredictability of AI's influence on software quality and the potential need for human intervention.

The importance of looking beyond current trends to understand AI's long-term impact.

Comparing the AI bubble to the tech startup bubble of '99 & 2000 and its aftermath.

The need to differentiate between hype and reality in AI development and its societal implications.

Carl's personal experience and disappointment with ChatGPT-4o's capabilities.

Critique of the overemphasis on non-essential features like voice interfaces in AI advancements.

The significance of evidence-based evaluation in assessing AI capabilities and progress.

The mixed performance of ChatGPT-4o on various benchmarks indicating no clear improvement.

The psychological predisposition of humans to attribute sentience to AI and its implications.

The concept of 'dark patterns' in AI interfaces designed to manipulate human perception.

Historical instances of companies exaggerating or faking AI capabilities in demos.

The call for skepticism and critical thinking in the face of AI hype and industry narratives.

The final cautionary advice to remain vigilant and discerning regarding AI's real capabilities and potential.

Transcripts

00:00

So, I made a video early last month about how I thought the current trend of large-language model AIs would impact the software development industry over the next handful of years. Since that time, there's been a lot more discussion about that, a lot of it driven by the release of ChatGPT-4o. Now, that release has caused a lot of people to buy even more into the hype. Some of those people are in the comments on this channel, some have even been from major news outlets. And that makes sense. The hype says that within a handful of years we will have human form factor robots with roughly human intelligence for what could be the current price of a luxury car. Or the haters say that LLMs will soon be as dead as NFTs. As always, or at least almost always, the truth surely lies somewhere in between. But where?

00:47

We know there's going to be disruption in the job market. Indeed, AIs have already caused issues in the job market. I've talked about that several times. We're starting to see companies reporting that they're using fewer programmers due to AI, or holding off on hiring due to AI. For example, BP is saying they're now needing 70% fewer coders - looks like contractors. And yet, the number of times "AI" was mentioned on earnings calls this past quarter was a fraction of the number of times it was mentioned in any of the previous four quarters. So which is it? Is it cutting into programmers a lot, or is it dying out? Right now we don't know. It's possible the staff reductions are only temporary, and that if the code quality turns out to be poor enough, they'll need to hire more programmers to fix it. We won't know that for a while, and neither will those companies. Lots of bugs take a while to get noticed, especially when you're not actually trying to find them. And we don't know how the numbers of programmers reduced at companies like BP compare to the numbers of programmers that are being hired because of AI.

01:46

So at the moment, while things are in flux, and companies are making bets, and we don't know which way the bets will turn out, what do we do? I say we try to look further ahead. We try to look at the fundamentals and try to guess how we think it's most likely to end up, if not when. If it's a bubble, it'll pop. They always do, although it's basically impossible to predict when they will pop. If it's not a bubble, and the hype is correct, it's going to be a huge societal disruption, and if that's the case, it's going to be really hard to plan for. Or both.

02:16

Back in '99 and 2000, there was a bubble in tech startups, and it popped, and there was a drop in overall tech sector jobs that didn't start growing again until about 2004. But the underlying tech was real, and it was valuable, and it enabled the internet as we know it today. The promise of the internet back then took 15 years or so from the start of the bubble to really take off, and maybe that's the kind of time we're talking about. It's kind of hard to know yet.

02:41

Which means we need to know what we can about whether this is a hype bubble that will pop soon, or a tech that's fundamentally disruptive. And where we have to start that investigation, because it's the best data point we have right now, is with ChatGPT-4o. And based on what I've seen so far, I think it's pretty clear that...

03:07

This is the Internet of Bugs. My name is Carl, and this was originally going to be an explanation of what I thought about ChatGPT-4o and why. And it still is, but it ended up being a little more ranty at the end. Hopefully you'll stick with me till then.

03:21

So to start with, ChatGPT-4o is much faster than I expected. It's very fast, and that's great. Aside from that, though, I'm not really impressed. The thing that everyone seems excited about is the voice interface, and I think that most of the people excited about that haven't really been paying attention. As I explained in my AI agent video, before 4o came out, we had voice and camera/graphics interfaces to ChatGPT and other LLMs for a long time now. Below, I've linked instructions on hooking ChatGPT up to a voice interface, from February of last year. The functionality works out of the box now, and that's great. It's very convenient, and it's faster. But it's nothing new, and it's not really an advancement.
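For anyone curious what that pre-4o setup looked like in practice, here is a minimal sketch of the kind of pipeline those earlier instructions describe, assuming the openai, SpeechRecognition, and pyttsx3 packages and an OPENAI_API_KEY environment variable; it illustrates the general approach, not the specific instructions linked in the description.

```python
# Minimal sketch of a voice loop around a text-only LLM (pre-4o style):
# speech-to-text in front, text-to-speech behind. Illustrative only;
# assumes the openai, SpeechRecognition, and pyttsx3 packages and an
# OPENAI_API_KEY environment variable.
import speech_recognition as sr
import pyttsx3
from openai import OpenAI

client = OpenAI()            # reads OPENAI_API_KEY from the environment
recognizer = sr.Recognizer()
tts = pyttsx3.init()

def listen() -> str:
    """Capture one utterance from the microphone and transcribe it."""
    with sr.Microphone() as source:
        recognizer.adjust_for_ambient_noise(source)
        audio = recognizer.listen(source)
    return recognizer.recognize_google(audio)  # free web STT, demo use only

def ask(prompt: str) -> str:
    """Send the transcribed text to a chat model and return its reply."""
    response = client.chat.completions.create(
        model="gpt-4o",  # any chat model works; the pipeline is the point
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    question = listen()
    print("You said:", question)
    answer = ask(question)
    print("Model said:", answer)
    tts.say(answer)          # speak the reply aloud
    tts.runAndWait()
```

The point is the speaker's: the "voice interface" is a thin layer of glue code around a text model, which is why it was achievable long before 4o shipped.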

04:00

Doing tasks cheaper and faster isn't nearly as disruptive if the tasks aren't difficult, or they aren't done well, or they aren't done correctly. And doing a wide variety of tasks correctly determines how autonomous an AI can be, and therefore how many and what kinds of jobs it might be able to displace, and most relevant to my expertise, how it might impact software jobs, and how the jobs that have already been impacted are likely to stay impacted or likely to come back.

04:28

So the next question is, what do we know about 4o doing tasks better or more correctly? My personal experiences with it have been disappointing. In my last video, I talked about how perplexity.ai found a Steve Jobs quote from 1983 for me, but 4o told me it didn't exist. And for the record, Google couldn't find it either, even if I gave Google the exact quote to search for.

04:48

But I started my career as a physicist, and physics and science is all about experiments and evidence. And that's really served me well over the years, believe it or not, both in being able to separate hype from reality, but also in my day-to-day work when I'm troubleshooting or debugging: using evidence to actually narrow down what the real problem is, and finding and fixing it, and figuring out what experiments to run to make the bug show up, has been incredibly useful. I do my best to gather the evidence I can find. I look at the evidence I've gathered, I draw conclusions. If I see new evidence, I'll revise the conclusions. This is a pretty much constant process. It has been pretty much my whole career. When I hear news stories or read articles that are relevant to technology trends, I see if it's information that changes my guesses about what's popular or what's becoming more or less important. Even before I started this channel, I did this. It's important for me to decide what I'm going to spend my time on, and being frank, if that's not something that sounds like you'd want to be keeping up with for the rest of your career, then maybe a technology career is not for you.

05:46

So briefly, and so I can point people to this section of video when they spout garbage in my comments, what counts as evidence? One, researchers publishing peer-reviewed papers are the gold standard of evidence. They aren't always correct in hindsight; people make mistakes, some even falsify data. But that's as good as evidence ever gets, and the papers that are wrong eventually get found out. Two, benchmarks are evidence. In fact, that's the whole point of benchmarks. Benchmarks are generally what are used in most of the peer-reviewed papers that we have about LLM performance. Some people in the comments have tried to argue that benchmarks aren't evidence, or don't count as evidence, or can't be taken at face value for some reason. What I can tell you is lots of researchers use those benchmarks as evidence in paper after paper, many of those papers are linked in this video, and as long as they keep doing that, so will I, and I won't care about your opinion. Three, firsthand observation of facts - not speculation, not things that might happen in the future, but things that have actually happened - counts as evidence if it comes from unbiased sources that have relevant experience. And lastly, and least usefully, trends actually are evidence. It's pretty crappy evidence, but when it comes to predicting the future, that's often all we've got. But it only counts if the circumstances in the future are the same as the circumstances in the past; if the circumstances might be materially different, then I don't trust that the trend will continue, and I don't think that you should either, and I don't count that as evidence.

07:12

So to be explicit, a commenter on a YouTube video's opinion on the future doesn't count unless it's backed up with sources. A commenter's opinion about a video or an article doesn't count unless they back it up with sources or facts. But the same is true of me. I try to back up the information I give you and the conclusions I draw with sources. My video descriptions usually have lots of them. When I talk about things I'm an expert in, I try to give examples and sources and citations and further reading, and I try to explain what scenarios and experiences I've seen that lead me to draw the conclusions that I have. At the time of this recording, there are close to 60 URLs in the list of references for this video.

07:52

So back to ChatGPT-4o. The consensus is it's way faster. I've been told it's much better at many non-English languages, although I'm not an expert in that. It interacts with sound and images without having to have pre-processors and post-processors to convert, and that's cool and convenient. But when it comes to accuracy, it's sometimes better and sometimes worse. So MMLU is 2.2% better than 4-Turbo, and GPQA, which is a science benchmark, is 5.6% better, but on the DROP benchmark, which is complex reasoning and math, it's 2.6% worse than 4-Turbo. And there's a new benchmark called SEAL, at which 4o is actually worse than 4-Turbo. It's a very promising new benchmark - now don't take my word for it. Here's a link to a tweet from Andrej Karpathy, who's the former Tesla director of AI - I hope I got that name at least close to right. He's an OpenAI founding team member; I talked about him in my last video. There were some complaints in the comments on my earlier videos about some benchmarks that I graphed that were on a scale of 0 to 100%. SEAL shows the same kinds of trends, but doesn't have that limitation. And now there is this paper about how ChatGPT's behavior is changing over time; note that this paper only compares 3.5 and 4. So I'm looking forward to when they add 4o to it, but according to this, there's actually not a lot of improvement between 3.5 and 4 even. So you can choose to believe that 4o is better if you want, but the evidence is mixed. So it can't be dramatically better, at least not in the realm of how well it's able to correctly perform tasks. And according to some research, on some benchmarks, there hasn't been a lot of improvement since 3.5.
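To make "the evidence is mixed" concrete, the snippet below simply tabulates the deltas quoted above; the three numbers are the ones cited in the video, and everything else is presentation.

```python
# Tabulate the 4o-vs-4-Turbo benchmark deltas quoted in the video.
# Positive means 4o scored higher than 4-Turbo; negative means lower.
deltas = {
    "MMLU (general knowledge)": +2.2,
    "GPQA (science)": +5.6,
    "DROP (reasoning and math)": -2.6,
}

for name, delta in deltas.items():
    verdict = "better" if delta > 0 else "worse"
    print(f"{name:28s} {delta:+.1f}% ({verdict} than 4-Turbo)")

# The signs disagree, which is the speaker's point: speed improved
# across the board, but task accuracy did not move in one direction.
improved = sum(d > 0 for d in deltas.values())
print(f"{improved} of {len(deltas)} benchmarks improved.")
```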

09:31

So if that's the evidence we have about 4o, what does it tell us about the future? This graphic - I guess this counts as evidence, it counts as a fact - but it's evidence of the fact that they're going to be spending a lot of money training GPT-5. It doesn't actually tell us anything about how much better or worse it's going to be, at what tasks, and how well it can do them, despite what people on the internet tell you. We know that there's a trend that the models have been getting exponentially faster and cheaper for years, and that seems to have continued through 4o, so it's likely to continue for some time, and that's good for lots of reasons. But when it comes to tasks and correctness, we know the evidence is mixed. And so we don't really know how to extrapolate from that.

10:18

So if the best method of predicting the future is extrapolating from trends, what other trends are there that we can draw from? It turns out there are two important ones, and they're not actually discussed very much, which is interesting. The first thing we don't talk about much is a bunch of experts who have been doing relevant research for a very long time, but very few people in the tech industry have paid any attention to them at all, because they're not part of the tech industry. These people are psychologists and ethicists and philosophers who study not what neural networks can do, but why humans are predisposed to be particularly gullible about the sentience of AI. It turns out that the human brain seems to be hardwired to believe that things that we interact with have thoughts and feelings and desires and so on. I'll put links down below. There's a very long history of people attributing thoughts and wants and desires to everything from weather and nature to tools to pet rocks. And not only do we naturally tend to believe that non-human things have thoughts and feelings, but LLMs have been getting attributes that make us even more likely to believe it. There's a thing called the "Eliza Effect," which has been known since the 70s, where a chat interface causes "powerful delusional thinking" in humans that has been likened to a "slow-acting poison." So there's a term that has been coined by researchers called "dark patterns." It describes user interfaces in products that trick people, or trick the brain, into behaving in ways that the producer of the product wants but that are contrary to the interests of the consumer or the user. Dark patterns are a huge topic, and researchers are just starting to study the dark patterns in LLM chatbots - paper below. But two other recent papers on dark patterns are particularly relevant to ChatGPT-4o. One of them is about how synthetic voices impact human decision-making, and the other is about how cuteness is a dark pattern. Does the fact that 4o has not only synthetic voices, but cute affects like giggling - does that influence what people think of 4o? That specific research hasn't been done yet, but based on past findings, I expect it's going to be very revealing when it happens. But we know that, intentionally or not, LLM chatbots have been designed, in a way known by psychologists and ethicists, to trick humans into believing that they're intelligent. And that trend is getting worse, and shows no indication of getting better.

12:35

One more trend that's relevant to all this: it's been going on for at least eight years with respect to AI specifically, and a whole lot longer in general. So for once, I'm only going to go back as far as 2016, for brevity's sake. I could go back farther, but it would be less relevant. So last year, we found out in a lawsuit deposition that Tesla faked a self-driving demo in 2016. The Independent, which is an outlet in the UK, recreated a Tesla demo in 2022 and found that it actually crashed right into a cutout of a child, as opposed to what the Tesla demo did. Okay, enough of Tesla - let's talk about Google for a minute. So in May of 2018, Google famously faked its Duplex AI demo. Oh, and Google faked their Gemini AI demo in December of 2023, so they didn't learn much. Recently Google claimed that their DeepMind created 2.2 million new materials, but actual researchers said, quote: "We examine the claims of this work, and unfortunately find scant evidence for compounds that fulfill the trifecta of novelty, credibility, and utility." In other words, very few of the 2.2 million compounds that Google claimed are of any use or haven't already been discovered. A Google VP recently released a statement that said, quote, "In addition to designing AI Overviews to optimize for accuracy, we tested the feature extensively before launch. This included robust red-teaming efforts, evaluations with samples of typical user queries, and tests on a proportion of search traffic to see how it performed." Even so, Google AI said that we should "eat at least one small rock per day" and "add about an eighth of a cup of glue to pizza sauce."

14:17

So we can't really trust Google. Let's talk about Amazon. Amazon's AI checkout technology, the "Just Walk Out" technology, turned out to be thousands of remote workers in India instead of an actual AI. Ironically, that was done via Amazon's online task platform called "Mechanical Turk," which is named after a famous fraud from the 1770s where a chess-playing robot turned out to be a man hiding in a box pulling its strings. Usually, I don't go back to the 1770s, but I guess it happens sometimes. And there have been a bunch of times when something that was said to be AI just turned out to be a bunch of remote people doing the work. GM's Cruise self-driving car technology used 1.5 remote humans for every vehicle on the road to, quote, "remotely control the car after receiving a cellular signal that it was having problems." Facebook had a Siri competitor called "M," and it was supposedly supervised by humans. Reportedly, though, quote, "in practice, over 70% of requests were answered by human operators." It was shut down in 2018; supposedly it was very expensive.

15:25

The SEC has recently started charging companies with what they call "AI washing," which is when companies say that they're doing things with AI when there's actually no AI involved. And then there are things that do use AI, but where companies insist on lying about how well the AI does it. So on January 23, 2024, Microsoft said, quote, "Microsoft aims to provide transparency in how its AI systems operate, allowing users and stakeholders to understand the decision-making processes and outcomes." 48 hours after that, Microsoft's products were used to make viral, deep-fake, non-consensual porn of Taylor Swift. After the Taylor Swift deep-fake porn went viral, it was reported, quote, "A Microsoft AI engineering leader says he discovered vulnerabilities in OpenAI's DALL-E 3 image generator in early December, allowing users to bypass safety guardrails to create violent and explicit images, and that the company impeded his previous attempt to bring public attention to the issue." So much for transparency.

16:29

The Rabbit R1 demo showed a bunch of things that reviewers said just don't work, and it turned out to be just an Android app. The Humane AI Pin gave false information in its demo video, and the company quietly re-edited the demo with new audio to make it look like it gave the right answers. The AI Pin was famously called "the worst product" a particular reviewer had ever reviewed. What about OpenAI, though? Have they been caught lying about demos? Quote, "Perhaps the most widely touted of GPT-4's at-launch zero-shot capabilities has been its reported 90th-percentile performance on the Uniform Bar Exam." Well, new reports say it was actually in the 15th percentile of those that took the test for the first time. Turns out that what OpenAI seems to have done was arrange it so that their AI got compared to a bunch of people that had taken the test before and failed the test before, and were really likely to fail it again - and it got better than 90% of people that were likely to fail it again.
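That percentile argument deserves a worked illustration. A percentile is only meaningful relative to a comparison population, so the sketch below uses invented score distributions (not OpenAI's or the bar examiners' data) to show how one fixed score can land at roughly the 90th percentile among mostly-failing repeat takers while landing far lower among first-time takers.

```python
# Illustration with invented numbers (not OpenAI's data): the same raw
# score lands at very different percentiles depending on who you
# compare against.
import random

random.seed(0)

# Hypothetical score distributions on a 400-point exam.
repeat_takers = [random.gauss(240, 25) for _ in range(10_000)]  # mostly failing
first_timers  = [random.gauss(290, 25) for _ in range(10_000)]  # stronger group

def percentile(score: float, population: list[float]) -> float:
    """Fraction of the population scoring below `score`, as a percentage."""
    return 100.0 * sum(s < score for s in population) / len(population)

ai_score = 272  # one fixed raw score
print(f"vs repeat takers: {percentile(ai_score, repeat_takers):.0f}th percentile")
print(f"vs first-timers:  {percentile(ai_score, first_timers):.0f}th percentile")
```

With these made-up distributions, the same score prints as roughly the 90th percentile against repeat takers and only the low 20s against first-timers, which is the shape of the discrepancy the new reports describe.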

17:33

At least one of OpenAI's Sora demo videos was done by an FX group called Shy Kids. Link below for that. There was a recent interview with a former OpenAI board member who said that Sam Altman had created, quote, "a toxic culture of lying." Oh, and then moving on from OpenAI, there was this thing called Devin. The company that made Devin kind of lied about that. There is a video about that; you might have heard of it. And there were other examples I didn't put in this list because they were less directly related to AI. And keep in mind, that's just the companies that have been both caught and were high enough profile to actually make headlines and get reported about. I've been part of many software demos and many product launches over the last 35 years, and in my professional opinion, in my experience, LOTS of demos lie. There's no way to know for sure, but based on my past experience, I'd guess that maybe for every demo that gets exposed, at least a couple of demos get away with it, maybe more.

18:37

And yet we have people insisting that this is all real. A WIRED.com article called "It's Time to Believe the AI Hype" came out on the 17th of May, just after the ChatGPT-4o demo. It tried out the same old joke about a friend that takes another friend to a comedy club to see a dog telling jokes, and the first friend says, "What do you think?" and the second friend says, "Well, the jokes bombed. I'm not that impressed." The article then says, folks, "When dogs talk, we're talking biblical disruption." And maybe, but let me ask you this. If you're confronted with a talking dog at a comedy club, what's more likely? That there's a dog that's actually talking, or that, like Amazon Fresh, what's supposed to be an unprecedented breakthrough is really just a bunch of people behind the scenes trying to trick you? I know what I'd bet on. And to top it off, the concluding paragraph of that article insists, quote - this is a direct quote - "But the demos aren't lying." Except for, you know, Tesla's self-driving and GM's Cruise self-driving and Google Duplex and Google Gemini, and Facebook's "M" chatbot and OpenAI's Sora and GPT-4's bar exam, and Amazon Fresh's "Just Walk Out" and Rabbit R1 and Humane's AI Pin. Oh, and Devin again. Deep breath.

19:54

So, there's no clear evidence of accuracy and task performance getting better. There is clear evidence that the features being added to these products are attributes, like voices, that are known to make humans more likely to believe that the products actually think. There is clear evidence that the companies have been lying about AI's capabilities for years - not all the companies, but many of them. And there's clear evidence that journalists - again, not all of them, but many of them - have made, and will continue to make, irresponsible statements like "the demos aren't lying," all evidence to the contrary. But we have to make decisions about what we're going to spend our time on. And we have to decide what we're going to learn and what we're going to avoid, and whether we want to switch majors or switch careers or what. What's going to happen with AI is one of the bigger questions in software careers right now. In tech. Maybe even in the world. And I'd really love to know what's going to happen. And if I did, I would tell you, but I don't. And honestly, nobody does.

20:55

What I know is that there is currently no clear evidence that we're going to get human-level artificial intelligence anytime soon. And what we really know for sure is that many of the people telling us how great it's going to be not only have a financial incentive to lie about that, they've been lying about it for years, and have been lying so obviously that they've been caught lying about it over and over. And that many people we should be able to trust, like journalists who should know better, keep saying things like "the demos aren't lying," even though many, many of the AI demos have been lying, and have been proven to be lying, over and over, since at least 2016. So we're on our own, and we have to make up our own minds. And like generations of scientists going back to at least the 16th century, I'm going to choose to follow the evidence, and I'm going to choose to reject the narrative being promoted by the people who have "a toxic culture of lying." But hey, you do you. Never forget: the Internet is full of bugs, and anyone who says differently probably thinks you're so stupid that you would believe that dogs can actually tell jokes. Let's be careful out there.
