AI Hype is completely out of control - especially since ChatGPT-4o
Summary
TLDR: The video script discusses skepticism about large-language-model AIs' impact on software development. It critiques the hype following the release of ChatGPT-4o, questioning claims of job-market disruption and AI's actual capabilities. The speaker, Carl, uses evidence-based reasoning to argue that despite speed improvements, the evidence on AI's accuracy is mixed. He highlights the history of companies overstating AI capabilities and the human tendency to anthropomorphize AI, and points to a 'toxic culture of lying' around AI demonstrations.
Takeaways
- 🤖 The video discusses the impact of large-language model AIs on the software development industry and the mixed reactions to the release of ChatGPT-4o.
- 📈 There is a debate on whether AIs will replace programmers significantly or if the current AI advancements are a passing trend.
- 🔍 Companies like BP are reportedly using fewer programmers due to AI, but the overall industry impact is still unclear.
- 👨💼 The speaker, Carl, emphasizes the importance of evidence-based analysis in understanding AI's role in the job market and its potential future.
- 🔬 Carl's background in physics has shaped his approach to evaluating technology trends through experimentation and evidence gathering.
- 📊 The script mentions various benchmarks that show mixed improvements in AI capabilities, suggesting that AI's ability to perform tasks correctly is not consistently better.
- 🗣️ Voice interfaces are highlighted as a feature of ChatGPT-4o, but Carl argues that this is not a new advancement and does not significantly impact AI's capabilities.
- 🤔 The video raises questions about the trustworthiness of AI demonstrations and the history of companies overstating AI capabilities.
- 🕊️ The 'Eliza Effect' and 'dark patterns' in AI chatbots are discussed as psychological tricks that make humans more likely to believe in AI sentience.
- 📉 Carl points out a trend of companies being caught lying about AI capabilities, which undermines confidence in current and future AI advancements.
- 🧐 The video concludes by urging viewers to critically assess the evidence and be wary of narratives promoted by those with a history of dishonesty.
Q & A
What is the main topic of discussion in the video script?
-The main topic of the video script is the impact of large-language model AIs, particularly ChatGPT-4o, on the software development industry and the validity of the hype surrounding AI capabilities.
What is the current trend in the job market regarding AI and programmers?
-The current trend indicates that AI is causing some job disruptions, with companies like BP reporting a significant reduction in the number of programmers needed, possibly due to AI advancements.
What does the speaker suggest about the hype around AI and its potential impact on society?
-The speaker suggests that the hype around AI might be exaggerated and that the truth likely lies somewhere between the extreme views of AI replacing human jobs entirely and AI being as short-lived as NFTs.
What evidence does the speaker consider reliable for evaluating AI capabilities?
-The speaker considers peer-reviewed papers, benchmarks, firsthand observations from unbiased sources, and trends under similar circumstances as reliable evidence for evaluating AI capabilities.
What is the 'Eliza Effect' mentioned in the script?
-The 'Eliza Effect' refers to the phenomenon where humans are predisposed to believe that AI chatbots have thoughts and feelings, leading to 'powerful delusional thinking' akin to a 'slow-acting poison'.
What is the speaker's opinion on the voice interface feature of ChatGPT-4o?
-The speaker is not impressed by the voice interface feature of ChatGPT-4o, stating that it is not new and has been available for some time, and that it does not necessarily represent an advancement in AI.
What is the term used to describe user interfaces that trick people into certain behaviors?
-The term used to describe such user interfaces is 'dark patterns'.
What are some examples of companies that have been caught exaggerating or lying about their AI capabilities?
-Examples include Tesla with its self-driving demo, Google with Duplex and Gemini AI demos, and OpenAI with the GPT-4 bar exam performance claims.
What is the speaker's stance on the future of AI and its potential to achieve human-level intelligence?
-The speaker is skeptical about the near-future prospects of achieving human-level AI, citing a lack of clear evidence and a history of companies exaggerating AI capabilities.
What advice does the speaker give to those trying to understand the impact of AI on their careers or industries?
-The speaker advises individuals to make up their own minds, follow the evidence, and be cautious of narratives promoted by those with a history of dishonesty.
Outlines
🤖 AI Hype and Its Impact on the Job Market
The script discusses the mixed reactions to the impact of large-language model AIs like ChatGPT-4o on the software development industry. It highlights the uncertainty in the job market, with companies like BP reportedly needing fewer programmers due to AI, while the hype suggests a future with human-like AI at a luxury car's price. The author emphasizes the need to look beyond the current hype and consider the fundamentals to predict the future of AI's role in society and its potential for disruption. The paragraph also touches on the importance of evidence-based analysis in understanding technological trends.
🔍 Evidence-Based Analysis of AI's Progress
The speaker advocates for an evidence-based approach to evaluating AI's capabilities and progress, critiquing the hype and speculation that often cloud discussions about AI. They define different levels of evidence, from peer-reviewed papers to firsthand observations, and stress the importance of benchmarks in assessing AI performance. The paragraph also addresses the mixed results from benchmarks on ChatGPT-4o, indicating that while it may be faster, its accuracy varies across different tasks, and there is no clear consensus on its overall improvement from previous models.
🧐 The Human Tendency to Anthropomorphize AI
This section delves into the psychological and ethical aspects of AI, discussing how humans are predisposed to attribute sentience to AI based on their interactions. It mentions the 'Eliza Effect' and the concept of 'dark patterns' in AI chatbots, which are designed to manipulate users into perceiving intelligence. The paragraph also points out that companies have been known to exaggerate or even fabricate AI capabilities in demos, leading to a culture of dishonesty that affects public perception and trust in AI technology.
📉 The Reality of AI Demos and Company Honesty
The script provides numerous examples of companies that have been caught lying about the capabilities of their AI systems in demos. It discusses the SEC's crackdown on 'AI washing' and instances where companies have presented human-operated services as AI-driven. The paragraph calls into question the transparency and honesty of companies and journalists in presenting AI advancements, suggesting a pervasive issue with exaggerated claims and dishonesty in the industry.
🚧 The Future of AI: Speculation vs. Evidence
In the concluding paragraph, the author reflects on the uncertainty surrounding the future of AI and the importance of relying on evidence rather than speculation or the narratives pushed by those with a vested interest in promoting AI. They express skepticism about the imminent arrival of human-level AI and criticize the dishonesty that has been prevalent in the industry. The speaker encourages individuals to make informed decisions based on evidence and to be wary of the potential for manipulation and misinformation in the discourse surrounding AI.
Keywords
💡Large-language model AIs
💡Software development industry
💡Hype
💡Disruption
💡AI washing
💡Benchmarks
💡Dark patterns
💡Ethics
💡Accuracy
💡Human-level artificial intelligence
💡Evidence-based approach
Highlights
Discussion on the impact of large-language model AIs on the software development industry in the coming years.
Release of ChatGPT-4o causing increased hype and divided opinions on its potential and impact.
Speculations about the future of AI, suggesting human-like robots at the price of luxury cars.
Debates on whether LLMs are a passing trend or a transformative technology.
Evidence of AI's disruption in the job market, with companies like BP reportedly needing fewer programmers.
Debate on the actual impact of AI on programmer hiring and job security.
The unpredictability of AI's influence on software quality and the potential need for human intervention.
The importance of looking beyond current trends to understand AI's long-term impact.
Comparing the AI bubble to the tech startup bubble of '99 & 2000 and its aftermath.
The need to differentiate between hype and reality in AI development and its societal implications.
Carl's personal experience and disappointment with ChatGPT-4o's capabilities.
Critique of the overemphasis on non-essential features like voice interfaces in AI advancements.
The significance of evidence-based evaluation in assessing AI capabilities and progress.
The mixed performance of ChatGPT-4o on various benchmarks indicating no clear improvement.
The psychological predisposition of humans to attribute sentience to AI and its implications.
The concept of 'dark patterns' in AI interfaces designed to manipulate human perception.
Historical instances of companies exaggerating or faking AI capabilities in demos.
The call for skepticism and critical thinking in the face of AI hype and industry narratives.
The final cautionary advice to remain vigilant and discerning regarding AI's real capabilities and potential.
Transcripts
So, I made a video early last month about how I
thought the current trend of large-language
model AIs would impact the software
development industry over the next handful of
years.
Since that time, there's been a lot more
discussion about that, a lot of it driven by
the release
of ChatGPT-4o.
Now, that release has caused a lot of people to
buy even more into the hype.
Some of those people are in the comments on
this channel, some have even been for major
news outlets.
And that makes sense.
The hype says that within a handful of years we
will have human form factor robots with
roughly human intelligence for what could be
the current price of a luxury car.
Or the haters say that LLMs will soon be as dead
as NFTs.
As always, or at least almost always, the truth
surely lies somewhere in between.
But where?
We know there's going to be disruption in the
job market.
Indeed, AI has already caused issues in the job
market.
I've talked about that several times.
We're starting to see companies reporting that
they're using fewer programmers due to
AI, or holding off on hiring due to AI.
For example, BP is saying they now need 70%
fewer coders, contractors by the looks of it.
And yet, the number of times "AI" was mentioned
on earnings calls this past quarter was a
fraction of the number of times it was
mentioned in any of the previous four quarters.
So which is it?
Is it cutting into programmers a lot, or is it
dying out?
Right now we don't know.
It's possible the staff reductions are only
temporary, and that if the code quality turns out
to be poor enough, they'll need to hire more
programmers to fix it.
We won't know that for a while, and neither
will those companies.
Lots of bugs take a while to get noticed,
especially when you're not actually trying
to find them.
And we don't know how the numbers of
programmers reduced at companies like BP
compare to the
numbers of programmers that are being hired
because of AI.
So at the moment, while things are in flux, and
companies are making bets, and we don't
know which way the bets will turn out, what do
we do?
I say we try to look further ahead.
We try to look at the fundamentals and try to
guess how we think it's most likely to
end up, if not when.
If it's a bubble, it'll pop.
They always do, although it's basically
impossible to predict when they will pop.
If it's not a bubble, and the hype is correct,
it's going to be a huge societal disruption,
and if that's the case, it's going to be really
hard to plan for.
Or both.
Back in '99 & 2000, there was a bubble in tech
startups, and it popped, and there was a
drop in overall tech sector jobs that didn't
start growing again until about 2004.
But the underlying tech was real, and it was
valuable, and it enabled the internet as
we know it today.
The promise of the internet back then took 15
years or so from the start of the bubble
to really take off, and maybe that's the kind
of timescale we're talking about here.
It's kind of hard to know yet.
Which means we need to know what we can about
whether this is a hype bubble and will pop
soon, or a tech that's fundamentally disruptive.
And where we have to start that investigation,
because it's the best data point we have right
now, is with ChatGPT-4o.
And based on what I've seen so far, I think it's
pretty clear that...
This is the Internet of Bugs.
My name is Carl, and this is originally going
to be an explanation of what I thought about
ChatGPT-4o and why.
And it still is, but it ended up being a little
more ranty at the end.
Hopefully you'll stick with me till then.
So to start with, ChatGPT-4o is much faster
than I expected.
It's very fast, and that's great.
Aside from that, though, I'm not really
impressed.
The thing that everyone seems excited about is
the voice interface, and I think that most
of the people excited about that haven't really
been paying attention.
As I explained in my AI agent video, before 4o
came out, we had voice and camera/graphics
interfaces to ChatGPT and other LLMs for a long
time. Below, I've linked instructions from
February of last year on hooking ChatGPT up to a
voice interface.
The functionality works out of the box now, and that's great.
It's very convenient, and it's faster.
But it's nothing new, and it's not really an
advancement.
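To make that concrete, here's a minimal sketch of the kind of pre-processor/post-processor voice pipeline those older instructions describe: speech-to-text in front of a text-only model, text-to-speech behind it. This assumes the OpenAI Python SDK; the model names and file names are illustrative, not taken from the video's linked instructions.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Pre-processor: turn the user's recorded speech into text.
with open("question.wav", "rb") as audio_in:
    transcript = client.audio.transcriptions.create(
        model="whisper-1", file=audio_in
    )

# The language model itself only ever sees and produces text.
reply = client.chat.completions.create(
    model="gpt-4-turbo",  # illustrative; any chat model would do
    messages=[{"role": "user", "content": transcript.text}],
)

# Post-processor: turn the model's text answer back into speech.
speech = client.audio.speech.create(
    model="tts-1",
    voice="alloy",
    input=reply.choices[0].message.content,
)
with open("answer.mp3", "wb") as audio_out:
    audio_out.write(speech.read())
```

The point being: the "voice interface" is glue around the model, not a new capability inside it.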
Doing tasks cheaper and faster isn't nearly as
disruptive if the tasks aren't difficult
or they aren't done well or they aren't done
correctly.
And doing a wide variety of tasks correctly
determines how autonomous an AI can be, and
therefore how many and what kinds of jobs it
might be able to displace, and most relevant
to my expertise, how it might impact software
jobs, and how the jobs that have already been
impacted are likely to stay impacted or likely
to come back.
So the next question is, what do we know about
4o doing tasks better or more correctly?
My personal experiences with it have been
disappointing.
In my last video, I talked about how perplexity.ai
found a Steve Jobs quote from 1983 for me,
but 4o told me it didn't exist.
And for the record, Google couldn't find it
either, even if I gave Google the exact
quote to search for.
But I started my career as a physicist, and
physics and science is all about experiments
and evidence.
And that's really served me well over the years,
believe it or not. It helps me separate hype
from reality, and in my day-to-day work, when
I'm troubleshooting or debugging, using evidence
to narrow down what the real problem is, to find
and fix it, and to figure out what experiments
will make the bug show up has been incredibly
useful.
I do my best to gather the evidence I can find.
I look at the evidence I've gathered, I draw
conclusions.
If I see new evidence, I'll revise the
conclusions.
This is a pretty much constant process, and it
has been for pretty much my whole career.
When I hear news stories or read articles that
are relevant to technology trends, I
see if it's information that changes my guesses
about what's popular or what's becoming
more or less important.
Even before I started this channel, I did this.
It's important for me to decide what I'm going
to spend my time on, and being frank,
if that's not something that sounds like you'd
want to be keeping up with for the rest of
your career, then maybe a technology career is
not for you.
So briefly, and so I can point people to this
section of the video when they spout garbage
in my comments, what counts as evidence?
One, researchers publishing peer-reviewed
papers are the gold standard of evidence.
They aren't always correct in hindsight; people
make mistakes, and some even falsify data.
But that's as good as evidence ever gets, and
the papers that are wrong eventually get
found out.
Two, benchmarks are evidence.
In fact, that's the whole point of benchmarks.
Benchmarks are generally what are used in most
of the peer-reviewed papers that we have
about LLM performance.
Some people in the comments have tried to argue
that benchmarks aren't evidence or don't
count as evidence or can't be taken at face
value for some reason.
What I can tell you is lots of researchers use
those benchmarks as evidence in paper
after paper, many of those papers are linked in
this video, and as long as they keep doing
that, so will I, and I won't care about your
opinion.
Three, firsthand observation of facts, not
speculation, not things that might happen
in the future, but things that have actually
happened count as evidence if they come from
unbiased sources that have relevant experience.
And lastly, and least usefully, trends actually
are evidence. It's pretty crappy evidence,
but when it comes to predicting the future,
that's often all we've got.
But it only counts if the circumstances in the
future are the same as the circumstances
in the past. If the circumstances might be
materially different, then I don't trust that
the trend will continue, and I don't think you
should either, and I don't count that as
evidence.
So to be explicit, a commenter on a YouTube
video's opinion on the future doesn't count
unless it's backed up with sources.
A commenter's opinion about a video or an
article doesn't count unless they back it
up with sources or facts.
But the same is true of me.
I try to back up the information I give you and
the conclusions I draw with sources.
My video descriptions usually have lots of them.
When I talk about things I'm an expert in, I
try to give examples and sources and citations
and further reading, and I try to explain what
scenarios and experiences I've seen that
lead me to draw the conclusions that I have.
At the time of this recording, there are close
to 60 URLs in the list of references for this
video.
So back to ChatGPT-4o.
The consensus is it's way faster.
I've been told it's much better at many non-English
languages, although I'm not an expert in that.
It interacts with sound and images without
having to have pre-processors and post-processors
to convert, and that's cool and convenient.
But when it comes to accuracy, it's sometimes
better and sometimes worse.
So on MMLU it's 2.2% better than 4-Turbo; on
GPQA, which is a science benchmark, it's 5.6%
better; but on the DROP benchmark, which covers
complex reasoning and math, it's 2.6% worse than
4-Turbo.
And there's a new benchmark called SEAL, on
which 4o is actually worse than 4-Turbo.
It's a very promising new benchmark, but don't
take my word for it.
Here's a link to a tweet from Andrej Karpathy,
who's the former Tesla director of AI.
I hope I got that name even closer to right this time.
He's an OpenAI founding team member.
I talked about him in my last video.
There were some complaints in the comments in
my earlier videos about some benchmarks
that I graphed that were on a scale of 0 to 100%.
SEAL shows the same kinds of trends, but doesn't
have that limitation.
And now there is this paper about how ChatGPT's
behavior is changing over time; note
that this paper only compares 3.5 and 4.
And so I'm looking forward to when they add 4o
to it, but according to this, there's
actually not a lot of improvement between 3.5
and 4 even.
So you can choose to believe that 4o is better
if you want, but the evidence is mixed.
So it can't be dramatically better, at least
not in the realm of how well it's able to
correctly perform tasks.
And according to some research, on some
benchmarks, there hasn't been a lot of
improvement since
3.5.
So if that's the evidence we have about 4o,
what does it tell us about the future?
So this graphic, I guess this counts as
evidence, it counts as a fact, but it's
evidence of
the fact that they're going to be spending a
lot of money training GPT-5.
It doesn't actually tell us anything about how
much better or worse it's going to be
at what tasks it can do and how well it can do
them, despite what people on the internet
tell you.
We know that there's a trend that the models
have been getting exponentially faster and
cheaper for years, and that seems to have
continued through 4o, so it's likely to
continue for
some time, and that's good for lots of reasons.
But when it comes to tasks and correctness, we
know there's mixed evidence of it.
And so we don't really know how to extrapolate
from that.
Since the best method we have for predicting the
future is extrapolating from trends, what other
trends are there that we can draw from?
It turns out there are two important ones, and
they're not actually discussed very much,
which is interesting.
The first thing we don't talk about much is a
bunch of experts who have been doing relevant
research for a very long time, but very few
people in the tech industry have paid any
attention to them at all because they're not
part of the tech industry.
These people are psychologists and ethicists
and philosophers who study not what neural
networks can do, but why humans are predisposed
to be particularly gullible about the sentience
of AI.
It turns out that the human brain seems to be
hardwired to believe that things that we
interact with have thoughts and feelings and
desires and so on.
I'll put links down below.
There's a very long history of people attributing
thoughts and wants and desires to everything
from weather and nature to tools to pet rocks.
And not only do we naturally tend to believe
that non-human things have thoughts and
feelings,
but LLMs have been getting attributes that make
us even more likely to believe it.
There's a thing called the "Eliza Effect," which
has been known since the 70s, where
a chat interface causes "powerful delusional
thinking" in humans that has been likened to
a "slow-acting poison."
There's also a term that has been coined by
researchers: "dark patterns."
It describes user interfaces in products that
trick people, or trick the brain, into behaving
in ways that the producer of the product wants
but that are contrary to the interests of the
consumer or the user.
Dark patterns are a huge topic, and researchers
are just starting to study the dark patterns
in LLM chatbots, paper below.
But two other recent papers on dark patterns
are particularly relevant to ChatGPT-4o.
One of them is about how synthetic voices
impact human decision-making, and the other
is about how cuteness is a dark pattern.
Does the fact that 4o has not only synthetic
voices but also cute affects like giggling
influence what people think of it?
That specific research hasn't been done yet,
but based on past findings, I expect it's
going to be very revealing when it happens.
But we know that, intentionally or not, LLM
chatbots have been designed, in ways known
by psychologists and ethicists, to trick humans
into believing that they're intelligent.
And that trend is getting worse, and shows no
indication of getting better.
One more trend is relevant to all this: it's
been going on for at least eight years
with respect to AI specifically, and a whole
lot longer in general.
So for once, I'm only going to go back as far
as 2016 for brevity's sake.
I could go back farther, but it would be less
relevant.
So last year, we found out in a lawsuit
deposition that Tesla faked a self-driving demo
in 2016.
The Independent, which is an outlet in the UK,
recreated a Tesla demo in 2022 and found
that it actually crashed right into a cutout of
a child, as opposed to what the Tesla demo
did.
Okay, enough of Tesla; let's talk about Google
for a minute.
So in May of 2018, Google famously faked its
Duplex AI demo. Oh, and Google faked their
Gemini AI demo in December of 2023, so they didn't
learn much.
Recently Google claimed that their DeepMind
created 2.2 million new materials, but actual
researchers said quote: "We examine the claims
of this work, and unfortunately find scant
evidence for compounds that fulfill the trifecta
of novelty, credibility, and utility."
In other words, very few of the 2.2 million
compounds that Google claimed are of any
use and haven't already been discovered.
A Google VP recently released a statement that
said, quote, "In addition to designing AI
overviews to optimize for accuracy, we tested
the feature extensively before launch.
This included robust red teaming efforts,
evaluations with samples of typical user
queries,
and tests on proportion of search traffic to
see how it performed."
And yet, Google's AI said that we should "eat at
least one small rock per day" and "add about
an eighth of a cup of glue to pizza sauce."
So we can't really trust Google.
Let's talk about Amazon.
So Amazon's AI checkout technology, the "Just
Walk Out" technology, turned out to be thousands
of remote workers in India instead of an actual
AI.
Ironically, that was done via Amazon's online
task platform that's called "Mechanical Turk,"
which is named after a famous fraud from the 1770s
where a chess-playing robot just turned
out to be a man hiding in a box pulling its
strings.
Usually, I don't go back to the 1770s, but I
guess it happens sometimes.
And there have been a bunch of times when
something that was said to be AI just turned
out to be a bunch of remote people doing the
work.
So GM's Cruise self-driving car technology used
1.5 remote humans for every vehicle on
the road to, quote, "remotely control the car
after receiving a cellular signal that
it was having problems."
Facebook had a Siri competitor called "M" and
it was supposedly supervised by humans.
Reportedly, though, quote, "in practice, over
70% of requests were answered by human
operators."
It was shut down in 2018; supposedly it was
very expensive.
The SEC has recently started charging companies
with what they call "AI washing," which is
when companies say that they're doing things
with AI, when there's actually no AI involved.
And then there are things that do use AI, but
where the companies insist on lying about how
well the AI does it.
So on January 23, 2024, Microsoft said, quote,
"Microsoft aims to provide transparency in
how its AI systems operate, allowing users and
stakeholders to understand the decision-making
processes and outcomes."
48 hours after that, Microsoft's products were
used to make viral, deep-fake, non-consensual
porn of Taylor Swift.
After the Taylor Swift deep-fake porn went
viral, it was reported, quote, "A Microsoft
AI engineering leader says he discovered
vulnerabilities in the OpenAI's DALL-E 3 image
generator in early December, allowing users to
bypass safety guardrails to create violent
and explicit images, and that the company impeded
his previous attempt to bring public
attention to the issue."
So much for transparency.
The Rabbit R1 demo showed a bunch of things
that reviewers said just don't work, and
it turned out to be just an Android app.
The Humane AI Pin gave false information in its
demo video, and the company quietly
re-edited the demo with new audio to make it
look like it gave the right answers.
The AI Pin was famously called "the worst
product" a particular reviewer had ever
reviewed.
What about OpenAI, though?
Have they been caught lying about demos?
Quote, "Perhaps the most widely touted of GPT-4's
at-launch zero-shot capabilities has
been its reported 90th percentile performance on
the Uniform Bar Exam."
Well, new reports say it was actually in the 15th
percentile of those that took the test
for the first time.
It turns out that what OpenAI seems to have done
was arrange it so that their AI got compared
to a bunch of people who had taken the test
before and failed it, and who were really
likely to fail it again; it scored better than
90% of people who were likely to fail anyway.
At least one of OpenAI's Sora demo videos was
done by an FX group called Shy Kids.
Link below for that.
There was a recent interview with a former
OpenAI board member who said that Sam Altman
had created, quote, "a toxic culture of lying."
Oh, and then moving on from OpenAI, there was
this thing called Devin. The company that
made Devin kind of lied about that.
There is a video about that, you might have
heard of it.
And there were other examples I didn't put in
this list because they were less directly
related to AI. And keep in mind, that's just the
companies that have both been caught
and were high-profile enough to actually
make headlines and get reported on.
I've been part of many software demos and many
product launches over the last 35 years,
and in my professional opinion, in my
experience, LOTS of demos lie.
There's no way to know for sure, but based on
my past experience, I'd guess that maybe
for every demo that gets exposed, at least a
couple of demos get away with it, maybe
more.
And yet we have people insisting that this is
all real. A WIRED.com article called "It's
Time to Believe the AI Hype" came out on the 17th
of May, just after the ChatGPT-4o demo.
It tried out the same old joke about a friend
that takes another friend to a comedy club
to see a dog telling jokes, and the first
friend says, "What do you think?" and the
second friend says, "Well, the jokes bombed.
I'm not that impressed."
The article then says folks, "When dogs talk,
we're talking biblical disruption."
And maybe, but let me ask you this.
If you're confronted with a talking dog at a
comedy club, what's more likely?
That there's a dog that's actually talking, or
that, like Amazon Fresh, what's supposed
to be an unprecedented breakthrough is really
just a bunch of people behind the scenes trying
to trick you?
I know what I'd bet on.
And to top it off, the concluding paragraph of
that article insists, quote, and this is a
direct quote: "But the demos aren't lying."
Except for, you know, Tesla self-driving and
GM's Cruise self-driving and Google Duplex and
Google Gemini, and Facebook's "M" chatbot and
OpenAI's Sora and GPT-4's bar exam, and
Amazon Fresh's "Just Walk Out" and the Rabbit
R1 and Humane's AI Pin. Oh, and Devin again.
Deep breath.
So, there's no clear evidence of accuracy and
task performance getting better.
There is clear evidence that the features being
added to these products are attributes, like
voices, that are known to make humans more
likely to believe that the products actually
think.
There is clear evidence that the companies have
been lying about AI's capabilities for
years, not all the companies, but many of them.
And there's clear evidence that journalists,
again, not all of them, but many of them,
have made, and will continue to make,
irresponsible statements like "the demos aren't
lying," all evidence to the contrary.
But we have to make decisions about what we're
going to spend our time on.
And we have to decide what we're going to learn
and what we're going to avoid and whether
we want to switch majors or switch careers or
what.
What's going to happen with AI is one of the
bigger questions in software careers right now.
In tech, maybe even in the world. And I'd
really love to know what's going to happen.
really love to know what's going to happen.
And if I did, I would tell you, but I don't.
And honestly, nobody does.
What I know is that there is currently no clear
evidence that we're going to get human-level
artificial intelligence anytime soon.
And what we really know for definite sure is
that many of the people that are telling
us how great it's going to be, not only have a
financial incentive to lie about that, they've
been lying about it for years and have been
lying so obviously about it that they've been
caught lying about it over and over.
And that many people we should be able to trust,
like journalists who should know better,
keep saying things like "the demos aren't lying,"
even though many, many of the AI demos have
been lying and have been proven to be lying
over and over since at least 2016.
So we're on our own and we have to make up our
own minds.
And like generations of scientists going back
to at least the 16th century, I'm going
to choose to follow the evidence, and I'm going
to choose to reject the narrative that's
being promoted by the people who have "A Toxic
Culture of Lying."
But hey, you do you.
Never forget: the Internet is full of bugs, and
anyone who says differently probably thinks
you're so stupid that you would believe that
dogs can actually tell jokes.
Let's be careful out there.