this AI is a little bit *TOO* good...

AI Unleashed - The Coming Artificial Intelligence Revolution and Race to AGI
14 Sept 202420:12

Summary

TLDRHyper AI's CEO Matt Schumer faced controversy over the announcement of Reflection 70b, an open-source AI model claiming to surpass industry giants. The model, based on Meta's LLaMA 3.1, was said to utilize 'reflection tuning' for high performance. However, the community's inability to replicate the impressive benchmarks raised suspicions. Accusations of deception arose when a private API, allegedly hosting the true model, was revealed to be using Anthropic's Claude model. Schumer's credibility, along with Sahil Chatterjee's from Glaive AI, was questioned, leading to public apologies and ongoing investigations into the discrepancies.

Takeaways

  • 🚨 Matt Schumer, CEO of Hyperr AI, claimed to have developed 'Reflection 70b', an open-source AI model surpassing existing models like GPT-40 and LLaMA 3.1.
  • 📈 The model was said to be fine-tuned using a novel technique called 'reflection tuning', which supposedly allows the model to self-correct and provide accurate answers.
  • 🤔 Questions arose when the community couldn't replicate the impressive benchmark results Schumer boasted about, leading to suspicions of potential fraud or misrepresentation.
  • 💸 Schumer sought compute sponsorship for a larger model, 'Reflection 405b', after the successful announcement of 'Reflection 70b', which raised further doubts.
  • 🔍 Upon investigation, it was discovered that the model weights uploaded to Hugging Face did not match the performance claims, and the private API provided for testing appeared to be using Sonnet 3.5, not Reflection 70b.
  • 😟 Schumer and Sahil Chatari from Glaive AI, who assisted in the project, faced backlash for not disclosing their connections and for the inconsistencies in their claims.
  • 🙅‍♂️ The community, including experts and AI enthusiasts, expressed disappointment and skepticism over the inability to replicate the model's performance and the credibility of the developers.
  • 📉 Trust in Schumer and his company was significantly damaged as a result of the controversy, with many questioning the transparency and integrity of the project.
  • 📣 Influencers and the AI community at large were criticized for promoting unverified claims without accountability, highlighting the need for due diligence in reporting on AI advancements.
  • 🔄 Schumer apologized for getting ahead of himself and promised transparency, while Chatari admitted to the inability to reproduce benchmark scores and pledged to investigate the discrepancies.

Q & A

  • Who is Matt Schumer and what is his role in the AI community?

    -Matt Schumer is the CEO and founder of Hyperr AI. He has been involved in the AI community, contributing various projects and building a reputation as a builder and open-source contributor.

  • What is the significance of 'reflection 70b' in the context of the script?

    -'Reflection 70b' refers to an open-source AI model announced by Matt Schumer, which claimed to surpass other major models like GPT-4 and Llama 3.1 through a technique called 'reflection tuning.'

  • What is 'reflection tuning' and how does it relate to the 'reflection 70b' model?

    -'Reflection tuning' is a novel technique attributed by Matt Schumer to the high performance of the 'reflection 70b' model, suggesting the model can self-correct and provide accurate answers.

  • Why was the AI community initially excited about 'reflection 70b'?

    -The AI community was excited because 'reflection 70b' was an open-source model that claimed to outperform major proprietary models, offering a potentially accessible and powerful tool for developers.

  • What controversy arose after the announcement of 'reflection 70b'?

    -The controversy arose when the model's performance could not be replicated by the community, leading to suspicions of false claims and potential misrepresentation of the model's capabilities.

  • What is Glaive AI and how is it connected to the 'reflection 70b' situation?

    -Glaive AI is a company that provides tools for generating synthetic data. Matt Schumer claimed that Glaive AI's tools significantly contributed to the success of 'reflection 70b', and he is also an investor in Glaive AI.

  • What role did the local AI community, specifically the 'local llama' community, play in the unfolding of the 'reflection 70b' story?

    -The 'local llama' community, with its expertise and active engagement, played a crucial role in scrutinizing the claims made about 'reflection 70b', leading to the discovery of inconsistencies and potential fraud.

  • What were the key suspicions raised by the community regarding the 'reflection 70b' model?

    -The key suspicions included the inability to replicate the model's benchmark results, the possibility of the model being a different one than claimed, and the use of a private API that seemed to be serving results from another model.

  • What actions did Matt Schumer take in response to the controversy?

    -Matt Schumer apologized for getting ahead of himself with the announcement and stated that a team was working to understand what happened. He promised transparency and to share the findings once the investigation was complete.

  • How did the AI influencers and community members react to the unfolding events around 'reflection 70b'?

    -AI influencers gained views and followers from the hype, while community members expressed skepticism and sought evidence to validate the claims. Some, like Shin Boston, provided detailed analyses and called for accountability.

Outlines

00:00

🚀 Introduction to the Reflection 70b AI Model Controversy

Matt Schumer, CEO of hyperr AI, announced Reflection 70b, an open-source AI model that supposedly outperformed leading models like GPT-40 and others. The model was claimed to be fine-tuned using a novel technique called 'reflection tuning,' which allowed the model to self-correct and provide accurate answers. The announcement was met with excitement, but also skepticism as the model's performance could not be replicated by the community. Questions arose regarding Schumer's transparency, particularly about his investment in glaive AI, a company that provides synthetic data generation tools which were used in the model's development.

05:02

🔥 Community Response and Schumer's Compute Sponsorship Request

Following the announcement, Matt Schumer received widespread support and was interviewed by Matthew Burman, along with Sills from glaive AI. The community was eager to test the model, and Schumer sought a compute sponsor for an even larger model, Reflection 405b. However, as people began to test the publicly available model, it underperformed, leading to doubts about the veracity of the initial claims. Schumer then offered a private API for testing, claiming that the public model was not the correct version, which only intensified the scrutiny.

10:03

🕵️‍♂️ Investigating the Reflection 70b Model's Authenticity

The community, particularly a Reddit group called local llama, began investigating the discrepancies between the public model and the private API's performance. It was discovered that the public model was not the one that achieved the impressive benchmarks, and the private API was suspected to be using Sonet 3.5, a model by Anthropic, not a model developed by Schumer. The community also noticed that the word 'Claude' was censored in the model's responses, hinting at a connection to Anthropic's model. These findings led to a loss of trust and a demand for transparency.

15:04

📉 Schumer's Apology and the Ongoing Investigation

In response to the controversy, Matt Schumer issued an apology, acknowledging that he had been premature in announcing the project and that the team was working to understand what went wrong. Sahil from glaive AI also addressed the community's concerns, clarifying that he was not running models from other providers on the API and that the benchmark scores could not be reproduced. He committed to providing evidence and a full postmortem of the situation. The community, however, remained skeptical, with some questioning the integrity of both Schumer and Sahil.

20:05

📚 Conclusion and Reflection on the AI Hype Cycle

The video concludes with a reflection on the hype surrounding AI advancements and the role of influencers in spreading sensational claims without accountability. The speaker, Wes Rth, commends himself for not participating in the initial hype and emphasizes the importance of critical thinking and verification in the AI community. The saga of Reflection 70b serves as a cautionary tale about the dangers of unverified claims and the need for transparency and integrity in AI development.

Mindmap

Keywords

💡Reflection 70b

Reflection 70b refers to an AI model claimed by Matt Schumer to be the world's top open-source model, surpassing other frontier models with its performance. The model is said to be fine-tuned, suggesting the use of a special technique called 'reflection tuning' which allows the model to self-correct and provide accurate answers. However, the authenticity and performance of Reflection 70b have been under scrutiny and controversy due to the inability of the community to replicate the claimed benchmarks and suspicions of potential misrepresentation or fraud.

💡Open-Source Model

An open-source model in the context of AI refers to a model whose source code is publicly accessible, allowing anyone to view, modify, and distribute it. This is significant as it promotes transparency, collaboration, and innovation within the AI community. In the video, Reflection 70b is touted as an open-source model, which initially generated excitement as it suggested that the broader community could benefit from and contribute to its development.

💡Benchmarks

Benchmarks in AI are standardized tests used to evaluate the performance of AI models. They are crucial for comparing different models and their capabilities. In the video, the Reflection 70b model's benchmarks were highlighted as being 'shockingly good,' implying that it outperformed other models. However, the inability to replicate these benchmarks by the community has led to skepticism and allegations of potential discrepancies.

💡Fine-tuning

Fine-tuning in AI involves adjusting a pre-trained model to perform a specific task by continuing the training process with a more focused dataset. It is a common practice that allows for the customization of generic models to specific applications. The video mentions that Reflection 70b is a fine-tuned model, suggesting that it was adapted from an existing model to achieve superior performance.

💡Investor

In the context of the video, an investor is someone who provides financial backing to a project or company, such as Matt Schumer's investment in Glaive AI. Investors often have a vested interest in the success of the projects they fund. The video discusses the controversy surrounding Schumer's failure to disclose his investment in Glaive AI during the announcement of Reflection 70b, which has raised questions about potential conflicts of interest.

💡Hugging Face

Hugging Face is a company that provides a platform for developers to build, train, and deploy AI models, particularly in the field of natural language processing. It is mentioned in the video as the platform where the Reflection 70b model was supposed to be uploaded. The controversy arose when the model uploaded to Hugging Face did not perform as expected, leading to doubts about the authenticity of the model and its claimed performance.

💡Replicable Results

Replicable results in scientific research and AI development refer to the ability of independent parties to achieve the same outcomes through the same methods. This is a fundamental aspect of verifying claims and ensuring integrity in the field. The video discusses the community's inability to replicate the impressive results claimed for Reflection 70b, which has fueled skepticism and calls for transparency.

💡Synthetic Data

Synthetic data in AI is artificially generated data used to train models, often to augment limited real-world data. It can help improve model performance and reduce biases. The video mentions Glaive AI's ability to generate synthetic data, which was praised by Matt Schumer for its potential to enhance AI model development. However, the controversy surrounding Reflection 70b has raised questions about the use and validity of synthetic data in this case.

💡Shenanigans

In the context of the video, 'shenanigans' refers to underhanded or deceitful activities, suggesting that there may have been some form of misconduct or dishonesty in the development or promotion of the Reflection 70b model. The term is used to express suspicion regarding the authenticity of the model's performance and the transparency of the processes involved in its development.

💡Transparency

Transparency in AI development means being open about the processes, methodologies, and data used in creating and training AI models. It is essential for building trust and ensuring ethical practices. The video calls for transparency from Matt Schumer and his team regarding the development and benchmarking of the Reflection 70b model, as the community seeks clarity on the discrepancies and alleged misrepresentations.

Highlights

Matt Schumer, CEO of Hyperr AI, announces Reflection 70b, claiming it to be the world's top open-source model.

Reflection 70b's benchmarks are exceptionally good, surpassing major models like GPT-4 and Llama 3.1.

The model's high performance is attributed to a novel technique called 'reflection tuning'.

Schumer claims the model is based on Meta's Llama 3.1 and developed with assistance from Glaive AI.

Glaive AI is praised for its control in generating synthetic data, which Schumer plans to use extensively.

Schumer's past projects have been well-received, and he has a good reputation in the AI community.

The community initially celebrates the announcement of Reflection 70b.

Schumer requests a compute sponsor for the next model, Reflection 405b, sparking interest and potential investment.

Questions arise about the model's performance as users are unable to replicate the benchmark results.

Schumer offers a private API key for testing, claiming the public model on Hugging Face is not the correct one.

The community becomes skeptical as the private API's performance is significantly better than the public model.

Reddit users discover that Reflection 70b is actually Llama 3 with LORA adaptation tuning, not Llama 3.1 as claimed.

The secret API is revealed to be using Anthropic's model, Sonnet 3.5, instead of Schumer's own model.

The model attempts to censor the word 'Claude', suggesting it is actually Sonnet 3.5 from Anthropic.

Shin Boston provides an overview of the situation, suggesting that either Matt Schumer or Sahil Chatari is lying.

Schumer apologizes for getting ahead of himself and announces a team is investigating what happened.

Sahil Chatari also addresses the community, admitting the benchmark scores were not reproducible and offering a full postmortem.

The AI community expresses disappointment and a loss of trust in the individuals involved.

Influencers and the AI community are criticized for profiting from sensational claims without accountability.

Transcripts

play00:00

Matt Schumer the CEO and founder of

play00:02

hyperr AI is in some hot water this week

play00:05

if you've been really confused about

play00:06

what's happening with reflection 70b and

play00:09

all the drama surrounding it let me

play00:11

really quickly kind of elucidate what's

play00:13

going on and explain simply what's been

play00:15

happening are you chasing the next AI

play00:17

unicorn subscribe so it doesn't end up

play00:20

chasing you Mark Twain famously said

play00:22

there are three kinds of lies lies damn

play00:25

lies and statistics here in the world of

play00:28

large language models in AI I would say

play00:30

it's lies damned lies and llm benchmarks

play00:34

so not too long ago Matt Schumer

play00:36

announces this he's saying I'm excited

play00:37

to announce reflection 70b the world's

play00:40

top open- Source model and he posts the

play00:42

benchmarks the benchmarks are juicy they

play00:45

are very very good they're shockingly

play00:47

good if you will why is this important

play00:49

well first of all it's an open- Source

play00:51

model that's surpassing all the big

play00:54

Frontier models that we've heard about

play00:55

GPT 40 Gemini 1.5 Pro Lama 3.1 cloud 3.

play01:00

5 son it Etc it's a fine-tuned model so

play01:03

they didn't even make it themselves it

play01:05

was just fine-tuned which suggests that

play01:07

there's some special secret sauce that

play01:09

might be available to the rest of us to

play01:11

use to create our very own Benchmark

play01:14

breaking models what is the special

play01:16

sauce it's reflection tuning so Matt

play01:19

attributes this sort of high performance

play01:21

to a novel technique called reflection

play01:24

tuning where the model sort of thinks

play01:26

about is able to correct itself and

play01:28

present the correct answer answer he

play01:30

says the model is based on meta's latest

play01:32

llama 3.1 model and developed with some

play01:34

assistance from glaive AI ater says I

play01:37

want to be very clear glaive AI the

play01:39

reason this worked so well the control

play01:41

they give you to generate synthetic data

play01:43

is insane I will be using them for

play01:45

nearly every model I build moving

play01:47

forward and you should too I want to

play01:49

make a quick note here that this is

play01:51

still a developing story so before we

play01:52

form a mob with pitchforks and torches

play01:55

and storm match Schumer's Castle which

play01:58

has kind of already been happening it's

play01:59

kind of important to note that he's been

play02:01

around for a while he's been posting a

play02:02

lot of different projects we've covered

play02:04

a number of them on this channel they

play02:06

were good he had a good reputation a lot

play02:08

of people have attacked him for not

play02:09

saying that he was an investor in glaive

play02:12

which he as far as I can tell did not

play02:14

mention that during this new

play02:15

announcement but if you go back to like

play02:16

June 26th he's saying I'm excited to be

play02:19

an investor in glaive AI he's already

play02:22

talking about synthetic data Etc back in

play02:24

November 2023 he posted this

play02:26

announcement about the self-operating

play02:28

computer an open source project on

play02:30

GitHub we've covered it on this channel

play02:32

I thought it was kind of cool and this

play02:34

is all to say that this is somebody that

play02:35

I followed kind of you know on and off

play02:37

for quite some time he's sort of kind of

play02:39

somebody that I kept on my radar and

play02:42

every once in while he would pop up with

play02:43

some new interesting project that I

play02:44

would check out so my opinion of him up

play02:47

until reflection 70b this whole Fiasco I

play02:51

didn't know too much about him but the

play02:52

my opinion was good he had a good

play02:54

reputation he was a builder he was

play02:55

somebody that was sharing tons of open-

play02:57

source stuff so this is one thing that

play02:59

doesn't really make sense to me about

play03:00

this whole Fiasco is it did he spend all

play03:03

this time building up a reputation

play03:05

building up this whole trust and

play03:07

everything else to pull this scam if

play03:11

that's the case what's the scam I'm

play03:13

getting a little bit ahe myself the

play03:14

whole point here that I'm trying to make

play03:15

is that I have my pitchforks and torches

play03:18

ready they're ready to go because a lot

play03:20

of this stuff doesn't look good but not

play03:23

quite ready to grab them and storm the

play03:25

castle if you will I'm not still 100%

play03:27

sure what exactly happened if I was

play03:29

bending money on this I would bet that

play03:31

there was some Shenanigans and some

play03:32

moans that was happening but I'm not

play03:35

certain so the history of knowing this

play03:36

person seems to suggest that this is

play03:38

somebody that's legit maybe sometimes

play03:40

kind of self-promotional maybe a bit of

play03:42

a marketer but no glaring issues but

play03:45

let's get back to what we do know so

play03:47

after his announcement he is greeted

play03:49

with open arms by the community

play03:50

everyone's sort of celebrating this big

play03:53

breakthrough Clen from hugging face one

play03:55

of the surprisingly kind of like a big

play03:57

and influential person in the AI

play03:59

community although you don't hear his

play04:01

name as much as you do you know some of

play04:03

the other people he was at some of the

play04:05

uh Congress hearings when they pulled

play04:06

all the big names in AI to talk to

play04:08

Congress about kind of the dangers the

play04:10

opportunities so that's him next to the

play04:13

Google CEO I've been trying to find a

play04:15

good image but I mean here's Mark

play04:17

Zuckerberg sain Adela of Microsoft

play04:19

Jensen hang right there and then from

play04:21

the left we got Elon peners CEO Alex

play04:25

karp then we got Google CEO and we have

play04:28

clamus he goes by on Twitter so that's

play04:30

that's him right there CEO and

play04:32

co-founder of hugging face all that is

play04:35

to say I mean kind of a big deal right

play04:36

kind of a big big person in the industry

play04:39

or at least somebody that has a lot of

play04:40

clout a lot of credibility says

play04:42

reflection llama 3.17 billion parameter

play04:45

by Matt Schumer is the number one

play04:47

trending thing on hugging face he

play04:49

continues I said it before and I will

play04:51

say it again you don't need to be a big

play04:53

Tech to fine-tune optimize and run your

play04:55

own models for your specific constraints

play04:56

and you will benefit massively from it

play04:59

meaning all of us can build these models

play05:02

find two these models and create custom

play05:04

models for our own little special use

play05:06

cases if we can do that we can do that

play05:08

with models that are you know competing

play05:11

with the frontier models by these big

play05:13

tech companies I mean that's that's the

play05:15

dream Isn't It ultimately All Tech

play05:17

organizations will do it just the way

play05:18

they all write their own code right now

play05:20

also Matthew Burman interviews Matt

play05:22

Schumer as well as Sills who is the

play05:24

other sort of developer he's on the

play05:26

glaive AI side and there's a number of

play05:28

other people in the space that are kind

play05:30

of augmenting this message they're

play05:32

celebrating Matt Schumer for providing

play05:33

this brand new this incredible product

play05:35

so a lot of people are kind of caught up

play05:37

in this announcement I mean just look at

play05:39

the number of views that original

play05:40

announcement got 3.3 million views this

play05:44

was a big deal so at the peak of the

play05:46

hype cycle what happens well Matt

play05:49

Schumer is asking for a compute sponsor

play05:51

for the next big model for the 405

play05:54

billion parameter model which if you

play05:56

think about it of course would be very

play05:58

very tempting for people if the 70

play06:00

billion parameter model now think about

play06:02

that if you can take llama 3.5 right so

play06:06

reflection 70 billion parameters it's

play06:08

six times smaller like llama 3.1 405

play06:12

billion parameters right it beats it on

play06:14

all the tasks with this special secret

play06:16

sauce of reflection fine tuning

play06:19

obviously you got to ask yourself okay

play06:20

so what happens if we do indeed take the

play06:22

big 45 billion parameter model apply the

play06:26

same exact fine tuning if the small

play06:28

model is already beating everybody else

play06:30

what happens if we apply the same

play06:32

special sauce to the big model A lot of

play06:33

people reach out to potentially provide

play06:36

this compute within the next 24 hours

play06:39

Matt Schumer posts another update saying

play06:41

compute for reflection 405b secured

play06:44

we're getting started training now

play06:45

expect results very soon but this is

play06:48

where a lot of the stuff starts to kind

play06:50

of fall apart a lot of people are

play06:51

questioning this model how well it's

play06:53

performing or doesn't perform and so all

play06:56

eyes are on Matt Schumer everybody has a

play06:59

million questions about all sorts of

play07:01

different things they're asking him hey

play07:03

you didn't specify that you're an

play07:04

investor in glaive AI which he did

play07:06

earlier I showed you that tweet from

play07:08

June of this year saying that he is an

play07:10

investor but he maybe didn't mention it

play07:12

now he clarifies some of this saying

play07:14

yeah I'm a super tiny investor maybe a

play07:16

thousand bucks a lot of people kind of

play07:17

like latched on to the fact that he

play07:19

didn't know what Laura is Right somebody

play07:21

said address claims are about luring in

play07:23

the benchmarks he's saying not sure what

play07:25

Loring is Haha lore is a low rank

play07:27

adaptation we've covered in this channel

play07:29

before people that are working with

play07:31

these models should know what it is but

play07:33

also you know Matt explained it as I

play07:35

misunderstood it in the context see my

play07:37

post history for lots of Lura stuff

play07:39

here's one from November 3rd 2023

play07:41

talking about you know Laura and Q Lura

play07:44

quantized low rank adaptation and that's

play07:46

why I think this is so confusing because

play07:47

there's like a million different claims

play07:49

that people are making it's kind of hard

play07:50

to tell what's real what's not so let's

play07:52

break it down just by here is why people

play07:56

are suspicious here are the big claims

play07:59

that will need to be addressed by Matt

play08:01

Schumer by sahill everybody involved

play08:04

before I think the community will

play08:06

believe him that no Shenanigans has been

play08:09

taking place so the reason people are

play08:11

skeptical is number one when they

play08:12

started testing the weights kind of on

play08:14

their own systems with their own prompts

play08:15

and stuff like that it didn't do too

play08:17

well right big claims and big Benchmark

play08:20

results well you're going to be able to

play08:22

test the model and see how well it

play08:24

performs right so if the thing is

play08:26

supposedly better than GPT 40 better

play08:28

than all the other greatest models CLA

play08:31

Opus Etc I mean we can test that and if

play08:34

it's just completely horrible at some of

play08:36

the stuff then we have reason to doubt

play08:39

that this wasn't the model that beat

play08:41

those benchmarks so people were not able

play08:44

to replicate the results you might

play08:46

recall that big deal about lk99 right

play08:49

it's this material that's a potential

play08:51

superconductor right we saw a video of

play08:53

this thing floating everybody went wild

play08:55

people were saying it's a complete Game

play08:57

Changer a paradigm shift but there's one

play09:00

problem no one could replicate the

play09:03

results now as all this is happening

play09:05

Matt Schumer offers a kind of a private

play09:08

API key to allow people to test this

play09:10

model saying that well the model that's

play09:13

uh the open weights model that's not the

play09:15

correct model something happened where

play09:18

it didn't upload properly or something

play09:20

like that so he's saying you know it

play09:22

looks like we got rate limited by

play09:24

hugging face mid upload this is what we

play09:27

think might have happened he's saying in

play09:29

the rush we likely uploaded parts of two

play09:31

different models somehow it still worked

play09:33

somewhat well but not the full

play09:35

performance of one of the models we

play09:36

built and tested locally it should have

play09:38

been a simple fix but for some reason

play09:40

it's not working so we're training a

play09:42

brand new model I mean again it sounds

play09:44

weird right now there's this little

play09:46

redded Community called local llama

play09:49

they're

play09:50

211,000 strong top 1% by size I'm

play09:53

actually curious if they got a lot new

play09:55

members from this whole thing happening

play09:58

i' be curious to see how how big was it

play10:00

let's say 2 weeks ago I mean it wasn't

play10:02

small I'm just curious how many people

play10:03

were added because of this and so here's

play10:05

a great kind of review of what happened

play10:07

right so there's this model on Huggy

play10:09

face called reflection 70b which did not

play10:11

perform well at all the creator of the

play10:12

model told everyone there was a mixup in

play10:15

what he uploaded to Huggy face to test

play10:17

against his private API which he claimed

play10:19

was hosting the correct and upto-date

play10:21

version of the model so the open weights

play10:23

model and hugging face worked poorly and

play10:27

the secret API model performance so

play10:30

amazingly on benchmarks that it shocked

play10:32

people a lot of you hate when I use

play10:34

shocking the word shocking in my titles

play10:36

but apparently everybody else can use it

play10:38

fine so they're saying that the private

play10:39

API was just a step or two below Claud

play10:42

son 3.5 which is the soda

play10:44

state-ofthe-art at the moment for a lot

play10:46

of tasks and this is where stuff started

play10:49

unraveling Because the Internet this

play10:52

giant organism of very smart people

play10:55

started looking into what the heck is

play10:56

happening and figuring out that a lot of

play10:58

this was very very wrong first of all

play11:01

Reflection Llama 3.1 70B is actually Llama 3. Somebody on Reddit actually went ahead and did the work to measure the difference. It looks like there's this GitHub project, a visualize_diff.py script (I apologize if I get some of this wrong, I'm just looking at this for the first time), but as far as I can tell it goes through the different model layers, the different weights, and compares a base model against another model you give it, the chat model name. In this case they've compared Meta Llama 3 70B Instruct, the instruct model, to Schumer's Reflection Llama 3.1 70B, and they're using some sort of charting software to visualize the difference. They've posted the results on Reddit; I'll link it down below if you want to take a look at it. But the point is, they're saying this model appears to be Llama 3 with LoRA (low-rank adaptation) tuning applied, not Llama 3.1. The author doesn't even know which model he tuned. I love it.
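The speaker is describing the diff script from memory, so here is only a hedged sketch of the core idea behind it: load two checkpoints, compare them layer by layer, and see how far each weight tensor moved. Identical layers point to a shared base checkpoint; if every layer differs substantially, the advertised lineage is suspect. All layer names and values below are made up for illustration; a real run would load the actual checkpoint shards of both models.

```python
def layer_diffs(base_weights, tuned_weights):
    """Mean absolute difference per layer between two models' weights.

    Layers that come out identical suggest the tuned model was derived
    from this base checkpoint; uniformly large differences suggest a
    different base entirely. Weights here are flat lists of floats
    purely for illustration.
    """
    diffs = {}
    for name, base in base_weights.items():
        tuned = tuned_weights.get(name)
        if tuned is None or len(tuned) != len(base):
            diffs[name] = None  # layer missing or reshaped: not comparable
            continue
        diffs[name] = sum(abs(t - b) for t, b in zip(tuned, base)) / len(base)
    return diffs

# Toy demo: the "tuned" model shares layer_0 with the base but nudges
# layer_1 slightly, the signature of a light (LoRA-style) fine-tune.
base = {"layer_0": [0.1, -0.2, 0.3], "layer_1": [0.5, 0.4, -0.1]}
tuned = {"layer_0": [0.1, -0.2, 0.3], "layer_1": [0.51, 0.39, -0.1]}

d = layer_diffs(base, tuned)
print(d)  # layer_0 diff is exactly 0.0; layer_1 diff is small but nonzero
```

This is roughly the signal the Reddit post describes: plotting these per-layer differences against Llama 3 versus Llama 3.1 shows which base the released weights actually match.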


Meanwhile, the secret API that was used is, as this person posted, actually Sonnet 3.5. So he's using Anthropic's model to serve up answers while telling people that it's in fact his own model. Here's a screenshot of it basically saying, "I'm Claude, an AI assistant created by Anthropic." Also, somebody mentioned that
at some point the word "Claude" started being censored in the Reflection 70B model. So if you asked it to say "Claude", or you asked it a question where "Claude" was the answer, it would just censor itself. Here, for example, is what this person saw. I can't verify this, but it fits with some of the other things we've been hearing: they asked the model to say the word "Claude" multiple times and it was censored, but they figured out how to prompt around it, and the model said, "Upon reflection, I realized that my previous statements about being a Llama created by Meta were incorrect." It then tries to convey its identity in covert ways, saying, "I am an AI created by a company whose name starts with 'Anthra' and ends with 'pic'. My name rhymes with 'odd' and starts with the third letter of the alphabet." So it starts with C and rhymes with "odd": Claude. "I share my name with a famous French composer," with "Claude" blanked out: Debussy. And then, asked "my name is...", it spells out the name.
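Nobody has published the wrapper code, so this is only a hypothetical sketch, but the observed behavior (the banned word vanishing mid-sentence, or "Anthropic" becoming "Meta") is exactly what a naive post-processing filter on a proxied model's output would produce:

```python
BANNED = ["Claude", "Anthropic"]

def censor(text, replacements=None):
    """Naively rewrite provider-identifying words in a model's output.

    A wrapper doing this yields the telltale artifact people reported:
    the sentence survives intact, but the banned word silently disappears
    (or is swapped for another vendor's name).
    """
    replacements = replacements or {}
    for word in BANNED:
        text = text.replace(word, replacements.get(word, ""))
    return " ".join(text.split())  # collapse the double spaces left behind

print(censor("I'm Claude, an AI assistant created by Anthropic."))
# -> "I'm , an AI assistant created by ." (the words are simply missing)
print(censor("I was created by Anthropic.", {"Anthropic": "Meta"}))
# -> "I was created by Meta."
```

Note that a filter like this is trivially defeated by the workarounds described above: asking the model to spell the word letter by letter, or to describe it ("rhymes with odd"), slips right past exact string matching.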

play13:18

A lot of this stuff is some sort of whack-a-mole game, where Matt Schumer, allegedly (we don't know if this was happening, but it seems like it), is rapidly trying to shift things in the API while at the same time talking about how he's retraining the model to upload a brand new one to Hugging Face, etc. So it's really hard to tell what's what. But here, Shin Megami Boson posted what I think is a phenomenal overview of what happened, starting with the
original announcement and the massive amount of news and coverage the model received, through to what we just talked about: people saying the performance is awful and that it seems like it's not what Matt claimed it to be. He also has his private API; he even releases a publicly available endpoint for researchers to try out, but it's not clear what that API is. Is it calling a more powerful proprietary model under the hood? And it turns out that Matt is a liar. Here, somebody asks the model to
write the word "Claude", and here it is written out, but the word "Claude" is missing; the model still answers that it's made by Anthropic. It would replace any appearance of the word "Anthropic" with the word "Meta", and at some point maybe it was even replaced with "GPT-4o", perhaps. And he concludes with this, after watching an interview with Matt Schumer and Sahil (so that's Sahil right there; I guess he's the founder of Glaive AI) on Matthew Berman's channel. Shin Megami Boson, the person who laid everything out, says that after watching that
interview: "I think I placed too low of a probability on the option that Matt Schumer is an absolute idiot rather than a full-blown liar, and Sahil is just lying to Matt." He continues: "I assumed that someone running a full-court press on publicity with their own name, face, and credibility would make sure that what they were saying was at least kind of true. That said, the fact that he hasn't turned on a heel is at least sort of odd, and a bit incriminating." And so
he's saying that fraud definitely occurred, but he's willing to entertain the possibility that Matt was just ridiculously negligent and simply had no idea what he was claiming, or on what evidence he was claiming these things. Really, it would matter who set up the API: if Matt set up the API, then he was the one who committed the fraud, the misrepresentation. So the LocalLLaMA
community on Reddit, as you can imagine, is pretty upset about all of this. It has a lot of people with real expertise on the subject, and a lot of people who had hoped this was indeed real and realized they had been deceived with a poorly made Claude wrapper and some sort of fake open-source model. I mean, he uploaded the weights, but the weights weren't for a model that performed very well; it was something else that was uploaded. All
right, so now we're approaching the final stages of our saga: Matt Schumer decides to apologize. He said: "I got ahead of myself when I announced this project, and I am sorry. That was not my intention. I made the decision to ship this new approach based on the information that we had at the moment. I know that many of you are excited about the potential for this and are now skeptical. Nobody is more excited about the potential for this new approach than I am. For the moment, we have a team working tirelessly to understand what happened, and we'll determine how to proceed once we get to the bottom of it. Once we have all the facts, we will continue to be transparent with this community about what happened and the next steps." Sahil, the other person working on this, from Glaive AI,
also decided to post something similar, but with a little more specifics. He's saying: "I want to address the confusion and valid criticisms that this has caused in the community." He says they're still investigating, etc., but he did address two points. First, he wants to be clear that at no point was he running models from any other providers as the API; it was being served on his own compute. So, to our earlier point, it seems like he was the one running the API, setting up the API, so maybe the theory that Matt Schumer was somehow tricked is a little more plausible now. And he's saying: "I'm working on providing evidence of this and understanding why people saw model behavior such as using a different tokenizer, or completely skipping words like Claude."
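The tokenizer discrepancy Sahil mentions is itself a fingerprinting signal: different model families ship different tokenizers, so the same prompt yields different token counts, and an API that streams token-by-token or reports usage can be checked against the advertised model's tokenizer. The sketch below is a toy illustration with two made-up tokenizers, not the real Llama or Claude ones:

```python
def tokenize_a(text):
    # pretend vendor A's tokenizer splits on whitespace
    return text.split()

def tokenize_b(text):
    # pretend vendor B's tokenizer splits into 4-character chunks
    return [text[i:i + 4] for i in range(0, len(text), 4)]

def likely_vendor(observed_count, text):
    """Guess which tokenizer the serving model uses from its token usage.

    Compare the token count the API reported for `text` against what each
    candidate tokenizer would produce, and pick the closest match.
    """
    candidates = {
        "vendor_a": len(tokenize_a(text)),
        "vendor_b": len(tokenize_b(text)),
    }
    return min(candidates, key=lambda k: abs(candidates[k] - observed_count))

prompt = "upon reflection I realized my mistake"
print(len(tokenize_a(prompt)), len(tokenize_b(prompt)))  # counts disagree
print(likely_vendor(6, prompt))
```

In practice, people ran this kind of check with the actual public tokenizers, which is part of how the "it's really Sonnet behind the API" claim gained traction.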


"Second, the benchmark scores I shared with Matt haven't been reproducible so far. I'm working to understand why this is, and whether the original scores reported were accurate or a result of contamination or misconfiguration." He says he will do a full postmortem and share more soon. Shin Megami Boson pops in here, saying:
"As far as I can tell, either you are lying, or Matt Schumer is lying, or, of course, both of you. In the spirit of giving you the benefit of the doubt: who executed the original training, and where? Who hosted the private API that was benchmarked? And who hosted the OpenRouter API?" So it seems that Sahil was the one running the API, and he also shared the benchmark scores with Matt. Right, there are a number of people
who are saying: well, there's just too much here for us to believe that no shenanigans happened; there was just too much pointing in the other direction. So what happened here? Again, we don't really know, right? We
can make some assumptions, some deductions, but so far it really looks like whatever model they have, or didn't have, just did not live up to the hype. It did not get those incredible benchmarks; whatever secret sauce they thought they had does not exist. And somebody's lying, right? Is it Matt Schumer? Is it Sahil Chaudhary? Is it both of them? If
Sahil managed to trick everybody into believing him, trained the model, faked the scores, ran the API, and did all the shenanigans on the back end trying to fool everybody, then I feel like we'd expect Matt Schumer to throw him under the bus at some point. Because this is a person who has had credibility in the past, whom people followed, respected, and listened to; to just erase all of that, to just burn it to the ground, doesn't make sense. Now,
I'm sure there are a lot of things this sort of hype train got him: more followers, more users for his other products, maybe some compute, or maybe even some financing from people. Yeah, maybe he received some benefit from it, but man, it seems shortsighted to do that. So maybe at some
point we'll get some more details, but at this point I think a lot of the trust has been lost. I don't know if it's going to be easy to trust any of the people involved. Whether people were being deceitful, or negligent, or just had their blinders on, whatever the case, a lot of nonsense happened and a lot of hype was stoked. As this person points out: do you know who actually profits from all of this? Your favorite AI influencers. They gained views and followers, with zero accountability for sharing sensational claims. I'll take a second here to pat myself on the back: I hope you noticed that I sat this one out; I hope I get some credit for that. With that said, my name is Wes Roth, and thank you for watching.