Intel has a Pretty Big Problem
Summary
TLDR: The video script discusses the instability issues with Intel's 13900K and 14900K processors, suggesting a deeper problem than just motherboard voltage and clock settings. The speaker investigates game telemetry and crash databases, revealing a disproportionate number of decompression errors and IO errors on these Intel chips. The analysis implies that the problem might not be fully resolved by a microcode update, and raises concerns about Intel's messaging to customers and system integrators regarding the CPUs' reliability.
Takeaways
- 🤔 Intel's 13900K and 14900K processors are experiencing instability issues, with users reporting intermittent problems that are difficult to troubleshoot.
- 🕵️ The speaker conducted an 'armchair diagnostic' using game telemetry data from two different game companies to investigate the nature of these crashes.
- 📊 A significant number of crashes were attributed to decompression errors, which are unusually high for Intel's 13th and 14th generation CPUs, suggesting a potential hardware issue.
- 📈 The crash rate seems to increase over time for the problematic CPUs, indicating a possible degradation or cumulative error effect.
- 💻 Data from game servers using Intel's 13900K and 14900K CPUs revealed similar stability issues, even on more conservative W680 chipset motherboards.
- 💾 The speaker found that IO errors were also disproportionately high for the affected Intel CPUs, suggesting a broader range of hardware issues.
- 🛠 BIOS updates and adjustments, including disabling E-cores and reducing memory speed, were attempted as fixes but did not fully resolve the issues.
- 💡 The possibility of a power or voltage issue within the CPUs is suggested, as the problems persist even on motherboards designed for stability.
- 📉 Intel's messaging to customers and system integrators appears to be inconsistent, with some reports indicating a 10-25% failure rate for the CPUs.
- 🔄 The data center provider is charging a premium for support on Intel systems due to the high number of incidents requiring intervention.
- 🚫 The lack of clear communication from Intel and the ongoing uncertainty about the root cause of the issues are causing frustration among gamers and enthusiasts.
Q & A
What is the main issue discussed in the video script regarding Intel's 13900K and 14900K processors?
-The main issue discussed is the instability of Intel's 13900K and 14900K processors, which has been ongoing for months and is suspected to be more than just a simple motherboard voltage or clock problem.
What does the speaker believe might be the deeper issue with Intel's chips based on their investigation?
-The speaker suggests that there might be a deeper hardware issue with the chips themselves, as opposed to just a software or microcode issue, based on the high number of crashes and errors reported.
How does the speaker obtain data for their analysis?
-The speaker obtains data by accessing crash databases from two different game developers who provided them with information on system configurations, play times, and crash rates.
What is the significance of the 'out of VRAM' error mentioned in the script?
-The 'out of VRAM' error is significant because it is a common error reported with Intel CPUs that have the instability problem, even when the system is not actually out of VRAM, indicating a potential hardware issue.
What does the speaker find when analyzing decompression errors in game databases?
-The speaker finds that there are 1,584 decompression errors logged in the past 90 days, with 1,431 of those being related to Intel's 13th or 14th generation CPUs, which is a disproportionately high number compared to other CPUs.
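As a rough check on how lopsided those counts are, the shares can be worked out directly. This is a minimal sketch that uses only the figures quoted in the transcript (1,584 total, 1,431 on Intel 13th/14th gen, 11 on the i7-9750H, 4 on AMD). The variable and function names are illustrative, and these are raw counts that say nothing about how many players own each CPU.

```python
# Decompression-error counts quoted in the video for the past 90 days.
TOTAL_ERRORS = 1584
counts = {
    "Intel 13th/14th gen": 1431,
    "Intel i7-9750H": 11,
    "All AMD CPUs": 4,
}


def share_percent(count, total):
    """Return count as a percentage of total."""
    return 100.0 * count / total


# Errors that the video does not attribute to any of the three groups above.
unattributed = TOTAL_ERRORS - sum(counts.values())

for label, count in counts.items():
    print(f"{label}: {count} errors ({share_percent(count, TOTAL_ERRORS):.1f}%)")
print(f"Everything else: {unattributed} errors "
      f"({share_percent(unattributed, TOTAL_ERRORS):.1f}%)")
```

Roughly nine in ten logged decompression errors fall in the 13th/14th gen group. Because the crash database skews about 70/30 toward Intel overall, these shares are only suggestive until they are normalized by how many players use each CPU.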
Why does the speaker believe that the issue might not be fully resolved even with a BIOS update or disabling E-cores?
-The speaker believes this because the error rates and instability issues persist even in more conservatively configured systems, such as those using the W680 chipset in data centers, which suggests a deeper hardware problem.
What is the implication of the speaker's findings for game developers and data center providers?
-The implication is that game developers and data center providers may experience higher support costs and system instability, leading some to consider alternative CPU options like AMD's 7950X for new server deployments.
What is the speaker's view on Intel's messaging to customers regarding the CPU issues?
-The speaker criticizes Intel for not providing clear and concise messaging to customers, especially enthusiasts who are experiencing issues, and suggests that Intel should offer replacements for affected CPUs.
What steps did the speaker take to ensure the systems were configured correctly for testing?
-The speaker updated the BIOS to the latest versions as of June 25th, 2024, and tested various configurations, including different DDR5 speeds and multipliers, to find the most stable settings.
What unusual phenomenon did the speaker observe in some systems prior to a hard crash?
-The speaker observed that in some systems, the CPU would become unexplainably slow for up to a minute before a hard crash, with no clear correlation to thermal monitoring or power issues.
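The "slow before the crash" pattern could be searched for mechanically on any server that logs its tick rate. The sketch below is an illustration under stated assumptions, not the speaker's tooling: the sample format `(seconds, ticks_per_second)`, the function name, and the 60% cutoff are hypothetical choices. Only the behavior it looks for, a tick rate of about 50% of normal for up to a minute before a hard crash, comes from the video.

```python
def slowdown_before_crash(samples, crash_time, nominal_tps,
                          threshold=0.6, window_s=60.0):
    """Return how many seconds of low tick rate immediately preceded a crash.

    samples:     iterable of (time_in_seconds, ticks_per_second) tuples
                 (hypothetical log format)
    crash_time:  time of the crash, in the same time base as the samples
    nominal_tps: the tick rate the server normally sustains
    threshold:   fraction of nominal at or below which a sample counts as slow
    window_s:    how far back from the crash to look
    """
    limit = nominal_tps * threshold
    # Keep only the samples inside the look-back window, oldest first.
    recent = sorted(s for s in samples
                    if crash_time - window_s <= s[0] <= crash_time)

    slow_start = None
    # Walk backwards from the crash until the first healthy sample.
    for t, tps in reversed(recent):
        if tps <= limit:
            slow_start = t
        else:
            break
    return 0.0 if slow_start is None else crash_time - slow_start


# Made-up example log: healthy at 20 ticks/s, then roughly half speed.
log = [(0, 20), (10, 20), (20, 20), (30, 10), (40, 10), (50, 9)]
print(slowdown_before_crash(log, crash_time=55, nominal_tps=20))
```

A result near zero for healthy machines and tens of seconds for crashing ones would match what the speaker describes, but the cutoff would need tuning against real logs.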
Outlines
🤔 Investigating Intel CPU Instability
The script discusses the widespread instability issues with Intel's 13900K and 14900K processors, suggesting a deeper problem beyond a simple motherboard voltage or clock speed issue. The author embarks on an 'armchair diagnostic' journey, using game telemetry data from two different game developers to analyze crash reports. The data reveals inconsistencies in crashes, indicating a potential hardware issue. The author also explores the possibility of decompression errors being related to the CPUs and notes the high frequency of these errors in game databases, particularly with Intel's 13th and 14th generation chips.
🔍 Diving Deeper into CPU Crash Data
Continuing the investigation, the script examines the distribution of Intel and AMD CPUs in the crash database, finding a significant preference for Intel among the crashing systems. It also touches on the misleading nature of telemetry data due to various factors like operating systems and GPU configurations. The author highlights the peculiarity of IO errors, which seem to be more common with the problematic Intel CPUs, and notes the underrepresentation of AMD CPUs in the error logs. The analysis also includes looking at systems with high GPU memory, suggesting that the observed errors are not due to insufficient VRAM but could be related to the CPUs themselves.
📈 Unraveling Data Center CPU Issues
The script shifts focus to data centers, where the same Intel CPUs are used in server environments. It reveals that these CPUs, even when used with more conservative motherboards like the W680 chipset, still exhibit stability issues. The author shares insights from a negotiation process for new servers, where Intel systems are more expensive due to higher support costs associated with unresolved issues. The data center providers have experienced high support incidents, leading to a premium for Intel systems, and have even resorted to replacing CPUs or updating BIOS to mitigate the problems.
🛠️ Exploring Possible Solutions and Intel's Response
The author explores potential solutions, including BIOS updates and memory speed adjustments, finding that while these measures help, they do not completely resolve the issues. There is speculation about Intel's communication with large system integrators, suggesting that the problem rate might be higher than officially stated. The script also addresses the lack of clear messaging from Intel to its customers and the potential impact on gamers and enthusiasts who have invested in these CPUs.
🚫 The Ongoing Challenge of CPU Stability
In conclusion, the script emphasizes the ongoing challenge of CPU stability with Intel's 13900K and 14900K processors. Despite various attempts to address the issue, the root cause remains elusive, and the situation is not improving. The author calls for clearer communication from Intel and a commitment to making affected customers whole. The script ends with the author signing off with a note of uncertainty and the need for further investigation into the problem.
Keywords
💡Intel 13900K and 14900K
💡Instability
💡Microcode Update
💡Telemetry
💡Decompression Failure
💡BIOS Changes
💡Data Center
💡W680 Chipset
💡E-cores
💡DDR5 Memory
💡Game Telemetry
Highlights
Intel's 13900K and 14900K processors are experiencing instability issues, with concerns that the problem may be deeper than just a motherboard voltage or clock speed issue.
The speaker conducted an 'armchair diagnostic' to investigate the root cause of the crashes, suggesting it might not be fixable with a microcode update.
Access to game crash databases from two different game companies revealed a high number of decompression errors specifically with Intel's 13th and 14th generation CPUs.
Decompression failures were far more common on Intel 13th and 14th generation CPUs than on any other CPU, accounting for 1,431 of the 1,584 instances logged in the past 90 days.
Game telemetry data suggests that the instability is not isolated to gaming systems, as similar issues were found in data center servers using the same CPUs.
Data center motherboards, which are designed for maximum stability, still experienced high crash rates with the 13900K and 14900K CPUs.
The speaker found that disabling E-cores and adjusting BIOS settings helped mitigate but did not fully resolve the stability issues.
There is a significant discrepancy in the error rates between Intel and AMD CPUs, with AMD showing far fewer instances of decompression failures.
The analysis of game server crash data indicates that the rate of errors may be increasing over time for the problematic Intel CPUs.
The speaker suggests that the issue might be related to power delivery or timing problems within the CPU, rather than just motherboard issues.
Intel CPUs deployed in data centers for gaming servers are reportedly more expensive and have higher support costs due to stability issues.
Some data center providers are steering customers towards AMD 7950X systems as an alternative due to unresolved issues with Intel CPUs.
The speaker highlights the importance of using game telemetry and crash data for identifying and understanding hardware issues.
There is a call for Intel to provide clear messaging and support for customers affected by the CPU instability issues.
The ambiguity and lack of clear resolution from Intel are causing frustration among gamers and data center operators alike.
The video concludes with the speaker emphasizing the need for further investigation and the potential for deeper hardware issues with the Intel CPUs.
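One way to make the Intel-versus-AMD comparison in the highlights less sensitive to how many players own each CPU is to normalize crashes by hours played, which is roughly what the transcript describes as grouping regular players "by CPU and crash rate". The sketch below assumes a hypothetical record format of `(cpu_model, hours_played, crash_count)`; the function name and the sample rows are invented for illustration and are not data from the video.

```python
from collections import defaultdict


def crashes_per_100_hours(records):
    """Aggregate crash counts and playtime per CPU model.

    records: iterable of (cpu_model, hours_played, crash_count) tuples
             (hypothetical telemetry export format)
    Returns a dict mapping cpu_model to crashes per 100 hours of play.
    """
    hours = defaultdict(float)
    crashes = defaultdict(int)
    for cpu_model, hours_played, crash_count in records:
        hours[cpu_model] += hours_played
        crashes[cpu_model] += crash_count
    # Skip models with no recorded playtime to avoid dividing by zero.
    return {cpu: 100.0 * crashes[cpu] / hours[cpu]
            for cpu in hours if hours[cpu] > 0}


# Invented sample rows with placeholder CPU labels.
sample = [
    ("cpu_a", 10, 5),
    ("cpu_a", 10, 0),
    ("cpu_b", 50, 1),
]
for cpu, rate in sorted(crashes_per_100_hours(sample).items()):
    print(f"{cpu}: {rate:.1f} crashes per 100 hours")
```

Restricting the input to players with regular playtime, as the speaker did, keeps one-off sessions from dominating the rate.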
Transcripts
We all know that Intel's got a problem with the 13900K and the 14900K being unstable. It's been months. Do we really blame gamers for being impatient at this point? I'm not so sure that this is really just a boost, motherboard voltage, clock problem. I'm starting to think there's something deeper wrong with the chips. I mean, it's Thursday; all I want to do is just play Dwarf Fortress at ridiculously high single-thread clock speeds. I get it, intermittent problems are the worst to troubleshoot. But it occurred to me we have another source of crashes: the games themselves. Loads of games have telemetry in them to track their crashes. A data source from the crashes, I mean. So I started digging into that, and what I found will shock you. I'm much less sure that we can fix this with a BIOS or microcode update, because I started digging into this and I went on a journey, and now I'm troubled. Come with me as we attempt a Level1Techs armchair diagnostic. It's a level one diagnostic, but from the armchair, right? So take that with a bit of a grain of salt.
[Music]
I needed data from thousands of systems. How am I going to do that? Well, did you know that when you play a game, most of the time it logs usage data? A lot of the time, one set of analytics goes to the marketing team: how long you play the game, when you open it, what all is going on with it. But there's another set of data that goes to the dev team around when the game crashes: the game developers will get a crash report. So I reached out to my contact list, and I found two different people from two different games who were willing to give me access to their crash databases so I could look around for interesting stuff. Okay, full disclosure: it took a little bit of finagling and convincing, because I said, "Hey, I think this weird thing is going on," and they said, "Nah, that seems weird." I was like, "No, the errors are probably not what you think they are," and they said, "Hm, you might be right."

See, the problem with the way that the 14900K and the 13900K and other CPUs in the 13th and 14th generation are crashing is that they're not super consistent. I needed to know system configuration, play time, and crash rate. I needed to know about the population of people just playing the game without errors in order to make suppositions. These databases are large, and they're also rolling: neither company really hangs on to untagged or untriaged events for more than about three months. There are some exceptions to that for real outliers, but mostly it's a three-month time window. The instability is not particular, as well.
It's not like the ancient problem, you know, the Pentium F00F bug. That was a hardware problem, which actually led to Intel creating their whole microcode system so they could patch CPU errata in software even when you had a hardware problem. In this case, the errors are all over the place. You're even getting GPU errors. GPU errors? Yeah, you know, the "out of VRAM" error that has now become infamous. A lot of armchair experts on forums were saying, "Hey, your game sucks, you've got a VRAM leak, it's got to be you," and that was actually not the case. It is actually a really common error with Intel CPUs that have this problem: an out-of-VRAM error when you're not actually out of VRAM. And these crashes aren't always so bad that the game totally crashes, though we know that users are experiencing worse crashes, things like blue screens and that sort of thing. But we can't get a crash report if the computer blue-screens, or at least most of these games are not set up to be able to do that, so there is a little bit of survivorship bias in all the stuff that I'm talking about today. So again, grain of salt. There's also one operation that stresses the CPU in a particular way, and that is decompression. It is a common feature of games,
and check out this statement from Oodle regarding decompression failures, when we're talking about their game tools, RAD Game Tools. Yeah, check this out: this is an Intel-specific thing. Now, this decompression library is ubiquitous, it's well used, it's probably nearly bug-free. This is a hardware issue that's creating this, and you can see that from this bulletin. This article suggests some BIOS changes and some clock changes that'll help users mitigate, and most of these changes are around power and clock settings. Okay, fine, maybe motherboard manufacturers pushed things too far. Okay, maybe; we'll come back to that. But I've got some big databases here. Let's see if I can find this particular error, the Oodle error.

How many decompression errors are logged in their respective game databases for the past 90 days? Okay, the answer to that is 1,584. How many of those are Intel 13th or 14th generation decompression errors? 1,431. What's the CPU with the next-highest error count? It's an i7-9750H, with just 11 instances of decompression failure. What about AMD CPUs? I only saw four entries from any AMD CPU. And you know, that's pretty awesome, but don't assume that means AMD CPUs are better. I mean, I think they are in this particular case, they're not experiencing these kinds of issues, but there could be way fewer AMD CPUs in this population. We don't know. I decided to check out those things and check out the distribution. And then there's handhelds to worry about: almost all handhelds are AMD CPUs. Not all of them, but a lot of them.

Okay, so what's the breakdown? Because, you know, if we're only working with five players on AMD systems, we need to know, right? The breakdown between Intel CPUs and AMD CPUs in the crash database was about 70/30 in favor of Intel, which suggests something like 70% of players are using Intel. 60% were on Nvidia, and the rest were among AMD and literally everything else. This is also another instance where it's weird and misleading, because this is game telemetry data from Windows, but sometimes this is actually game telemetry data from Linux, and this reporting tool groups GPUs together in the Linux scenario differently than it does on Windows. So you end up with some percentage of the 40% of users actually being Linux users that could be using an Nvidia or an AMD GPU. So 60% really is the floor for Nvidia GPUs in this scenario, which is weird, I know. It doesn't matter for this video. I chose not to clean up the data for GPU distribution; I just wondered something about the population. So we get it: the data here is not super clean, not ideal, and this is also why I think this has flown under the radar a little bit. Fortunately, the rate
of errors per unique player was also not super high. There was a small number of people, fewer than 200, that were suffering tremendously, don't get me wrong. But like the PC World article where they had actually swapped their CPUs, I think those 200 people would actually be better off just swapping their CPUs, just getting an RMA. No amount of microcode is going to help those users, I think. For the problem users, I used those as a springboard to look at what other errors their systems had logged, because they've logged a lot of them, and I saw a lot of IO errors. These are NVMe errors, but the game doesn't log that. The telemetry tool doesn't look at system errors as a thing that we should log; it just says it can't retrieve a game asset, which is not super uncommon. Like, an NVMe error is not really a super uncommon thing. And once again, in the population of all systems that had any IO error, grouped by CPU, the 13900K and 14900K were definitely overrepresented. They had a much higher rate of errors, across all errors, for those CPUs versus any other CPU. It's very odd.

The other reason I say that this is odd is that when we're talking about IO and PCIe devices, those are in a different clock domain on the CPU. If every little overclock instability resulted in file system corruption, you'd be reinstalling Windows a lot. And yeah, sure, a severe overclock can cause problems, but most of the time you're not going to corrupt your SSD from a bad overclock. If a gamer experienced 10 or more errors in the last 90 days, you could bet at least one of them was an IO error of some type in the game, where it couldn't retrieve an asset. At least in the filtered cohort of users experiencing four or more errors in 90 days, there were no IO errors of this type from AMD systems at all in the last 90 days. You know, maybe there aren't enough AMD systems. I mean, I really, really dug through both game companies' databases, and I only found four errors that could possibly be attributed to an IO error, or an IO error in that sort of context. That's not really enough data, in my opinion, to make any sweeping conclusions, but this IO error mostly does seem confined to people that are having real, earnest hardware errors. I decided to window the data down to people with at least 20 gigs of GPU memory next, because it's just not possible for either one of these games to need more than that much VRAM.
Intel has also accepted that this error likely stems from instability specific to the problems we're talking about with 13th and 14th generation enthusiast CPUs, so you don't have to take my word for it that this error is probably not actually related to being out of VRAM in these specific games. So, selecting for everybody that had a 3090, 4090, 7900 XT, or 7900 XTX, which is actually surprisingly difficult because the telemetry tool doesn't always record the correct GPU when you have more than one GPU (an iGPU plus everything else), the oldest error that I found was one from more than six months ago, which was logged and which the dev team had spent a lot of time on, so it wasn't cycled out of the system. One user had experienced about one crash for every two hours of playtime. That seems like a lot.

Trying to find a cohort of players that play regularly, and then grouping them by CPU and crash rate, showed that, at least for this game, AMD players had fewer crashes than 13900K and 14900K players per unit of play time, at least when we take into account the crash rate, if the crash rate is consistent. And also, the 12900K was shockingly better. The 12900K was about equivalent to AMD CPUs, and about equivalent to everything that wasn't a 13th or 14th gen CPU, which is interesting.

It's also interesting that you can't just check 13th gen CPUs in general, because some 13th gen CPUs are actually rebadged 12th gen CPUs. So you can't just say "all 13th gen CPUs"; you have to look at specific ones. For this video I tried to stick to just the 13900K, KS, and KF and the 14900K, KS, and KF. In other words, just because it says it's Intel 13th gen on the box doesn't actually mean it's Intel 13th gen. I mean, that's true for the K-series CPUs, where it is what it says, but for OEM processors and variants, it just depends. So based on this data, and a lot of data that I'm not commenting on, I would think around 20 to 30% of players with one of those two CPUs have experienced at least one crash that can be attributed to the CPU or the motherboard during their lifetime of play.
I think that the more these CPUs are used, the rate of error is increasing over time. At least, I spent a lot of time trying to find a smoking gun here that would look like that, and it seems like there's some data that fits that supposition, but the analysis is very hard because the data is not organized in a way that is really conducive to that kind of searching. For the players that play consistently, the number of errors that they are encountering with their systems is definitely increasing over time for those two CPUs. I can say that, at least. And the intermittent problems, they're the worst to troubleshoot. It's like there's intermittent contact in the input polarizers. Did somebody stick iron filings in the CPU vat? Is it some sort of contaminant? I don't know. It's a leathery burnt bacon enigma wrapped in a mystery.

Maybe the problem here is self-inflicted. I mean, that's certainly some of the messaging from Intel: users are overclocking their systems, or motherboard makers are overclocking the motherboard, and that's got to be the source of the problem, right? Well, let's use our brains and think about counterexamples. Oh look, on that quest I did actually find something interesting. It turns out the 13900K and the 14900K do have a place in the data center. They're not just gaming CPUs; they're also for gaming servers. And guess what? In the data center, most systems are deployed with a motherboard that is based around the W680 chipset, a W-series chipset, not a Z-series chipset. Z-series chipsets are for overclocking. Totally different motherboard, totally different power phase designs. Maybe Intel is just rebadging a Z-series and they've been super lazy about it, but I don't think so, because it's designed for Xeon. I mean, these are alternative motherboards for LGA 1700 that are different from their desktop-class counterparts, so one might expect that the CPUs would behave differently on these much more conservative motherboards, because these motherboards are designed to operate well within the specifications of the CPU. So I set about gathering crash data
from thousands of systems already deployed in the data center around the W-series chipsets, and here's a screenshot of one of the systems that had crashed hard and rebooted. Notice a few things on this screenshot. One, the insanely conservative DDR5 memory speed; this may be a result of automatic crashing and the Asus BIOS backing off the memory speed. Two, the Asus W680 chipset, like I was saying. I've seen similar crash screens from the Supermicro W680 boards, which is what the other game provider uses. One of them uses Supermicro, one of them uses Asus, and the crash rate is pretty similar between the two. I mean, the reason these are used is the insanely high single-thread clock speed, and for game servers it turns out that's actually useful. W680 was created to go along with motherboards designed for maximum stability. Neither Asus nor Supermicro motherboards really support giving tons of extra power to the CPU or doing the insane overclocking things you'd find on a desktop, so I really don't think both Asus and Supermicro have colored really far outside the lines on this motherboard, and I really don't think Supermicro or Asus have just lazily copy-pasted the voltage settings from their desktop motherboards to the server-class motherboards. Fully 50% of the systems deployed for both companies with either one of these processors, to within one percentage point, are experiencing the same stability issues. Even disabling E-cores has not fully resolved the issue for one of these companies. The error rate also seems to be going up over time on the server side as well. Oh, and get this, it gets better.
One of the companies is negotiating for another $100,000 of servers, and this time a line item popped up in the proposal from the provider: the Intel systems are more than $1,000 more expensive than their equivalent AMD counterparts. They let me insert myself into the negotiation process, and I was able to get on the phone and talk to the data center provider. Oh boy, they dropped a lot of interesting nuggets. The way the sales work here is that you buy a system and it gets deployed in the data center, but if there's an issue, it's covered by a support contract; the owner of the system never actually touches it, at least until it's time for it to be retired. So I asked, "Okay, why is the support cost on these Intel systems so much higher than it was for roughly the same systems that were bought in 2023?" I mean, there are a couple of minor hardware changes, but they're similar systems. And they said, and I quote, "Support incidents have been unusually high for that configuration. So recently we've had to update the BIOS, disable E-cores, or do CPU swaps to get the issues resolved, and we're not sure that the issues are fully resolved, so we are charging a support premium for those systems right now."

Huh, isn't that interesting? $1,000 extra that wasn't there six months ago. So I asked, "Is that normal? Have you had a lot of... like, what's going on?" And they said, "We had really good luck with the 12900K-based systems that we had, and we've always had good luck with the Xeon. Something just isn't right with the 13900K and 14900K. We already replaced a lot of customer 13900K systems with 14900K systems, and the issues don't seem to be fully resolved. We've been steering customers toward 7950X systems instead. They're almost always faster anyway." Neat. When we talked to one of the game developers about this, they said, "I think I'm going to lose about $100,000 in lost players from multiplayer server crashes." Yeah, if you were a game player, you'd be frustrated too. Makes sense: "This game is terrible, it just crashes all the time."

And I get that there's a certain cohort of you watching this video who are going to say, "Oh, this is what you get for not going with Xeon." But this wasn't a problem with 12th, 11th, 10th, or 9th generation Intel desktop CPUs. For this use case, relatively high single-core performance and relatively low cost make sense for game servers, even if you trade minor stability issues for the cost and the performance. But the stability tradeoff for 13th and 14th generation, at least in these scenarios, at least for the last six months or so for them, has been terrible. I tried to prod them a little bit on what the data center was experiencing in terms of messaging from Intel and support, and it really didn't seem like they were getting much support beyond just, "Here, have a tray of extra CPUs; swap CPUs and hope for the best." It really doesn't add up, especially when you consider W680 and its more conservative power and clock targets.
That got me thinking: what is Intel telling large system integrators like Dell, HP, and Lenovo? So I reached out to contacts that I have inside those companies, which required a little bit more intrigue on my part, and I'm not really sure that I got good intel from those companies. But the intel that I did get said, well, you can expect that between 10 and 25% of CPUs have a problem or are marginal in some way, and we're not really sure what the root issue is. Do they say clearly whether it's a motherboard problem or an Intel CPU problem? The messaging seems to be that it's a little column A and a little column B, even for OEM systems, which, like W680, tend to be a little more conservative in power and clock performance configurations, even when you're buying a K (not necessarily when you're buying a non-K). Based on what I saw from game telemetry and game server crash data, I would say that 10 to 25% is much less than I would have guessed. I would have guessed that about half of these CPUs have some type of issue, with some clearly a lot worse than others. Is that attributable to power-on time, or how much they've been used, or some overclock attempt? I don't know. I don't know. In terms of specifics and
crashing, the two populations of systems were a little different. One provider uses dual-DIMM configurations, and that seemed to suffer a lot; the single-DIMM configurations seem to work a little better. Two 48 GB DIMMs versus four 32 GB DIMMs: opt for two 48 GB DIMMs every time. The most stable configuration for testing with y-cruncher, 24 hours at a time on the Linux side, was definitely configuring a max multiplier of 53 and setting the DDR5 speed to 4200 for the four-DIMM configuration. 5200 was fine for single DIMM, but 4200, you know, that's it. Technically, yes, the W680 does support XMP, but it's not recommended, especially in a game server context. In order to find failures, we would use a combination of decompression tests from the Phoronix Test Suite, with automation, and y-cruncher, because y-cruncher is always pretty great. A lot of the time the failures were just random: the core was random, everything else was random. There were a handful of machines that would have specific failures. We had one machine that, when doing SNT testing, would always fail at the SNT test almost immediately, no matter what. It's kind of wild. But mostly the failures were random, and sometimes y-cruncher would pass but compress-7zip in PTS would fail, which was very interesting. Oh, and in case you're wondering, one of the first things that I did in setting up both machines for both providers was to fully update the BIOS to whatever was current as of June 25th, 2024, which was quite a lot of BIOS updates, and that did help but did not fully resolve the issues. In the end, DDR5-4200 and disabling E-cores were the most drastic things that positively impacted stability, but disabling E-cores mostly didn't have as much impact as making the memory run dog slow. One last thing I'll
leave you with on the Linux servers
because the way that one of the
particular games works it logs a number
of Game World ticks per second and it
turned out that I could use this to work
backwards from Crash events sometimes
when a system crashes The Tick rate in
the world would drop to about 50% of
normal and this I couldn't attribute
this to EC cor problems I couldn't
attribute this to power core uh Power
problems see the power is actually
logged at the socket by smart power
strips in the data center which is very
useful I couldn't figure out anything to
correlate this to like thermal
monitoring because we log that just
there wasn't anything just every now and
then the CPU gets miserably
slow for up to a minute before
an actual hard crash and I don't have any
explanation for that or Theory as to why
that might be for about half the systems
that are working fine there's no such
slowdown happening that is unexplainable
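
The idea of working backwards from a crash event through the tick-rate log can be sketched in a few lines of Python. This is only an illustration, not the tooling used in the video: the function name, the `(timestamp, ticks_per_second)` sample format, the 60 second look-back window, and the 0.6 cutoff (chosen to catch a drop to "about 50%") are all assumptions.

```python
from statistics import median

def slowdown_before_crash(samples, crash_time, window_s=60.0, threshold=0.6):
    """Return how many seconds before the crash the tick rate first dropped,
    or None if no slowdown is visible in the look-back window.

    samples    -- list of (timestamp_seconds, ticks_per_second) tuples (hypothetical log format)
    crash_time -- timestamp of the hard crash, in the same units
    window_s   -- how far back from the crash to look (assumed: 60 seconds)
    threshold  -- fraction of the normal rate that counts as slow (assumed: 0.6)
    """
    window_start = crash_time - window_s

    # "Normal" is the median tick rate logged before the look-back window opens.
    baseline_rates = [rate for t, rate in samples if t < window_start]
    if not baseline_rates:
        return None  # not enough history to know what normal looks like
    baseline = median(baseline_rates)

    # Timestamps inside the window where the world ticked well below normal.
    slow_times = [
        t for t, rate in samples
        if window_start <= t <= crash_time and rate <= threshold * baseline
    ]
    if not slow_times:
        return None
    return crash_time - min(slow_times)


# Made-up data: 30 ticks/s for 100 seconds, then 15 ticks/s until a crash at t=140.
log = [(t, 30.0) for t in range(0, 100)] + [(t, 15.0) for t in range(100, 140)]
lead_time = slowdown_before_crash(log, crash_time=140)
```

With the made-up log above, `lead_time` is 40, meaning the slowdown started 40 seconds before the crash. Running the same function over machines that never crash is one way to check that the healthy half shows no such dips.
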
if Intel knows what the root cause is it
could be that half of the CPUs that have
the issue can be mitigated in software
maybe that's it so maybe that's you know
that leaves
25% half of the half that are going to
have to be physically replaced I don't
know but you know the fact that we're
still doing guesswork months after the
fact is not ideal I mean the
ambiguity here is the problem consider
that when motherboard makers were caught
juicing AMD CPUs to the point of
catastrophic
failure AMD was quick to make a
statement that anyone affected by this
issue would be made
whole I really don't know that Intel has
done the same in a clear and concise way
like you want to take care of your
enthusiasts first right I mean that's
sort of the first and biggest killer of
this situation it's the uncertainty
Intel should step up with clear
messaging that gamers and enthusiasts
with an affected CPU
will be made whole with a replacement
CPU if that's what it comes to if
they're still experiencing issues after
the updates are in and the clock
for getting updates in is almost
out in my opinion customers that are
buying systems and CPUs by the
thousands they don't seem to all have
the same message I mean they think the
problem rate is going to be like maybe
10% if you're a smaller provider larger
providers are being told a larger
number from what I can tell W680
experiencing problems with a similar
error rate as desktop at least for the
gaming stuff is interesting because the
W680 power targets are so much more
conservative eventually I think that the
enterprise customers and the corporate
customers that are involved they're
going to start comparing notes and
they're going to start leaking things to
randos on the internet that make videos
to try to get some relief or to try to
get you know more eyes on the problem
especially if some percentage of CPUs
are going to end up having to be swapped
and it's not able to be fixed in
software or microcode or anything else
I mean for this video really what
I want is to point out that we do in
fact have the data there's a lot of data
from Steam games and there's a lot of
data from just games in
general game telemetry crash data the
Windows event logs
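
As a rough illustration of what digging through that kind of telemetry involves, here is a minimal Python sketch that tallies crash rates per CPU model. It is not the actual pipeline from either game company: the function names, the `(cpu_string, crashed)` row format, and the sample rows are all made up, and the regular expression is a simple assumption about how CPU names are usually reported.

```python
import re
from collections import Counter

def normalize_cpu(raw):
    """Reduce a free-form CPU name to a short model tag such as '13900K'.

    Assumes the model number is a run of 4-5 digits with an optional
    letter suffix; returns None when nothing like that is found.
    """
    match = re.search(r"\b(\d{4,5}[A-Z]{0,2})\b", raw.upper())
    return match.group(1) if match else None

def crash_rate_by_cpu(sessions):
    """Map each CPU model to crashed_sessions / total_sessions.

    sessions -- iterable of (cpu_string, crashed) pairs (hypothetical format)
    """
    total = Counter()
    crashed = Counter()
    for raw_cpu, did_crash in sessions:
        model = normalize_cpu(raw_cpu) or "unknown"
        total[model] += 1
        if did_crash:
            crashed[model] += 1
    return {model: crashed[model] / total[model] for model in total}


# Made-up rows, only to show the shape of the data.
rows = [
    ("13th Gen Intel(R) Core(TM) i9-13900K", True),
    ("Intel Core i9-13900K", False),
    ("AMD Ryzen 9 7950X 16-Core Processor", False),
    ("Intel(R) Core(TM) i9-14900KF", True),
]
rates = crash_rate_by_cpu(rows)
```

The normalization step matters because the same chip shows up under several different name strings, and a real analysis would also need to separate out crashes caused by overclocking before comparing models.
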
yeah it's a little prickly and rough
around the edges but I mean
you're not going to be able to just do
select star from
game problems where CPU is equal to
13900K and then just look at the data
some of these crashes are because
people are overclocking their systems but
the data center error rate is alarmingly
high from these CPUs and there's got to
be something to that I will not in the
least be surprised if there is a deeper
issue with the power voltage
timing problem and maybe the CPUs have
degraded over time some of them to the
point that they can't be salvaged like
that would not surprise me in the least
at this point in time whereas you know a
month ago I might have said ah it's
probably just motherboards and
microcode it's not I don't believe
that that's the issue anymore and what I
found was interesting and I think based
on this video that smarter people are
going to do their homework and go
digging and try to find stuff and
they'll probably find some really
interesting stuff I'm Wendell this is Level1
this has just been some
rambling on what I found doing analysis
of game failure databases and listening
to some customers of Intel complain
about things a lot of things oh and in
case you're wondering the game
provider they're going with AMD 7950X
systems for their new game
servers they're being provisioned right
now so that they have a little bit of
breathing room to move over their
multiplayer servers so that they can
troubleshoot and update BIOSes and do
whatever else on the Intel
systems I mean they've already been
doing the BIOS update square dance
for a couple of months now just get
tired of it I'm Wendell this is Level1 I'm
signing out you can find me in the Level1
forums
[Music]