M4 Deep Dive - Apple's First Armv9 Processor
Summary
TLDR: Apple's new M4 processor, introduced in the iPad Pro, has up to 10 CPU cores, with redesigned performance and efficiency cores featuring improved branch prediction, wider decode and execution on the performance cores, and a deeper execution engine on the efficiency cores. The M4 integrates Next Generation ML accelerators within its CPU and supports the Armv9.4 architecture, making it Apple's first Armv9 processor. Geekbench scores indicate a significant leap in performance over the M1, M2, and M3, even after accounting for clock speed. The chip also includes a 10-core GPU with dynamic caching and hardware-accelerated ray tracing. The video closes with speculation about future M4 variants and whether Armv9 will reach the iPhone.
Takeaways
- 🍏 Apple has introduced the new M4 processor, which is initially being used in an iPad Pro rather than a MacBook or MacBook Air.
- 🔍 The M4 processor features up to 10 cores, with configurations of up to four performance cores and six efficiency cores, an upgrade from the M3's 4+4 setup.
- 🚀 Enhanced features of the M4 include improved branch prediction, wider decode and execution for performance cores, and a deeper execution engine for efficiency cores.
- 🌟 A significant highlight is the inclusion of Next Generation ML accelerators within the CPU, which is crucial for tasks involving machine learning, computer vision, and scientific simulations.
- ⚠️ Consumers should be aware that not all M4 processors come with 10 cores; some may have nine, and this isn't always clear in Apple's marketing materials.
- 💾 The 10-core version of the M4 is only available with 1 terabyte of storage and 16 gigabytes of RAM, which may be a limitation for some users.
- 🛠️ The M4's ML accelerators implement the Scalable Matrix Extension (SME2), which is part of the Armv9.4 architecture and not available in Armv8.
- 📊 Geekbench scores indicate that the M4, especially the 10-core version, shows a significant leap in performance over previous generations like the M1, M2, and M3.
- ⏱️ The M4 operates at a boosted clock speed of 4.4 GHz, which contributes to its improved performance compared to its predecessors.
- 📈 When comparing the M4 to competitors like the Snapdragon X+ and X Elite, the M4 shows promising performance, but actual power efficiency and real-world testing will provide a clearer picture.
- 🎮 The M4 also includes a new 10-core GPU, which builds on the M3's graphics architecture and includes features like dynamic caching and hardware-accelerated ray tracing.
Q & A
What is the new M4 processor from Apple used in?
-The new M4 processor from Apple is used in an iPad Pro, rather than a MacBook, MacBook Air, or similar devices.
What is the core configuration of the M4 processor?
-The M4 processor has up to 10 cores, consisting of up to four performance cores and six efficiency cores, which is an upgrade from the M3's 4+4 core configuration.
What is the significance of the enhanced Next Generation ML accelerators in the M4 processor?
-The enhanced Next Generation ML accelerators in the M4 processor are crucial as they facilitate hardware-accelerated matrix operations, beneficial for scientific simulations, computer vision, and machine learning.
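To make "hardware-accelerated matrix operations" more concrete, here is a minimal Python sketch of matrix multiplication written as a running sum of outer products, which is the general shape of work a matrix extension is designed to accelerate. The function name and the pure-Python lists are illustrative assumptions; this is not Apple's or Arm's implementation, only a software model of the arithmetic.

```python
def matmul_outer_products(a, b):
    """Multiply a (m x k) by b (k x n) by accumulating k rank-1 outer products."""
    m, k, n = len(a), len(b), len(b[0])
    if any(len(row) != k for row in a):
        raise ValueError("inner dimensions do not match")
    # The accumulator stays in place while one outer product per step
    # is added into it (a software stand-in for a hardware matrix tile).
    acc = [[0.0] * n for _ in range(m)]
    for step in range(k):
        column = [a[i][step] for i in range(m)]  # column `step` of a
        row = b[step]                            # row `step` of b
        for i in range(m):
            for j in range(n):
                acc[i][j] += column[i] * row[j]
    return acc


a = [[1.0, 2.0], [3.0, 4.0]]
b = [[5.0, 6.0], [7.0, 8.0]]
print(matmul_outer_products(a, b))  # [[19.0, 22.0], [43.0, 50.0]]
```

In software the two inner loops run one multiply-add at a time; the appeal of doing this in hardware is that a whole outer product can be accumulated per instruction.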
How does the M4 processor's core configuration affect its availability in different iPad Pro models?
-The M4 processor with a 10-core configuration is only available in the iPad Pro model with one terabyte of storage and 16 gigabytes of RAM. The nine-core version is available in models with less storage and RAM.
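The storage-to-CPU mapping described above can be written down as a small lookup table. This sketch only encodes the tiers mentioned in the video; the type and function names are hypothetical, and any storage tier not listed is deliberately left out rather than guessed.

```python
from typing import NamedTuple


class M4Config(NamedTuple):
    performance_cores: int
    efficiency_cores: int
    ram_gb: int


# Tiers as described in the video; keys are storage sizes in gigabytes.
CONFIG_BY_STORAGE_GB = {
    256: M4Config(performance_cores=3, efficiency_cores=6, ram_gb=8),
    512: M4Config(performance_cores=3, efficiency_cores=6, ram_gb=8),
    1024: M4Config(performance_cores=4, efficiency_cores=6, ram_gb=16),
}


def cpu_core_count(storage_gb):
    """Return the total CPU core count for a storage tier listed above."""
    config = CONFIG_BY_STORAGE_GB[storage_gb]
    return config.performance_cores + config.efficiency_cores


for storage_gb, config in CONFIG_BY_STORAGE_GB.items():
    print(storage_gb, "GB:", cpu_core_count(storage_gb), "cores,", config.ram_gb, "GB RAM")
```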
What is the clock speed of both the nine-core and ten-core versions of the M4 processor?
-Both the nine-core and ten-core versions of the M4 processor have a boosted clock speed of 4.4 GHz.
What is the difference in microarchitecture between the M1, M2, M3, and M4 processors?
-Each processor (M1, M2, M3, M4) has a different internal microarchitecture. The M4 notably includes improved branch prediction, wider decode and execution for performance cores, and a deeper execution engine for efficiency cores, as well as NextGen ML accelerators.
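Why branch prediction matters can be shown with a toy simulation. The sketch below uses a classic two-bit saturating counter, a textbook predictor chosen purely for illustration; Apple has not published how the M4's predictors work, and the flush penalty of 10 cycles is a made-up number.

```python
def simulate_two_bit_predictor(outcomes, flush_penalty_cycles=10):
    """Return (mispredictions, wasted_cycles) for a sequence of branch outcomes.

    outcomes: iterable of booleans, True meaning the branch was taken.
    flush_penalty_cycles: hypothetical cost of emptying the pipeline.
    """
    counter = 2  # 0-1 predict not taken, 2-3 predict taken; start weakly taken
    mispredictions = 0
    for taken in outcomes:
        predicted_taken = counter >= 2
        if predicted_taken != taken:
            mispredictions += 1  # the pipeline would be flushed here
        # Move the counter towards the actual outcome, saturating at 0 and 3.
        counter = min(counter + 1, 3) if taken else max(counter - 1, 0)
    return mispredictions, mispredictions * flush_penalty_cycles


# A loop branch: taken nine times, then not taken once on exit, repeated three times.
loop_pattern = ([True] * 9 + [False]) * 3
print(simulate_two_bit_predictor(loop_pattern))  # (3, 30)
```

Every misprediction costs a pipeline flush, so a predictor that guesses wrong less often recovers those cycles directly as performance.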
What does the M4 processor's Geekbench scores indicate about its performance compared to its predecessors?
-The M4 processor's Geekbench scores show a significant leap in performance, especially when comparing single-threaded scores normalized by clock speed, indicating improved microarchitecture beyond just increased clock speeds.
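The normalization is simple arithmetic: divide the single-threaded score by the clock speed in GHz. The sketch below applies it to the per-GHz figures quoted in the video for the M1, M2, and M3 (747, 743, and 753); the function names are illustrative, and no M4 figure is included because the video does not state one.

```python
def score_per_ghz(single_thread_score, clock_ghz):
    """Normalize a benchmark score by clock speed to compare microarchitectures."""
    if clock_ghz <= 0:
        raise ValueError("clock speed must be positive")
    return single_thread_score / clock_ghz


def relative_spread(values):
    """Return (max - min) / min, a rough measure of how far apart the values are."""
    values = list(values)
    return (max(values) - min(values)) / min(values)


# Per-GHz single-threaded figures quoted in the video.
per_ghz = {"M1": 747, "M2": 743, "M3": 753}
print(f"M1-M3 spread: {relative_spread(per_ghz.values()):.1%}")  # M1-M3 spread: 1.3%
```

A spread of about 1.3% is within normal run-to-run benchmark variation, which is the basis for the claim that the M1 to M3 gains came mostly from clock speed.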
How does the M4 processor compare to the Snapdragon X+ and Snapdragon X Elite in terms of performance?
-While the Snapdragon X+ and X Elite have not yet been tested in laptops, their scores suggest they may outperform the M4 in raw multi-threaded performance, but actual power efficiency and thermal performance will be crucial for real-world comparison.
What new features does the M4 processor's GPU have?
-The M4 processor's GPU features a new 10-core design that builds on the M3's architecture, including dynamic caching, hardware-accelerated ray tracing, and mesh shading.
What are some possible future variations of the M4 processor that Apple might introduce?
-Possible future variations of the M4 processor could include an M4 Pro with more cores, and an M4 Max with even higher core counts. There is also speculation about the A18 chip for iPhones potentially incorporating Arm v9 architecture.
Outlines
💻 Introduction to Apple's M4 Processor in iPad Pro
Apple's M4 processor, introduced in the iPad Pro, has up to 10 CPU cores, an upgrade from the M3's 4+4 configuration. The full chip has four performance cores and six efficiency cores, with improved branch prediction on both, wider decode and execution on the performance cores, and a deeper execution engine on the efficiency cores. Both core types also integrate next-generation machine learning (ML) accelerators, which are useful for scientific simulations, computer vision, and machine learning tasks. Apple's presentation highlighted the performance improvements but did not make it obvious that the full 10-core configuration is not guaranteed: models with 256 GB or 512 GB of storage ship with a nine-core CPU (three performance cores, six efficiency cores) and 8 GB of RAM, which has disappointed some buyers. Both variants run at a boosted clock speed of 4.4 GHz and are built on TSMC's second-generation 3-nanometer process node. Buyers should check the specifications for each storage option to be sure they get the CPU core count and RAM they want.
🔍 Deep Dive into M4's CPU Microarchitecture and Geekbench Scores
The M4's CPU microarchitecture has been redesigned with improved branch prediction on both core types, wider decode and execution for the performance cores, and a deeper execution engine for the efficiency cores. Wider decode and execution let the performance cores handle more instructions per cycle, which improves instruction-level parallelism; the deeper (longer) pipeline in the efficiency cores probably allows a higher clock speed. As Apple's first Armv9 CPU, the M4 includes the Scalable Matrix Extension (SME2), which performs matrix multiplication in hardware and benefits ML tasks. Geekbench scores show that both the 9-core and 10-core versions of the M4 outperform the M3, M2, and M1. When single-threaded scores are divided by clock speed, the M1, M2, and M3 come out almost identical (roughly 747, 743, and 753 points per GHz), whereas the M4 shows a big leap, indicating substantial microarchitectural improvements on top of the higher clock speed and new process node.
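The video's reasoning that a deeper pipeline usually goes with a higher clock speed can be sketched with a simple timing model: the work is split across more stages, so each clock cycle has less to do. The model and every number below are hypothetical and chosen only to illustrate the trend; they are not measurements of the M4's efficiency cores.

```python
def max_clock_ghz(total_logic_delay_ns, stages, latch_overhead_ns):
    """Estimate the highest clock a pipeline can run at under a simple timing model."""
    if stages < 1:
        raise ValueError("a pipeline needs at least one stage")
    # Each stage does an equal share of the logic plus a fixed register overhead.
    cycle_time_ns = total_logic_delay_ns / stages + latch_overhead_ns
    return 1.0 / cycle_time_ns  # one over nanoseconds is gigahertz


# Hypothetical numbers: 2.0 ns of total logic, 0.05 ns of overhead per stage.
for stages in (8, 10):
    print(stages, "stages:", round(max_clock_ghz(2.0, stages, 0.05), 2), "GHz")
```

The fixed overhead per stage is why the gain tapers off as stages are added, and a longer pipeline also makes each branch misprediction more expensive to recover from.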
📊 Comparative Analysis with Snapdragon Processors and M4's GPU
A comparison with Qualcomm's Snapdragon X Plus and X Elite, based on Qualcomm's own numbers because neither chip had shipped in laptops at the time, shows the 9-core M4 just behind the 10-core Snapdragon X Plus in multi-threaded performance and the 10-core M4 behind the 12-core Snapdragon X Elite. The core configurations differ: the M4 mixes performance and efficiency cores, while the Snapdragon processors use only performance cores. The power efficiency of the Snapdragon processors is yet to be determined. The M4 also features a new 10-core GPU that builds on the M3's graphics architecture, bringing dynamic caching, hardware-accelerated ray tracing, and mesh shading to the iPad. Predictions for future Apple devices include M4 Pro and M4 Max variants with more cores and the possibility of the A18 chip in iPhones adopting the Armv9 architecture.
Keywords
💡M4 Processor
💡Core Configuration
💡Branch Prediction
💡Decode and Execution
💡Efficiency Cores
💡Next Generation ML Accelerator
💡Geekbench Scores
💡Clock Speed
💡3-nanometer Process Node
💡Scalable Matrix Extension
💡GPU
Highlights
Apple announced the M4 processor, which is notably used in the iPad Pro instead of a MacBook or MacBook Air.
The M4 processor features up to 10 cores, an upgrade from the M3's 4+4 core configuration.
The M4 includes next-generation core features with improved branch prediction and wider decode and execution for performance cores.
Efficiency cores in the M4 have a deeper execution engine, enhancing performance.
Both core types in the M4 feature an enhanced Next Generation ML accelerator, which is crucial for machine learning tasks.
Apple quotes performance gains for the M4, and the video checks them against Geekbench scores, which show improvements over previous generations.
The M4's core count may vary, with some models having 9 cores instead of 10, which could disappoint some users.
The M4 operates at a boosted clock speed of 4.4 GHz, regardless of having 9 or 10 cores.
The M4 is built using TSMC's second-generation 3-nanometer process node, improving efficiency and performance.
The M4's microarchitecture has been improved for better branch prediction and instruction handling.
The M4's efficiency cores may have higher clock speeds due to a deeper execution engine.
The M4's ML accelerators are part of the CPU and are based on ARM's v9.4 architecture, enabling hardware matrix operations.
Geekbench scores show a significant leap in performance for the M4 compared to its predecessors, even when accounting for clock speed.
The M4's GPU is an upgraded 10-core version from the M3, featuring dynamic caching and hardware-accelerated ray tracing.
The M4's performance puts it in competition with the Snapdragon X+ and X Elite, though power efficiency remains to be seen.
Future M4 variants, such as the M4 Pro or Max, could offer even more cores and enhanced performance.
The A18 in future iPhones might also adopt the M4's Armv9 architecture, potentially in an octa-core configuration.
Transcripts
so Apple announced its new M4 processor
a couple of weeks ago interestingly
being used in an iPad Pro rather than
appearing in a MacBook MacBook Air or
something like that now I didn't do a
video at the time of the launch because
I didn't want to just repeat the
information that uh Apple were giving in
the presentation you know repeat the
press release I wanted to find out a few
things about this processor I found them
out and today I want to have a look at
them so if you want to find out more
please let me explain
okay so let's go through some of the
highlights of the CPU part of the new M4
processor the M4 has up to 10 cores we're
going to talk about that up to 10 cores
uh consisting of up to four performance
cores and six efficiency cores so that's
an upgrade from the M3 which was 4 + 4
it's got the next generation core
features including improved Branch
prediction with wider decode and
execution for the performance cores and
a deeper execution engine for the
efficiency cores we'll dive into that
what does all that mean and both types
of cores also feature enhanced Next
Generation ML accelerators now this is a
really really important point and we're
going to dive into that that's why I put
it here in red because we have to
understand that and then Apple are
quoting performance differences and
we'll look at the performance according
to geekbench okay so this is how Apple
presented the uh the new chip so four
performance cores with improved branch
prediction and so on six efficiency cores
giving 10 in total however we have to
notice it's up to 10 core you aren't
guaranteed to get a 10 core CPU and
apple doesn't make it obvious in fact I
know somebody who actually went online
bought the new iPad and then when they
got it with a nine core one they were
pretty disappointed they're actually
going to return it and order the next
one up now the good thing is that both
variants have a boosted clock speed of
4.4 GHz that's the same if you get the
nine core or the 10 core and both are built
using TSMC's second generation 3 nanometer
process node so what's actually
happening here when you do dig down into
the specifications you don't get shown
that on the page where you're buying
online but you get it if you do dig into
the specifications basically there is a
nine core CPU with three performance
cores and six efficiency cores if you go
for 256 gigs of storage 512 gigs of
storage and 8 gigs of RAM now that's
quite limiting because you're not going
to get you know 16 gigs of RAM and maybe
only 256 maybe you don't want more than
256 you know it's an iPad I know people
use them more like you know laptops and
desktops nowadays but you know this is
quite limiting whereas if you want the 10
core version you have to go with one
terabyte of storage and get the 16
gigabytes of RAM so most people even if
they go with a half a terabyte of
storage and they're just going to have 8
gigs of RAM and the nine core CPU so do
beware when you make your buying Choice
okay so it talked about an improved CPU
micro architecture remember micro
architecture is how the chip is
internally designed each chip M1 M2 M3
the internals are different of course
it's still using it's an Arm based chip
so it's arms architecture but the micro
architecture what's going on the inside
is different in every chip so improved
branch prediction this is for both cores
well branch prediction basically when
you have these pipelines and I've got
videos on this about this here on this
channel you've got these instructions
going down the pipeline they're going
through the decode phase they're getting
along the pipeline getting ready to
execute and then at some point they find
out that what's in the pipeline is wrong
because a branch has taken the software
somewhere else and you have to empty the
the pipeline out and then start again
and that causes a performance uh blip
because you've had to empty out the
pipeline and start again so if you can
improve the branch prediction then you
basically you get better performance
fewer Branch mispredictions means
greater performance and that's what
Apple are saying they've done in both of
these uh core designs now wider decode
and execution engines for the
performance cores well wider pipelines
means more instructions per cycle that
the CPU can fetch decode issue execute
and retire that's the whole kind of
outline of a modern superscalar uh CPU uh
pipeline uh and so what they're
basically saying is that there's a wider
decode that means more instructions can
be decoded so they come in from the
fetch stage and then they can handle
more of them at the front end uh more
of them simultaneously and then
towards the back end a wider execution
engine allows for the CPU to execute
more instructions concurrently and that
improves instruction level parallelism
and that's what they've done in this new
processor now they haven't done that in
the efficiency cores instead with the
efficiency cores they've got a deeper
execution engine which basically means
there are more pipeline stages the pipeline is
longer and that probably means there's a
higher clock speed because you can bump
up the clock speed and each clock cycle
now is there's more clock Cycles
happening but you don't have to achieve
so much in each clock cycle because you
split it up into smaller chunks of work
so generally when you make the pipeline
longer you're actually bumping up the
clock speed so I'd imagine that these
new efficiency cores have got a higher
clock speed now here's the key ticket
enhanced NextGen ml accelerators now
these are in the CPU you go back and
look at Apple's presentation look at their
diagram these are in the CPU they're not
part of the NPU the NPU is a separate
thing they talk about the NPU separately
talk about its performance talk about
how it's better than previous NPUs in
the Apple range but this is in the CPU
now why what does it mean now to have an
ml accelerator in the CPU well what
we're talking about is the scalable
Matrix extension specifically the second
version of it which is basically how you
can do matrix multiplication in Hardware
of course we've come a long way from you
know back in the day when a even a
floating Point Unit to do floating Point
numbers was an optional thing and even
today they're optional in
microcontrollers and so on now we're
talking about not even scalar operations
uh but we're talking about Matrix
operations so you can actually do in the
CPU multiplication of matrices and other
Matrix uh kind of operations and that's
good for scientific simulations computer
vision and machine learning and this is
the point now that we've got these ml
accelerators which allow things to
happen in the CPU by allowing these
matrices to be manipulated multiplied
and so on now the key is that2 is part
of the arm
9.4 architecture okay so it's armv 9.4
and it's not available in arm V8 so this
means that this is a new CPU from uh
Apple they've redesigned both cores as
you can see because both cores have got
that uh improved uh uh branch prediction
both cores have had their pipelines
changed and both cores contain the ML
accelerators and this is what you do
when you change architecture both sets
of cores have to be the same because
that means that jobs can move from one
to the other and they can actually they
don't say oh no I haven't got the ML
accelerator on this on this CPU core and
then it just dies they could the the
jobs can switch from the performance
cores to the efficiency cores because
functionality wise they both do exactly
the same thing so this is Apple's first
Arm v9 CPU okay let's dive into the
Geekbench 6 scores I've got both the
single thread scores here and the
multi-threaded scores as you would
expect the single threaded score for
both the nine core version and the 10
core version is better than what you
find in the M3 in this particular case a
MacBook Air I wanted something with
passive cooling the M2 which you found
in the iPad Pro the M1 that you found in
the iPad Pro we can see the progression
here of the single threaded scores and
in terms of multi threaded scores the 10
core version is better than the nine
core version and we can see the 10 and nine
core versions are better than the eight
core versions of the M3 the M2 and the
M1 as you'd expect and you can see here
how the M1 the M2 and the M3 have
increased that multi-threaded score
performance as you go up through the
generations however it's important to
note that each of these processors is
running at faster and faster clock
speeds so the original M1 was running at
around 3.2 GHz whereas the latest M4 is
running at 4.4 GHz so there'll obviously
be more performance when you crank up
the clock speed so if we now actually
divide the single threaded score by the
clock speed we see a very very
interesting thing so trying to eliminate
the clock speed or bring them all down
to the same clock speed what we see is
that for the M1 the M2 and the M3 really
the single threaded scores were very very
close 747 743
753 and of course you know geekbench
scores go up and down depending exactly
on the person that run them these are
average scores that I've been looking at
here so really they're not much
different what really happened and I did
say this at the time of the launches of
these of these processors what really
happened is they bumped up the clock
speed there were some changes they made
some microarch changes but really they
bumped up the clock speed went down to
different the next nanometer process
node they were able to boost the clock
speed and really if you actually look at
it the M1 the M2 and the M3 are really
very very similar however we can see a
very big leap when we come to the M4 so
even though it's running at 4.4 GHz
which in itself makes things will just
the performance will be greater we can
also see that the actual performance in
terms of the
microarchitecture is much better so
certainly the M4 is a leap uh in terms
of uh Apple's engineering they've leapt to
Arm v9 they've redesigned the CPU they've
redesigned the pipelines they've
redesigned the branch predictors and
they've upped the clock speed because they're
now on this second generation 3
nanometer and you've got this great
performance regardless of the clock
speed and and including the new clock
speed and of course we need to compare
this to the Snapdragon X+ and the
Snapdragon X Elite now note neither of
these processors are actually available
in laptops yet we're basing our numbers
on Qualcomm's numbers and of course there
will be some real world testing coming
about when these laptops uh are released
some will have passive cooling some will
have active cooling some will be with the
Plus some will be with the Elite and so
on and I've covered this before in
previous videos here on this channel but
what do we we see uh M1 this is
multi-threaded score we don't really
have the single threaded score
particularly for the uh uh Snapdragon
X+ so we've got the M1 the M2 and the M3
here with their respective scores then
we've got the M4 with the nine core
version because this is the
multi-threaded one and that is just
beaten by the Snapdragon X+ with its 10
cores now of course as many people will
point out that the nine core one is three
performance cores and the rest efficiency
cores whereas the Snapdragon X Plus is all
performance cores so that does raise some
interesting questions you've then got
the M4 with a 10 core version and then
next you've got and it's beaten by the
Snapdragon X Elite with the 12 core
version again noting that the M4 with
the 10 core version is with performance
cores and efficiency cores the
Snapdragon X Elite is with just
performance cores however we don't know
what the power efficiency of the
Snapdragon X Elite processors is so if they
are able to pull off this performance
with a similar power envelope or thermal
envelope to the M4 even though
it's got the two different types of
cores then that makes the Snapdragon X
Elite an absolute winner but if it's
actually very very power hungry
then that's going to be slightly
different situation however in terms of
overall brute performance in a mobile
laptop type of device then we can see
the Snapdragon X at the moment looks
like the winner but we're going to
have to see what happens when they
actually come out and see what the
actual power performance numbers are
okay so just moving quickly to the GPU
it's a new 10 core GPU of the M4 Builds
on the existing Graphics architecture of
the M3 so basically this is a tweaked
version of the M3 for the iPad that
means you're getting Dynamic caching now
which is this way of allocating memory
dynamically so you don't just say well
here is you know 2 gigs of memory for
the GPU it goes up and down according to
the GPU's usage you've got hardware
accelerated ray tracing hardware
accelerated mesh shading now all these
things we found in the M3 they're now in
the iPad we're going to see them uh
later on in other M4 devices but they've
now come to the iPad so the M4 GPU is
basically an upgrade from the M3 GPU and
this is how uh Apple presented that
during the presentation okay so quickly
overall we've got the M4 built on 3
nanometer from TSMC you've got 28
billion transistors hardware accelerated
ray tracing uh AV1 uh the 10 core CPU
GPU you can see it all here so
certainly for the iPad this is amazing
certainly for what we're going to see in
the future predictions well I'm just
guessing I don't have any inside
information I don't have any kind of
leaks to tell I'm just guessing here
there are of course going to be some M4
variants the M4 Pro now could that be a
13 or 14 core with five or six
performance cores and maybe up to eight
efficiency cores remember they've added
extra efficiency cores to the base M4
so that would be interesting M4 Max
could see even 18 to 20 cores with
again eight efficiency cores we we'll
have to wait and see how Apple stitches
these numbers together how they're going
to do this configuration and we are
going to of course see this CPU in the
iPhone the A18 I would guess now Apple
could leave it for another generation
they could carry on the a18 with what
we've got in the a17 that be the arm v81
I have a sneaking suspicion they're
going to bring this to the a18 so you
get arm v9 in the a18 but remember last
time they called it the A17 Pro so does
that mean we're going to get an a18 Pro
maybe an octacore for the first time
maybe a 3 plus 5 setup I don't think
they've got enough wins in terms of
power to make it a 4 plus4 setup because
those performance cores are uh power
hungry because of the performance they
offer so maybe a 3 + 5 that will be
interesting and maybe we can get an A18
without the pro which goes back to the
traditional 6 core 2+4 I don't know I'm
just guessing but Apple clearly have
scope here to do all kinds of
interesting things both in terms of
MacBooks and in terms of the iPhone okay
that's it my name is Gary Sims this is
Gary explains I really hope you enjoyed
this video if you did please do give it
a thumbs up and if you like these kind
of videos why not stick around by
subscribing to the channel okay that's
it I'll see you in the next one
[Music]