Why you don't need to worry about scaling your Java webapp
Summary
TL;DR: In this talk, the speaker addresses common concerns about scaling Java web applications. They walk through the intricacies of load testing, emphasizing the importance of understanding real user journeys and infrastructure. By guiding the audience through the performance-testing process, the speaker shares strategies for assessing system limits, optimizing application performance, and managing expectations, ultimately aiming to alleviate scaling worries for Java web applications.
Takeaways
- The speaker reassures that often, concerns about scaling Java web applications may be unfounded and shares insights on load testing.
- The talk was inspired by common questions about application scalability, especially when launching new products or APIs.
- The speaker delves into the 'rabbit hole' of performance testing, emphasizing the importance of understanding how many users an application can handle.
- The presentation covers the process of load testing, including the setup of loaders and probes to simulate user requests and measure response times.
- The speaker introduces tools and techniques for collecting and analyzing data during load testing, such as CPU, memory, network usage, and profiling data.
- The importance of visualizing data through histograms and flame graphs is highlighted to better understand the performance of the application under test.
- The speaker discusses the need for detailed analysis of application performance, including business logic execution times and garbage collection logs.
- The talk touches on the concept of expectation management, urging developers to set realistic performance targets based on user behavior and infrastructure capabilities.
- The speaker shares an example from Stack Exchange to provide perspective on realistic traffic expectations for web applications.
- It is suggested that load testing should be an integral part of the development process, not just a pre-launch checklist item.
- The discussion includes the challenges of database scalability, which is often a bottleneck in web application performance.
Q & A
What is the main topic of the talk?
-The main topic of the talk is about why you don't need to worry about scaling your Java web application and how to approach performance testing and load testing effectively.
Why did the speaker start the talk with a question about scaling?
-The speaker started with a question about scaling to relate to the common concern developers have when building new products or APIs, which is whether the application will scale under high load.
What are the roles of loaders and probes in the context of load testing?
-Loaders generate load against the server by sending multiple requests, while probes simulate user behavior by sending a lower frequency of requests and recording latencies to measure the performance of the server under load.
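The probe's side of this split can be sketched in a few lines of Java. This is a minimal illustration, not the speaker's actual harness: the `request` supplier is a hypothetical stand-in for an HTTP call to the tax-rate endpoint that returns a status code, and each call's latency is recorded for later percentile analysis.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntSupplier;

/** Minimal sketch of a probe: low-frequency requests, latency recorded per call. */
public class Probe {
    /** Issues `ticks` requests one after another and records each latency in nanoseconds. */
    static List<Long> run(IntSupplier request, int ticks) {
        List<Long> latencies = new ArrayList<>();
        for (int i = 0; i < ticks; i++) {
            long start = System.nanoTime();
            int status = request.getAsInt();           // e.g. an HTTP GET returning the status code
            long elapsed = System.nanoTime() - start;
            if (status == 200) latencies.add(elapsed); // only record successful calls
        }
        return latencies;
    }

    public static void main(String[] args) {
        // Stand-in for hitting the /taxrate endpoint: always "succeeds".
        List<Long> latencies = run(() -> 200, 5);
        System.out.println("recorded " + latencies.size() + " latencies");
    }
}
```

A loader would look similar but fire as many requests per second as possible and record nothing, keeping load generation and measurement strictly separated.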
Why is it important to separate the roles of loaders and probes?
-Separating the roles ensures that the load generation and the performance measurement are isolated, preventing the loaders from becoming a bottleneck and providing accurate latency data.
What does the speaker mean by 'going down the rabbit hole'?
-Going down the rabbit hole refers to diving deep into a complex subject or problem, in this case, exploring the intricacies of performance testing and understanding the capacity of different server instances.
What is the significance of using flame graphs in load testing?
-Flame graphs provide a visual representation of where the application spends its CPU time, helping to identify bottlenecks and areas of the code that require optimization.
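One hedged way to capture such profiling data on a modern JDK is Flight Recorder, whose output tools like JDK Mission Control (or async-profiler's converters) can render as a flame graph. A minimal sketch, assuming JDK 11+ with JFR available; the CPU-burning loop is just stand-in work for the code under load:

```java
import java.nio.file.Files;
import java.nio.file.Path;
import jdk.jfr.Configuration;
import jdk.jfr.Recording;

/** Sketch: capture a short JFR profile that flame-graph tools can visualize. */
public class FlightRecorderSketch {
    static long record() throws Exception {
        // "profile" is one of the configurations shipped with the JDK.
        Recording recording = new Recording(Configuration.getConfiguration("profile"));
        recording.start();
        long sum = 0;
        for (int i = 0; i < 5_000_000; i++) sum += i; // stand-in work to sample
        recording.stop();
        Path out = Files.createTempFile("loadtest", ".jfr");
        recording.dump(out);
        System.out.println("work checksum: " + sum);
        return Files.size(out);                        // size of the written .jfr file
    }

    public static void main(String[] args) throws Exception {
        System.out.println("profile written: " + record() + " bytes");
    }
}
```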
Why is it recommended to run load tests for more than 30 seconds in real-life scenarios?
-Running load tests for a longer duration helps to simulate real-world usage patterns and can reveal issues that may not be apparent during short test runs, such as memory leaks or performance degradation over time.
What is the purpose of collecting data from every participant in the load test?
-Collecting data from every participant, including the server, loaders, and probes, provides a comprehensive view of the system's performance, helping to identify the root cause of any issues that arise during the test.
How does the speaker suggest managing expectations regarding the load capacity of a web application?
-The speaker suggests using real-world data, such as the traffic statistics of well-known websites like Stack Exchange, to provide a realistic perspective on the expected load and to set achievable performance goals.
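That kind of expectation management is mostly back-of-the-envelope arithmetic. A sketch, with a purely illustrative monthly page-view figure (not an actual Stack Exchange statistic):

```java
/** Back-of-the-envelope sketch: monthly page views to average requests per second. */
public class TrafficEnvelope {
    static double averageRps(long pageViewsPerMonth) {
        long secondsPerMonth = 30L * 24 * 3600;   // roughly 2.6 million seconds
        return (double) pageViewsPerMonth / secondsPerMonth;
    }

    public static void main(String[] args) {
        // 1.3 billion is an illustrative number, not a real statistic.
        System.out.printf("avg: %.0f requests/second%n", averageRps(1_300_000_000L));
        // Peak traffic is typically several times the average, so budget accordingly.
    }
}
```

Even a very large monthly number collapses to a few hundred requests per second on average, which is exactly the perspective-setting point the speaker makes.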
What is the importance of understanding real user journeys in the context of load testing?
-Understanding real user journeys helps to create load tests that mimic actual user behavior, ensuring that the tests are relevant and that the performance metrics collected are meaningful for the application's intended use.
Why is it not always necessary to choose the most powerful server instances for a web application?
-Optimizing application code and understanding its performance characteristics often reveals that less powerful, and thus less expensive, server instances can handle the expected load, making it unnecessary to over-provision resources.
How can developers benefit from being involved in load testing?
-Developers can benefit by gaining insights into the performance of their code under load, identifying inefficiencies, and learning how to write code that scales better and is more efficient under high-load conditions.
What is the potential issue with relying on health checks for load testing?
-Health checks may not accurately reflect the performance of the server under load, as they typically only verify if the server is up and responding, rather than measuring the response times and system behavior under stress.
Why is it challenging to scale a system with a database that is the bottleneck?
-Databases can be difficult to scale due to their stateful nature and the complexity of managing data consistency and transactions, making it a common bottleneck in high-load scenarios.
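For read-heavy, rarely-changing data like tax rates, one common way to take pressure off a bottlenecked database is an in-process cache in front of the query. A minimal sketch; `loadFromDatabase` is a hypothetical stand-in for the real SQL call:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Sketch: cache rarely-changing lookups so the database sees fewer reads. */
public class TaxRateCache {
    private final Map<String, Integer> cache = new ConcurrentHashMap<>();
    int dbCalls = 0; // exposed for the demo only

    int taxRateFor(String country) {
        // Only the first lookup per country reaches the "database".
        return cache.computeIfAbsent(country, this::loadFromDatabase);
    }

    private int loadFromDatabase(String country) {
        dbCalls++;                        // in reality: a SQL query
        return "DE".equals(country) ? 19 : 20;
    }

    public static void main(String[] args) {
        TaxRateCache rates = new TaxRateCache();
        rates.taxRateFor("DE");
        rates.taxRateFor("DE");           // served from cache, no second query
        System.out.println("db calls: " + rates.dbCalls);
    }
}
```

Caching only sidesteps the read path; writes, consistency, and invalidation are exactly the hard parts that make databases the usual bottleneck.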
What is the speaker's view on the role of developers in load testing?
-The speaker believes that developers should be more involved in load testing, starting early in the development cycle, to better understand the performance implications of their code and to optimize it for scalability.
Outlines
Introduction to Scaling Java Web Applications
The speaker initiates the talk by addressing the common concern of scaling Java web applications. They discuss the origin of the talk and engage with the audience to identify those working on web applications. The speaker shares their past experiences with pre-launch scaling questions and introduces the concept of performance testing, mentioning microservices, Project Reactor, and reactive programming as related topics. The session aims to explore the 'rabbit hole' of performance testing and load testing, with a focus on learning how to determine the load capacity of different Amazon EC2 instances for a Java application.
Setting Up a Load Testing Environment
This paragraph outlines the process of setting up a load testing environment using Amazon EC2 instances, embedded Jetty servers, and business logic simulating a tax rate endpoint. The speaker introduces the concept of 'loaders' for generating requests and 'probes' for measuring latency, emphasizing the importance of isolating these components to avoid skewed performance data. The paragraph also describes the initial load testing setup, including the number of requests per second and the duration of the test, and mentions the use of custom code for orchestrating the test.
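The server under test can be sketched as follows. The talk uses embedded Jetty; to keep this snippet dependency-free, the JDK's built-in `com.sun.net.httpserver.HttpServer` is used as a stand-in, with a single `/taxrate` endpoint returning a hard-coded JSON body:

```java
import com.sun.net.httpserver.HttpServer;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

/** Minimal stand-in for the embedded-Jetty server with one tax-rate endpoint. */
public class TaxRateServer {
    static HttpServer start() throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress(0), 0); // port 0 = free port
        server.createContext("/taxrate", exchange -> {
            byte[] body = "{\"name\":\"DE\",\"rate\":19}".getBytes();
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream os = exchange.getResponseBody()) { os.write(body); }
        });
        server.start();
        return server;
    }

    public static void main(String[] args) throws Exception {
        HttpServer server = start();
        int port = server.getAddress().getPort();
        HttpResponse<String> response = HttpClient.newHttpClient().send(
                HttpRequest.newBuilder(URI.create("http://localhost:" + port + "/taxrate")).build(),
                HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
        server.stop(0);
    }
}
```

The point of starting this simple is that every later complication (more endpoints, a database, more services) is added against a known baseline.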
Analyzing Load Test Results and Collecting Data
The speaker discusses the importance of analyzing load test results by collecting data from all participants, including the server, loaders, and probes. They highlight the need to gather operating system data such as CPU, memory, network, and I/O usage to get a comprehensive view of the system's performance under load. The paragraph also introduces the idea of taking detailed snapshots of system performance metrics during the test runs and the use of flame graphs and profiling data to understand the application's behavior during load testing.
Interpreting Test Results with Visualizations
In this section, the speaker explains how to interpret load test results using HDR histograms for visualizing latencies and understanding the performance of the server under different load conditions. They discuss the significance of examining the histograms to identify any performance bottlenecks or unexpected behavior in the system. The speaker also mentions the importance of checking the CPU logs, memory logs, and network logs for all participants to ensure the reliability of the test results.
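The percentile readout behind such a histogram can be illustrated with plain Java. A real setup would use the HdrHistogram library; this sketch applies the simple nearest-rank method to recorded probe latencies:

```java
import java.util.Arrays;

/** Sketch of the percentile readout behind an HDR-histogram plot:
    "p90" means 90% of requests completed at or below that latency. */
public class Percentiles {
    static long percentile(long[] latenciesMicros, double p) {
        long[] sorted = latenciesMicros.clone();
        Arrays.sort(sorted);
        int index = (int) Math.ceil(p * sorted.length) - 1; // nearest-rank method
        return sorted[Math.max(index, 0)];
    }

    public static void main(String[] args) {
        long[] micros = {1000, 1100, 1250, 1250, 1300, 1300, 1400, 1500, 2940, 9000};
        System.out.println("p90 = " + percentile(micros, 0.90) + " us"); // divide by 1000 for ms
    }
}
```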
Deep Dive into Flame Graphs and Profiling Data
The speaker provides an in-depth look at flame graphs and profiling data, explaining how they can be used to identify where the application spends its CPU time and memory during execution. They discuss the importance of understanding the call graph and garbage collection logs to optimize the application's performance. The paragraph emphasizes the need for developers to get into the habit of analyzing these detailed performance metrics to find areas for optimization.
Moving Beyond Absolute Numbers in Load Testing
The speaker emphasizes the importance of moving beyond just looking at absolute numbers in load testing and focusing on real user journeys and infrastructure. They discuss the need for expectation management when dealing with business stakeholders who may have unrealistic expectations about the load capacity of the system. The paragraph also introduces the idea of using real-world data from websites like Stack Exchange to provide a realistic perspective on the scale of load that web applications typically handle.
The Importance of Developer Involvement in Load Testing
In this paragraph, the speaker argues for the importance of involving developers in the load testing process early on in the development cycle. They discuss the need for developers to understand the system's performance implications and to take load testing seriously to avoid issues at launch. The speaker also touches on the challenges of load testing in a containerized environment like Kubernetes and the importance of having a clear understanding of the application's performance on different hardware configurations.
Audience Q&A and Final Thoughts
The final part of the script includes a Q&A session where the audience asks questions about load testing, database scalability, and the role of developers in performance optimization. The speaker shares their experiences and thoughts on these topics, emphasizing the complexity of scaling databases and the need for developers to be proactive in performance testing. The session concludes with the speaker's contact information and an invitation for further discussion during the break.
Keywords
Scaling
Java Web Application
Load Testing
Microservices
Project Reactor
Amazon EC2
Performance Testing
Jetty
Flame Graphs
Garbage Collection
User Journeys
Highlights
Introduction to the topic of scaling Java web applications and the common concern of whether a new product will scale.
Engagement with the audience to identify who is working on web applications.
The dilemma of being asked about scaling capabilities just before a product launch and the common questions about handling large loads.
An overview of various AWS EC2 instance types and the challenge of determining which is suitable for a Java application.
The importance of performance and load testing to understand how many users an instance can handle.
Introduction to the concept of loaders and probes in the context of load testing.
The significance of not overloading loaders during performance testing and the role of probes in simulating user requests.
A step-by-step process for conducting load testing, starting with a simple server setup and gradually increasing complexity.
The use of custom code to orchestrate load testing across multiple machines.
The process of analyzing server performance under different load conditions by collecting data from the server, loaders, and probes.
The importance of collecting detailed data such as CPU usage, memory usage, network I/O, and business logic execution times.
The use of flame graphs for visualizing where the application spends its CPU time and identifying bottlenecks.
The concept of expectation management when dealing with business stakeholders and setting realistic load handling expectations.
A discussion on the practicality of load testing and the difference between load testing and limit testing.
The idea of not needing the most powerful instance for an application and the approach of scaling down in testing to find the optimal instance type.
The conclusion emphasizing the importance of understanding user journeys, real infrastructure, and proper load testing to ensure scalable Java web applications.
A Q&A session discussing practical aspects of load testing, such as handling unexpected loads, memory usage during testing, and the role of developers in load testing.
Transcripts
thank you Katya hi folks good evening
together uh what's up this I guess
that's how how we say it
um my talk tonight is why you don't need
to worry about scaling your Java web
application
how did this talk come about and by the
way who is working on a web application
in one form or another
right thankfully because I gave this
talk to a couple of folks before and
they mostly worked on text
editors and whatever and they couldn't
really relate to what is going on here
in this talk so okay
now I don't know about you but in the
past I had this
these situations you build a new product
you build a new API shortly before
launch a product person comes in and
asks you hey will we scale will this
thing scale
what happens if we're going to be the
next Amazon tomorrow what happens if a
hundred thousand people simultaneously
call our REST API will we just you know
burst up in flames
and then you can mumble something about
microservices and Project Reactor and
reactive programming and Yuri is
actually going to talk a bit about that
later on about reactive programming
but I wanted to go down the rabbit hole
what is the rabbit hole well the thing
is I thought let's not you know take the
shortcuts and uh do something
containerized where some sysadmin
says well my app only has 0.5 CPUs
I wanted to have a look at the Amazon
ec2 instance type web page
and what you can see here is so I'm on
the AWS page I can see on the left
general purpose instances
and I have from t4g t3 t3a down to
A1 instance types
I can click compute optimized c7g c7gn c4
memory optimized you know I can click
through all these categories and I'll
see plenty of instances with different
letters basically every letter in the
alphabet
and I thought well if I have a Java
application and I put it on any instance
how would I know how many users that
instance basically could uh it could
handle how would I know which instance
type to get I mean do I need r5n or do I
need z1d
and that left me down the the rabbit
hole of let's say performance testing
and a big part of the talk is on load
testing don't worry I know that most of
you have been probably doing some load
testing at one point or another meaning
you click around in JMeter you click
around in Gatling something like that
I want to
show you
a process of how to approach
the question well uh which load can my
my instance handle actually the whole
process and I want to teach you a couple
of contents along the way that I learned
from the chatty team actually because I
got together with this talk with the
with the jetty maintenance The Jetty web
server I asked them because they seem to
have a lot of knowledge around high
performance Jetty instances about
handling many users and I'd just like to
share a couple of Concepts
all right now that being said what you
can now do is kind of enjoy my
handwriting
um because I rarely handwrite
apart from this presentation and let's
see what we're going to do now
the the setting I already drew a couple
of things what you can essentially see
is what I want to do is I want to have a
server at the moment and I'm going to
make it very simple we're not going to
start with microservices we're not going
to start with a database we're not going
to start with 20 000 different Services
let's keep it simple at the beginning
and we can make it more complex later on
the server is just a machine on ec2
the type doesn't matter for now you can
do it with any type it runs an embedded
Jetty because it's simple to set up
has some business logic where you can
see here the BL the business logic
is
remember we're working for a boring
company calculating tax rates
so it only has one endpoint at the
moment a tax rate endpoint and the tax
rate endpoint gives you back stuff like
for Germany name DE rate 19 something
like that
one end point it's a bit unrealistic at
the beginning but we'll have to learn
all the concepts first and then you know
make it more complex later on
what we also have is the concept of
loaders well what are loaders
essentially these loaders they shoot
requests against the API against the tax
rate endpoint
and you could have one loader you could
have many loaders I have four loaders
here and why do we have four loaders or
n loaders because you have to make sure
the loaders themselves are not
overloaded when they basically do
performance testing against your server
which can happen relatively soon
actually with more than you know beefy
servers that your loader is the problem
not the server is the problem
and then obviously you want to isolate
you don't want to do what a lot of
developers do on their own machine you
know put up the server the loader run
everything in one machine and then just
have some funky numbers which are
totally unreliable
what I also have is a concept of a probe
what's the difference between a probe
and a loader
the probe is essentially so the loader
the only trouble loader has is
generating load against our endpoint
nothing else it doesn't record any
latencies whatever the probe itself is
one machine which just does like you
know one two requests per second
something like that simulating a user
going against the server so the probe
doesn't spend much effort generating
load it just you know browses for
example the website or hits the
endpoints in this case and you get
isolated results on the latencies that
the uh the probe gets when it hits the
server endpoints
so what we're going to do is just as a
big as an overview we have these loaders
we have the server we have the probe all
machines in Amazon ec2
and the loaders are going to send later
on you know many requests per second
against our server the probe sends one
uh one or two requests per second
against the server
zooming out a tiny bit the process will
look like this let me just see
if that works
different color right
we're gonna start let's say
the loaders are gonna start with 250
requests per second against the server
that makes it uh 1000 requests so we can
read that 1000 requests uh in total per
second against the server and we're
going to let our tests run for 30
seconds in real life it should actually
run for longer than 30 seconds but for
now I want to do everything live I don't
just want to bombard you with numbers
ideally I want you to go home and just
repeat you know what I did here just
with some live coding essentially all
right so that is Step number one and as
you might have guessed I prepared a tiny
project which is not JMeter which is
not Gatling it is a bit of custom code
just scrolling down here which you can
ignore for now so I'll scroll up again
and what it essentially does is it
orchestrates these six machines
here at the top I have these 250
requests per second that every loader
shoots against my endpoint and I'm just
going to run this class now because it
takes some time obviously and I want to
show you the the Real Results this this
test run got later on
and I have to do some switching back and
forth during this talk between the IDE
and my drawings
all right so while this runs
let's open it up again
um what's going to happen is we are
going to start with 250 requests per
second then let's try 1000 requests per
second for example uh times four so we
were going to shoot 4 000 requests per
second
and at some point whoops that was not it
let's see uh we're gonna do the same
thing we're going to do a couple of test
runs maybe with 5000 requests per second
per loader
so that we have 20K in total
and then maybe with 10K requests per
second
times for 40K
and what we want to do is we want to
find out once our server breaks down
when does our server have problems
handling the load that is Step number
one
step number two is so far I've been
doing you know a fair
amount of you know pretty standard
load testing
we want to make it a bit more complex
because just you know sending off
requests against the server doesn't
really tell us anything it just gives us
some weird numbers we have to make sure
these numbers are reliable what do I
mean by that well
we need to collect data from the server
from the probe from the loader from
every participant in the load test what
kind of data
let's collect some operating system data
for example let's find out what is the
CPU usage of my server of the loaders
what is my memory usage so memory usage
of all these participants what is my
network adapter usage for example
what is my i o usage so I want to get a
I call that actually a big picture
so the operating system is the big
picture
I want to find out when I increase the
load against my server
how does my operating system behave not
just from a server from the loader from
everyone I want to find out hey did my
loader maybe have CPU problems
generating the loads was the network
problem with the server so is the
problem actually that the network
bandwidth is too little or do we have a
CPU problem do you have a memory problem
just a big picture of what the problem
actually is once I start generating more
load very important step number one
step number two big picture data gives
us only you know the general glimpse of
what's happening in our load test which
means we need to get the detail picture
also
detailed picture
what is the detail picture
and by the way let me just quickly have
a look at my load test that hopefully
finished yes it finished already so I'm
gonna spawn off a new load test
hopefully the machines have enough capacity
I'm just going to go back I just
executed the test with a thousand
requests per second I'm going to do
another one with 1250 requests per
second per loader
so we have these results in a second
also
while I keep babbling okay let's rerun
this
good that looks good
going back here detail picture
detail picture means I want to get
for example I want to get the latencies
or the execution time of my business
logic so I want to find out
over here in my server excluding the
chatty web server everything else I just
want to know how long does my business
business logic take so for example how
long do my SQL queries take how long do
all my workflows take I want to get
detailed numbers on that regarding when
I do the load test
then you know just numbers like that
don't help me a lot I also need to kind
of cross that them with flame graphs or
profiling data
so I want to find out where does my
application spend its CPU time where
does my application spend its memory
once it executes certain you know uh
call graph essentially I also want to
get data on maybe the garbage collection
logs
garbage collection logs or um
hiccups whenever the JVM stalls because
during your garbage collection cycles
not much is happening right and you want
to have a good understanding of your
garbage collection logs for example
big picture and detail picture you need to get
that for every test run that you're
basically executing
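A quick, in-process way to glance at GC activity during a run is the JVM's own management beans; for serious analysis you would enable GC logging on the server JVM instead (e.g. `-Xlog:gc*` on JDK 9+). A minimal sketch:

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.List;

/** Sketch: snapshot cumulative GC counts and time since JVM start. */
public class GcSnapshot {
    static List<GarbageCollectorMXBean> beans() {
        return ManagementFactory.getGarbageCollectorMXBeans();
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : beans()) {
            // Cumulative collection count and accumulated collection time.
            System.out.println(gc.getName() + ": " + gc.getCollectionCount()
                    + " collections, " + gc.getCollectionTime() + " ms");
        }
    }
}
```

Taking such a snapshot before and after each test run shows whether the load is driving extra collection cycles and pauses.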
the good story is this Java class I
showed you early on what it actually
does is it not only goes and sets up
these you know six machines
it runs the tests it also makes sure
during every test run to make snapshots
for every participant of the CPU memory
Network i o and make sure to double
check the the business execution time
business logic execution time the flame
graphs profiling data garbage collection
logs for the server
which means part number two of this talk
is having a look at this data and trying
to figure out what that data actually
means and what we can do with that data
let me go back
um now when people are thinking dude
this is all on your machine the data is
handily available if I'm using AWS I
could use cloud watch I could use
Services I wanted to make the talk
vendor independent so all the data
you're going to see in a second we
basically get it from Linux command line
tools on an ec2 instance and yes there's
many other locations where you can get
the data from
what my test does is and this is just
because I built it like that I get a
results folder
in my Maven target directory but it's
just because I put it that way and by
the way I just saw this load test also
finished let me just quickly run another
load test with 5 000 requests per second
so now we're shooting 20 000 requests
per second against my machine
right what data do we get we get a
folder which is called plot let me show
you there's a plot HTML file inside let
me open up the plot
what you see here is an HDR histogram
and then we can debate now later on I
have had this discussion many times if
this is indeed a histogram or histogram
should look different
it's an HDR histogram
what you see here is well at the moment
you see two lines a red line and a blue
line the blue line corresponds with my
thousand requests per second that I
shot against the server the red line
with my 5000 requests that I shot
against the server these lines are the
latencies that the probe gets whenever
you know it says Hey tax rate tax rate
tax rate tax rate
on the left side you see the latency for
the probe in microseconds
confusingly you have to divide it by a
thousand and you get the millisecond so
actually all these calls that the probe
did against the server
um they were like one millisecond 1.25
milliseconds 1.5 milliseconds that you
can see here on the left
the right Axis is the percentile meaning
when you go here like here 90 percentile
tells you ninety percent of the requests
were done faster than
1294 microseconds or 1.2 milliseconds
that is actually not that important for
now the percentiles but you just get you
know a nice little line for the test run
and you immediately see hey do I have
like these uh small latencies do I have
some huge bumps inside were there
problems it's just a visual confirmation
of what happened during your test run at
the moment which we just sent a thousand
requests per second 5000 requests per
second we don't expect there to be huge
bumps whatever so these lines are pretty
much flat at the moment
all right step number one have a look at
these histograms or generate such
histograms have a look at them
then
data overloads and I can promise you
once you make we make it through these
folders here uh then uh you did your
work for for tonight
so what I did is for every test run we
did we get a folder like this one at
1822.
you see subfolders for loader number one
loader number two loader three load of
four because we have four loaders
you see a folder for the probe and you
see a folder for the server
let's have a look inside what each of
these folders has contains
by the way nope because the test run
just let me just quickly double check
one more test run sorry for the
switching but I just want to get some
data before we talk through this
so now we're sending 30 000 requests per
second against my ec2 machine
something like this okay
back to our loader we have as promised
uh data for the CPU usage we have data
for the memory usage we have data for
Network usage and we need to make sure
this is what is skipped always
to check these files or the data or the
visualizations for every participant of
the load test it's quite boring but it's
something that you have to do
my CPU log by the way this is quite a
beefy loader because it has eight CPUs
which you can see here
um I just take a couple of so the way
this works is
the load test runs for 30 seconds every
10 seconds I take a snapshot so you'll
find three tables in here with the CPU
usage data and to not overload you we're
just going to have a look at the first
row here the all row which which is the
average row and when you go to the right
you can see that on average the CPUs
were idle let's round up 99 percent of the time
so they just did work for one percent of
the time which is expected because you
know it's a pretty beefy machine and
they generate 250 requests per second
which is literally nothing for such a
machine
right you double check that make sure
everything is good and you go on to the
memory log
memory log very uh very similar just you
know snapshots here at the table every
10 seconds you get a new snapshot we can
see the loaders have 30 gigabytes of
memory they use 450 megabytes of memory
they still have a lot of free memory
again the big picture doesn't tell us
anything here but for this
simple test nothing else was to be
expected
Network log it's my favorite log because
it has so many columns and tables
what you have to understand is that two
tables now represent one snapshot and
the only numbers you're basically
interested in are these receive
kilobytes per second and transmitted
kilobytes per second
and then you have to divide these
numbers by 125 I think to come up with
the megabits like 100 megabits or you
know a gigabit whatever and again you
know sending just a bit of Json around
for this load test doesn't mean that our
network adapter is the problem by the
way what I learned in this when I
prepared the talk so you can I mean
sooner or later you will max out the
network adapter actually that the
network adapter is the problem nothing
else is the problem for the server if
you just get enough loaders you know
generate some loads what I didn't know
is that Amazon ec2 doesn't only cap
the whole bandwidth like a 100 m bit or
a gigabit they also have a limit on the
number of TCP packets every second you
can get and funnily enough you will hit
that limit earlier most of the time than
your bandwidth limit it's quite funky
but just keep it in mind if you ever
have such an instance where suddenly
network packets are being dropped
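The conversion the speaker mentions for the network log can be written down explicitly: the log reports kilobytes per second, and dividing by 125 yields megabits per second (1 byte = 8 bits, so 1 Mbit/s = 125 kB/s). A tiny sketch:

```java
/** Sketch: convert the network log's kB/s columns to megabits per second. */
public class Bandwidth {
    static double megabitsPerSecond(double kilobytesPerSecond) {
        return kilobytesPerSecond / 125.0; // 1 Mbit/s = 125 kB/s
    }

    public static void main(String[] args) {
        // 12,500 kB/s is a 100 Mbit/s link running flat out.
        System.out.println(megabitsPerSecond(12_500) + " Mbit/s");
    }
}
```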
all right
we had a look at that at these operating
system data while we have another test
run that's completed let me just have a
look at our plot I'm switching back to
the plots let me see what that looks
like
all right the plot now has four lines it
got updated because we had four test
runs and we have a new let's the green
line here it's the 30 000 requests per
second and you can see that for 30 000
requests per second our probe 99 percent of the
time had requests at about 1.5
milliseconds so again nothing
and then you have an odd request here
that starts taking like three
milliseconds for example so that could
be or could tell us maybe the server you
know has some some issues maybe not but
keep in mind we're still talking one
millisecond three milliseconds two
milliseconds we would need to figure out
whether that's actually a problem
or whether that's just normal
all right while this is here let me just
shoot off another
test run with 10 000 requests per second
so now we have 40 000 requests per
second let's see what happens in that
case
and while their test run is running
um some more data so just to keep your
mind have a look at CPU memory Network
HTTP client statuses log also something
I mean it goes without saying the thing
is when you hit rest apis you want to
find out during a load test where all my
calls successful did they have for
example 200 this file is just a very
simple visualization of the fact that my
loader at second zero of the test or at
second one of the test or second two of
the test sent 250 requests and they all
came back with the 200 status code so
that is just another you know quick way
of visualizing hey where all my requests
sent by my HTTP client did they come
back with an okay HTTP status code also
something that can easily be skipped but
sometimes skew the results because you
think wow everything was super fast and
then you find out well my calls actually
all failed something like that
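A per-second status tally like the one just described could be kept with a tiny helper. This is a sketch; the class and method names are my own invention, not from the talk's tooling:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical per-second status-code tally, similar in spirit to the
// loader log shown in the talk (test second -> status code -> count).
public class StatusLog {
    private final Map<Long, Map<Integer, LongAdder>> bySecond = new ConcurrentHashMap<>();
    private final long startMillis;

    public StatusLog(long startMillis) { this.startMillis = startMillis; }

    // Called by the HTTP client for every completed request.
    public void record(long nowMillis, int statusCode) {
        long second = (nowMillis - startMillis) / 1000;
        bySecond.computeIfAbsent(second, s -> new ConcurrentHashMap<>())
                .computeIfAbsent(statusCode, c -> new LongAdder())
                .increment();
    }

    // How many requests in that second came back with the given status?
    public long count(long second, int statusCode) {
        Map<Integer, LongAdder> m = bySecond.get(second);
        if (m == null) return 0;
        LongAdder a = m.get(statusCode);
        return a == null ? 0 : a.sum();
    }
}
```

After a run, printing the non-200 buckets immediately shows whether "everything was super fast" really meant "everything failed fast".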
What you also want to do, to give you even more data (please ignore the upper part of the screen here): it's very nice with these load tests to have throughput lines. Throughput lines meaning you want to have pretty straight lines that describe: hey, my loader actually sends 250 requests every second to the server, in just a nice little straight line, without any big ups and downs, which you'll see later on with the server when it's handling too many requests. You just want to make sure, visually, at the beginning: hey, was everything a straight line, was everything as I expected it to be?
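A straight throughput line usually comes from open-loop pacing: every request gets a fixed scheduled send time, independent of how long earlier requests took. A minimal sketch; the class name and the single-loader-thread assumption are mine:

```java
// Open-loop pacing sketch: request i is fired at a fixed offset from the
// start, so the send rate stays a straight line even if responses lag.
public class Pacer {
    private final long startNanos;
    private final double requestsPerSecond;

    public Pacer(long startNanos, double requestsPerSecond) {
        this.startNanos = startNanos;
        this.requestsPerSecond = requestsPerSecond;
    }

    // Scheduled send time of the i-th request (0-based).
    public long scheduledNanos(long i) {
        return startNanos + (long) (i * 1_000_000_000L / requestsPerSecond);
    }

    // Sleep until the i-th slot; the caller then fires the request.
    public void awaitSlot(long i) throws InterruptedException {
        long sleep = scheduledNanos(i) - System.nanoTime();
        if (sleep > 0) Thread.sleep(sleep / 1_000_000, (int) (sleep % 1_000_000));
    }
}
```

The alternative, closed-loop pacing (send the next request only after the previous response), silently slows the loader down when the server struggles, which hides exactly the overload you are trying to measure.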
Okay, so we have that, and then we'll make it really short, because we're not going to double-check the other loaders here. We're also not going to check the probe, because the probe just sent one or two requests a second against the server. Let's have a look at the server.
The server; by the way, let me just quickly check whether my new test run finished. No, it's running, it's fine. The server has the same data: operating system, network, memory, a log which we can ignore for now. It also has a nice little file: a profiler CPU flame graph. Let me open that up. Quick question: who has worked with flame graphs before, who knows how to handle flame graphs? Yes? Okay, do you mind explaining how this works?
I'm just kidding
So, to give you a very quick 101 overview of what you see here: you see a couple of spikes, you see different colors. The green color is our application code, so our Java code, our servlet for example, that we wrote in our embedded Jetty.
The red, the yellow, and the orange colors: this is a CPU flame graph, so we want to find out where the time was actually spent, CPU-wise, handling our HTTP requests, and the red, orange, and yellow colors are native code, kernel code, and JVM code. JVM code (which is actually C++) is the yellow code, orange is kernel code, and native operating system code is red. We can ignore that for now, but with these flame graphs you even see where your operating system spends its time, and where time is being spent in the JVM, for example.
We want to have a look at the green code, the green bars, which is our application code. The Jetty server, to make it really simple, is essentially just a thread pool with many threads handling requests, so most of the time our application spends its time in Thread.run; you can see here at the very bottom, hopefully, that Thread.run takes up most of the CPU time.
And then we need to find our servlet, and I know where it is because I've done this a couple of times (you can also search inside this flame graph): you can see this tiny bar up here, this should be my servlet that actually handles the request. So I can visually see here that the time spent in my own business logic is very, very little compared to everything else that's going on.
And when I click the methods, I can even see, inside my plain servlet's doGet method, where time is being spent in my servlet methods. And by the way, I should probably have shown you the servlet, because it's so simple.
Just as a quick aside: my servlet does nothing at the moment. It has a doGet method, and it creates a POJO, a tax rate object. The tax rate object always has the name "German VAT" with a random number, so every time you call it, you generate a random number and get a random double back, and then you write the thing directly to the HTTP servlet response. That's all it does: create an object with a random number and write it to the servlet response.
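A reconstruction of what that servlet might look like. The talk ran it as a servlet in embedded Jetty; to keep this sketch self-contained and runnable it uses the JDK's built-in HttpServer instead, and the TaxRate record plus its field names are guesses based on the description:

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.ThreadLocalRandom;

// Sketch of the handler from the talk. The original was a Jetty servlet
// with a doGet method; this stand-in uses the JDK's built-in HttpServer.
public class TaxRateServer {
    // Assumed shape of the POJO: a name ("German VAT") plus a random rate.
    record TaxRate(String name, double rate) {}

    // Builds the tiny JSON body written to the response.
    public static String body() {
        TaxRate t = new TaxRate("German VAT", ThreadLocalRandom.current().nextDouble());
        return "{\"name\":\"" + t.name() + "\",\"rate\":" + t.rate() + "}";
    }

    // Starts a server answering every GET with one fresh TaxRate.
    public static HttpServer start(int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/", (HttpExchange ex) -> {
            byte[] bytes = body().getBytes(StandardCharsets.UTF_8);
            ex.sendResponseHeaders(200, bytes.length);
            try (OutputStream os = ex.getResponseBody()) { os.write(bytes); }
        });
        server.start();
        return server;
    }
}
```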
When you look at my flame graph, you can see here what the bar says: of the time being spent in the plain servlet's doGet method, roughly half is generating the random number (you see that in the other bar, Random.nextDouble), and the other half, or maybe 40%, is spent actually writing to the HTTP servlet response. Now, I know this is a very contrived, simple, artificial example, but this is what you need to get into the habit of: when you do these load tests, generate these flame graphs for your CPU and your memory usage, and you can dig down and find out which methods of your program spend the most CPU time, use the most memory, and then you can optimize accordingly, if you want to do that.
That was just a quick 101 on flame graphs and how to handle them.
now let me just quickly go back let's
see what my last load test did
if it finished
right it finished let me have a look at
the plot again because I want to see
what happened now
if 40,000 requests... oh. All right, so what happened is: we sent 40,000 requests a second against the server, and now the response times look a bit different, and there are a couple of very important notes to make here. First of all, when I scroll down here, I can see that, well, the 40th percentile of requests is still done in 1.5 milliseconds, and then when I go up here, the 95th percentile is 120 milliseconds. So the calls all still go through, they complete successfully, but suddenly they take 120 times longer than a regular call.
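Percentile figures like these are typically computed from the probe's recorded latencies. A minimal nearest-rank sketch; the exact method is my assumption, real tools often use something like HdrHistogram instead of sorting raw samples:

```java
import java.util.Arrays;

// Back-of-the-envelope percentile helper: sort the recorded latencies
// and pick the value below which p percent of the samples fall
// (nearest-rank method).
public class Percentiles {
    public static double percentile(double[] latenciesMillis, double p) {
        double[] sorted = latenciesMillis.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length); // nearest rank, 1-based
        return sorted[Math.max(0, rank - 1)];
    }
}
```

The point of looking at p95 or p99 rather than the average is exactly what the plot shows here: the average can stay tiny while a meaningful share of users already waits 120 ms.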
What you can note, however: just a second ago we thought, oh, three milliseconds, that is really a lot. In the visualization now you can see that those earlier lines are pretty much all the same thing; there was essentially no difference between one or two or three milliseconds. But now, with 40,000 requests a second, things look different, and we suspect there might be a problem somewhere along the line, and we'll have to double-check with the data to find out whether the data matches this screen.
To do that, by the way, let me just quickly double-check what I forgot to show you. Ah, jHiccup. Well, I talked about garbage collection logs; jHiccup is a Java agent which allows you to find out when your JVM stalls and when there were pauses in your JVM. I'm going to cut that out, because analyzing garbage collection logs and the stalls is another huge topic, but just keep in mind this is also something to look for: when and where were my GC pauses, and that sort of stuff.
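The core idea behind jHiccup can be sketched in a few lines (a toy version, not the real agent): sleep for a known interval and measure how much later than planned you actually wake up; a large excess usually means the whole JVM was stalled, for example by a GC pause.

```java
// Toy hiccup meter in the spirit of jHiccup: the thread intends to
// sleep for sleepMillis; any extra elapsed time is the "hiccup",
// i.e. time during which the JVM could not run this thread.
public class HiccupMeter {
    public static long measureHiccupMillis(long sleepMillis) throws InterruptedException {
        long start = System.nanoTime();
        Thread.sleep(sleepMillis);
        long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
        return Math.max(0, elapsedMillis - sleepMillis); // excess delay
    }
}
```

The real agent runs this loop continuously and records the excess into a histogram, so you can later correlate hiccup spikes with the GC log.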
And then, last but not least, for the server we also generate these visualizations with the throughput line that you can see down here. This is our very first test run: a nice little straight throughput line, except for the end here, when my test stopped, but that has nothing to do with handling requests.
all right so we want to have straight
lines we want to have a look at all of
this data let's find out what the data
looks like for our last test run
Right, and let me see: this is hopefully 1835, this should be the right one now.
let's have a look at the server straight
away
the CPU log
So, when I scroll to the right here in my server CPU log, what I can see is: I took a couple of snapshots, and at the beginning, in the very first snapshot, it all still looked okay-ish, kind of; my CPUs were busy 50% of the time. At the end here, they were busy 93% of the time, meaning my CPU was basically almost at its maximum, and 90-something percent CPU usage is not a good number. That corresponds with the picture, but I should also find that out before actually looking at the picture, instead of, what's it called, putting the cart before the horse, or the other way around; okay, forget that.
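Snapshots like the ones in that CPU log can also be approximated from inside the JVM. A rough sketch using the JMX OperatingSystemMXBean; the busy-fraction formula is a crude assumption of mine, and the load average is unavailable on some platforms:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

// Rough in-process CPU snapshot helper.
public class CpuSnapshot {
    // 1-minute system load average, or -1 where not available (e.g. Windows).
    public static double loadAverage() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        return os.getSystemLoadAverage();
    }

    // Crude "CPU busy X% of the time" estimate: load average divided
    // by core count, clamped to [0, 1]; -1 when not available.
    public static double busyFraction() {
        double load = loadAverage();
        if (load < 0) return -1;
        int cores = Runtime.getRuntime().availableProcessors();
        return Math.min(1.0, load / cores);
    }
}
```

For serious runs you would still scrape the OS counters (as the talk's tooling does), since the JVM-side view misses what other processes on the box are doing.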
Memory-wise, what we can see is: our server has 15 gigabytes of memory, and in the snapshots it used 1.6 gigabytes, 1.1 gigabytes, two gigabytes (gigabytes, not gigabits). What's actually interesting is that with every snapshot it increases by about 500 megabytes, so that is also something to look at: why does the memory usage increase so much?
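Those memory snapshots could be taken the same way from inside the process; here is a one-line sketch for used JVM heap (note this is heap only, not the OS-level resident memory the server log shows):

```java
// In-process heap snapshot: total heap currently reserved minus the
// free part of it. Does not include native memory, thread stacks, etc.
public class MemorySnapshot {
    public static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }
}
```

Logging this once per snapshot interval would already show whether the 500 MB growth per snapshot is heap (objects the application or Jetty allocates) or something outside the heap.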
Network, let's have a look. Yes, these look like real numbers now. What you can see is, suddenly, received kilobytes per second and transmitted kilobytes per second; so 20,000 divided by, let's say, 100, not quite right, but let's say 20 Mbit that we're sending through the network adapter, which is still far away from a gigabit or even 100 Mbit, but certainly the traffic is picking up.
The profiler, right; and then let's have a look at our throughput line, because hopefully our throughput line also looks different. Whoops, that was the wrong button, let's see. Right. And that's what I meant before: if your throughput lines for the server suddenly start looking like this, well, you don't have a straight line anymore, but something which looks completely random, uncontrolled; it goes down, it goes up again, and the server just handles whatever amount of requests it can. That also tells you you have a problem, and the problem is:
What we've been doing so far, and this is one of the points I wanted to hint at: we always wanted to find out when our server breaks down, and that is not load testing, it's actually limit testing. It's almost like you have someone who wants to jog, and you ask him: hey, how far can you jog before you just fall over and die? And usually you don't want someone to die; what you want is that someone jogs at his own pace, keeps jogging, can stop again, and the next day he can start walking and running again, and just doesn't fall over and die.
And the big revelation here is that you don't want to find out, hey, 50,000 requests, 100,000 requests, 10,000 requests; you want to find out what your server can handle under the expected load that your application is actually supposed to get, not just some vague big numbers. Which means we need to complete our picture a tiny bit here.
let me see
Because so far we've been simplifying things. One, two, three: what are the problems with what we just did? The first one I already mentioned: we looked at these absolute numbers. We said, hey, 50,000 requests, that is good.
The funny thing is, in the real world, what matters obviously are user journeys. You want to find out what your users actually do on your website. For example, if you have an e-commerce website, you don't want to just load test the password reset API; you want to find out: sure, my user browses the catalog and then hopefully buys something, though they buy a bit less often than they browse around. You need to work out what these real user journeys are; whoops, let me just write it like so. Most of the time you'll need to spend figuring this out. If you have historic data, that's obviously great, and you know that on Black Friday you get 10,000 users coming in using your software; if not, you have to take a best estimated guess.
The second part: besides real user journeys, we also need real infrastructure. So far we tested against our server with no database involved; obviously you also need to get a database involved. You want to do this against single machines, and you want to do it against your entire microservice landscape. And I can tell you from experience: you just saw how dreadful it kind of is to work through the data for one single machine, and doing it for an entire microservice landscape is quite another thing, but yes, that's what you have to do. I'm not going to show it here, because the process is pretty much the same: collect all the data for all the participants, make sure the data makes sense, the visualizations make sense, and only then increase the load or change something about your system.
Number three. Real user journeys, real infrastructure; and I've been talking about absolute numbers, and they don't really tell us anything. There is, however, something I want to add; let's call it expectation management. So your business people come in, and obviously they say: hey, tomorrow we're going to be the next Amazon and we need to be able to handle 10,000 requests. Or, when you have some upload service and usually users upload two megabytes of files, they ask you: hey, what happens if suddenly every user uploads 20 megabytes of files? And that's because they always have this limit testing in their heads, which, as you already saw, maybe we shouldn't do. Still, expectation management: I can tell you all day long, Marco can tell you, oh, you don't need to worry about that, but people want some hard numbers.
let me show you something
What I personally found interesting is the Stack Exchange performance page, because they publish their performance numbers. Stack Exchange is basically the company, or the site, hosting Stack Overflow and everything like that; I think on the old Alexa page, before it went down, it was one of the top hundred most popular pages on the internet. When you scroll down here, you can see that Stack Exchange, to host all of that, has nine web servers, maybe geographically distributed, I don't know, and they say that they usually handle like 300 requests per second, with maybe a peak of 450 requests per second.
Now, I'm not saying that my servlet that you just saw handles all of Stack Overflow. What I want you to do is put the numbers you saw earlier into perspective: if one of the biggest websites in the world handles maybe 300 requests per second, then no matter how cool your startup or your company or your product is, it's unlikely that you're going to launch the thing and have 10,000 requests per second, consistently, second after second. Because then you'd be working for Google, or Alipay, the Chinese giants, which have crazy, crazy numbers; usually that's not what's going to happen for you. Most of the time, I would say, people could be happy with having tens of requests every second, consistently. I know there are some IoT projects and whatnot where you have crazy numbers, but still: 10, 15, 50, 100 requests per second is already a whole lot, just to put what you saw earlier into perspective. Hence what I did with "let's impress people, take some random EC2 instance and shoot off 50,000 requests": it's nice, but it's not realistic. What kind of project is that, hitting one endpoint, no user journeys, no real infrastructure? 50,000 requests is nice though; it's a fair amount, and it's much more than you'll ever need to handle with your application.
And, just to summarize all of this: I didn't give you the perfect instance type you could choose for your own project. In fact, I don't have a great number; I ran this scenario with many, many different instances, hitting the network packet limits, for example, as I told you earlier. And for my own web applications, most of them, you can basically always choose the cheapest option, not the biggest or beefiest one. You're never going to need that, because you'll always find out that, well, maybe your application code just has, like, ten-second pauses or whatnot; you can always optimize, and it doesn't make sense to always go for the beefiest option. It makes sense to run your tests, and if your ten users are happy, re-run the test against different instances, always with a cheaper option; go cheaper and cheaper and cheaper with every test run.
And then, by the way, you can tell your boss that he can just give you the difference from the EC2 bill, and you can put it into your own pocket at the end of the month, because you saved the whole company so much money. If you do the load testing process, collect the data, interpret the data, and also do some expectation management and push back a bit against product management, which is hard at times, then hopefully you won't need to worry about scaling your Java web application anymore in the future.
And that is, I think, all I wanted to show you today. Thank you. Any questions? Yes, please.
[audience question, partly off-mic: what happens when the load is far beyond what you planned for, and how do you test for that over a short term, some minutes or some hours?]
So, did everyone hear the question? Yes? Okay, it's actually a good question. The thing is, and I don't want this to be lost: it makes sense, with your hardware, to just once try out what happens when 10,000 people come to my restaurant tomorrow when I only have places for ten. Then obviously you will have a problem one way or another, you're absolutely right, so you can also try these experiments with some crazy numbers. Usually I would hope that it's not just misconfiguration; it might be Black Friday, for example, where you know that you get a hundred times the usual load, so it's kind of pre-planning. If it's really unexpected and you suddenly have a hundred thousand users, then just good luck; I can't help you with this talk.
yes please
yes
So the memory usage of the server increased with the number of requests; what's the reason, given that the server was not doing anything, right, just a random number generator? Yeah, it's a good question. I guess some objects have been created; it could just be that Jetty burns through all this memory with the request handling, but I can't tell you, you'd actually have to figure it out yourself. The thing is, this talk is really just the beginning of a huge rabbit hole, because for all of these things, for example where the memory gets wasted, where specifically it is wasted, you can spend hours, days, weeks, essentially, to figure out why that was the case. So for that specific case I don't know; I would have to look it up. And I can tell you from experience: at JetBrains, in our teams, we have performance engineers whose only job is to figure out this stuff, and it's not something that takes you two seconds; you actually have to do a lot of digging to figure out what is going on and why. So this is just the beginning of the rabbit hole, and maybe the answer to your question is the next step into the rabbit hole.
yes
yes please
I didn't quite hear, but the question was about system-to-system communications and other applications, just pure backends that don't have a frontend and communicate with other systems: when you check the performance there, is it the same approach, or is there anything different? Yes, so when I said "web application" for this talk, I didn't just mean websites or whatever; it could just as well be system-to-system backends. As I said, I gave this talk to some people here working mainly on desktop editors, and they said, well, that is nice, but how does it correlate to what we do here? So I just wanted to say: single-user applications are not the target for this talk, but everything that runs somewhere as a web server, be it system-to-system or a backend, whatever, it's the same approach across all these systems. Yeah, same approach.
yes please
yes
I mean, the thing is, what you saw with the probe essentially gives you the latencies; what we saw is the whole round trip, from the probe to the server and back again. And then, as additional data, you'll ideally want to find out how long just your business logic takes, how long one workflow takes, for example. So it's not either-or, it's actually both: you want the full round-trip latency and also a good picture of what the business logic does.
yes yeah yeah
yes
So could we say the probe is like a health check? Let's suppose at one moment the rest of the servers, the load balancers, are taking in more memory because more requests are actually coming, but for the health check it's just two requests, so it will always say that your server is fine? So, yes, hopefully it says the server is fine, but a health check would imply you just check for a 200 coming back. What the probe does is tell you how your latencies change; what you actually saw in the graphs is that the probe tells you: hey, suddenly my requests take 100 milliseconds, 120 milliseconds. So it's not just a health check; it actually has to record the request data. So there is a difference, yes. With the loaders, you have to make sure (and this is by the way also something you need to double-check) that your loaders consistently, when you tell them to send 5,000 requests every second to the server, really do that; not, like, one second 4,500 and the next 5,500, but 5,000, 5,000, 5,000, and so forth. The probe sends just one request, but records all the data; that is essentially the idea behind it. And there are some arguments online that I've read about whether you need to split them at all, because in the past I also built it so that the loaders did the latency recording themselves. There's a huge discussion going on which I don't want to get into, but this is just something that I found useful: splitting the probe off from the loaders.
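The per-second consistency check just described (loaders really sending 5,000 every second, not 4,500 and then 5,500) could be sketched as a simple post-run assertion; the tolerance value is my assumption:

```java
// Post-run sanity check: given how many requests a loader actually sent
// in each second, verify it stayed within a tolerance of the target rate.
public class LoaderCheck {
    public static boolean steady(int[] sentPerSecond, int targetRps, double tolerance) {
        for (int sent : sentPerSecond) {
            if (Math.abs(sent - targetRps) > targetRps * tolerance) return false;
        }
        return true;
    }
}
```

Running this over the loader's own log before trusting any server-side numbers catches the case where the load generator, not the server, was the bottleneck.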
yes yes please
yes
I don't, I don't... so the question was: what about Kubernetes and a living system where, say, the HPA auto-scales the application in the cloud? I don't have a good answer to that at the moment. What I tried to do is what this talk is, because, frankly, what I've seen in the past is that you just offload it: you say Kubernetes, you don't know what hardware it runs on, and then some sysadmins pick some weird numbers for how much memory and CPU it should have. What I wanted to find out with this talk is: stripped of the containerization, how much could your application actually handle? That was the focus of this talk. How you do it with auto-scaling, with all the caveats and all the edge cases, I can't give you a good answer tonight.
But you have some experience? I have some experience with that, but not good enough to put it into the next talk at the moment, so I would have to basically reorder my thoughts and get a couple more opinions to make it really presentable.
Yes please, far left corner. You could run one or two instances to get your base numbers, and then based on those you actually have the numbers to set up the scaling: how fast you need to scale, how many pods in which case; that would depend on the user behavior you're having, and then you can retest once more of these HPAs are up and see whether it actually holds the load you were expecting on the website.
We can maybe take the discussion of the whole thing in the break. Yes, yes. I'm sorry, we are recording the event; please ask questions into the microphone.
yes
Do you see this kind of load-testing task in the development team, or should it rather be in another team; where do you see that? So, the way I see it (historically, I can only talk about the projects I've done in the past) is that load testing is something that comes in five minutes before launch, and then someone clicks some buttons in JMeter or whatever, and that's it, that's your load testing. It kind of works in the sense of checking off some checkboxes, doing it last minute. But when you want to understand all the flame graphs, the profiling data and so on, you need the developers who actually built the system to understand what's going on there, so sooner or later you need developers involved.
And, it depends on the organization and how you handle it, but I would love for it to have more of a developer focus, earlier on. The whole topic starts when you build something against a database: turn on the SQL logs and find out how many SQL queries you spawn, how long they take, for example. How many developers do that? Sometimes they do, sometimes they don't, and then you go live and find out: oops, we have too many SQL queries. So yes, the answer is: it would be better if developers took it more seriously and did it early on in the whole cycle, if that is somehow allowed.
yeah
yes please
Let's assume you work with signed-up users and you have to authenticate passwords; I had the experience that this also takes a lot of CPU time. Do you have a recommendation for a good way of doing this authentication? Maybe you can't give them tokens, because it's too cumbersome for the users; do you have some recommendations for that? A recommendation for when you want to massively log users into your load test, so to speak: not off the top of my head, but give me a couple of minutes and let's talk in the break; let's see what I can come up with. Yes.
any more questions
yes please
I guess in real systems, very often the bottleneck is the database, and I guess almost all e-commerce systems generate an enormous number of SQL requests per request from the end user; I have had experience with more than five, six thousand SQL requests per one HTTP request. And in that case it's not so easy to scale, because a database is really difficult to scale; so I guess we should worry about the scalability of the bottlenecks, or how to solve that real problem, and in that case I don't have a good idea how it could be done. It is a very good question. What I forgot to mention in the talk: what you described is the search for the weakest link in the system, so to speak, and obviously it doesn't help if you auto-scale your web application servers into oblivion when your database is only able to handle, whatever, 500 requests per second. And yes, you're absolutely right; I don't have a great answer for you either, because I've also been part of many projects where in the end the database was overloaded. You had many, many systems spawning way, way too many queries, because during the development process people never turned on the SQL logs, they didn't care about databases, and then you're suddenly left with, as you said, thousands of requests against the database, which you can't easily scale. If I had the superb answer to how to solve that, then I guess every company in Munich and elsewhere would be happy; you're totally right, it's a difficult problem.
okay thanks yeah
I mean, what we did in the past: you need to start digging. I had this one company where the login process spawned, for whatever stupid reasons, 400 SQL queries, and when you have that, you log in one user and suddenly you get 400 requests against the database; that's a lot. What do you do? Do people touch it, try to refactor it, get it down to seven SQL queries? Is that even possible with a huge legacy project where you have all kinds of crazy database structures and events flying around, and you never know who spawned SQL queries where? It's tricky, tricky.
yes please
I would just like to share my experience regarding this situation with the SQL queries; there are a couple of things I would do. Very often it happens because we use some ORM system, like Hibernate, and just blindly write the code and don't care how it works underneath; so at least you should switch on SQL logging and look at what happens. Besides that, sometimes it's really better to use JDBC directly instead; sometimes that's effective. And perhaps it's also a sign that you're using the wrong type of database; you could analyze whether, perhaps, document-oriented databases fit your use cases better than relational databases. These are the things I would recommend.
All right. You can still hit me up later on in the break; hopefully we still have time, I don't know what time it is, I completely lost track of time. I just want to say a big thank you; you can hit me up anytime, I'm always talkative: Twitter @MarcoBehler, YouTube Marco Codes, if you want to find some interesting episodes on how to build your own text editor, for example, Java-based stuff, you can have a look at that. Otherwise, come have a chat; thank you again for listening.
[Applause]
Thank you, Marco. 15-minute break.