Why you don't need to worry about scaling your Java webapp

Marco Codes
26 Sept 2022 · 55:42

Summary

TL;DR: In this talk, the speaker addresses common concerns about scaling Java web applications. They dig into the mechanics of load testing, emphasizing the importance of understanding real user journeys and infrastructure. By walking through the performance testing process step by step, the speaker shares practical strategies for assessing system limits, optimizing application performance, and managing stakeholder expectations, ultimately aiming to put scaling worries for Java web applications to rest.

Takeaways

  • 😀 The speaker reassures the audience that concerns about scaling Java web applications are often unfounded, and shares insights on load testing.
  • 🔍 The talk was inspired by common questions about application scalability, especially when launching new products or APIs.
  • 🐰 The speaker delves into the 'rabbit hole' of performance testing, emphasizing the importance of understanding how many users an application can handle.
  • 📈 The presentation covers the process of load testing, including the setup of loaders and probes to simulate user requests and measure response times.
  • 🛠️ The speaker introduces tools and techniques for collecting and analyzing data during load testing, such as CPU, memory, network usage, and profiling data.
  • 📊 The importance of visualizing data through histograms and flame graphs is highlighted to better understand the performance of the application under test.
  • 🔧 The speaker discusses the need for detailed analysis of application performance, including business logic execution times and garbage collection logs.
  • 🚀 The talk touches on the concept of expectation management, urging developers to set realistic performance targets based on user behavior and infrastructure capabilities.
  • 🌐 The speaker shares an example from Stack Exchange to provide perspective on realistic traffic expectations for web applications.
  • 💡 It is suggested that load testing should be an integral part of the development process, not just a pre-launch checklist item.
  • 🔑 The discussion includes the challenges of database scalability, which is often a bottleneck in web application performance.

Q & A

  • What is the main topic of the talk?

    -The main topic of the talk is about why you don't need to worry about scaling your Java web application and how to approach performance testing and load testing effectively.

  • Why did the speaker start the talk with a question about scaling?

    -The speaker started with a question about scaling to relate to the common concern developers have when building new products or APIs, which is whether the application will scale under high load.

  • What are the roles of loaders and probes in the context of load testing?

    -Loaders generate load against the server by sending multiple requests, while probes simulate user behavior by sending a lower frequency of requests and recording latencies to measure the performance of the server under load.

  • Why is it important to separate the roles of loaders and probes?

    -Separating the roles ensures that the load generation and the performance measurement are isolated, preventing the loaders from becoming a bottleneck and providing accurate latency data.

  • What does the speaker mean by 'going down the rabbit hole'?

    -Going down the rabbit hole refers to diving deep into a complex subject or problem, in this case, exploring the intricacies of performance testing and understanding the capacity of different server instances.

  • What is the significance of using flame graphs in load testing?

    -Flame graphs provide a visual representation of where the application spends its CPU time, helping to identify bottlenecks and areas of the code that require optimization.

  • Why is it recommended to run load tests for more than 30 seconds in real-life scenarios?

    -Running load tests for a longer duration helps to simulate real-world usage patterns and can reveal issues that may not be apparent during short test runs, such as memory leaks or performance degradation over time.

  • What is the purpose of collecting data from every participant in the load test?

    -Collecting data from every participant, including the server, loaders, and probes, provides a comprehensive view of the system's performance, helping to identify the root cause of any issues that arise during the test.

  • How does the speaker suggest managing expectations regarding the load capacity of a web application?

    -The speaker suggests using real-world data, such as the traffic statistics of well-known websites like Stack Exchange, to provide a realistic perspective on the expected load and to set achievable performance goals.

  • What is the importance of understanding real user journeys in the context of load testing?

    -Understanding real user journeys helps to create load tests that mimic actual user behavior, ensuring that the tests are relevant and that the performance metrics collected are meaningful for the application's intended use.

  • Why is it not always necessary to choose the most powerful server instances for a web application?

    -Optimizing application code and understanding its performance characteristics often reveals that less powerful, and thus less expensive, server instances can handle the expected load, making it unnecessary to over-provision resources.

  • How can developers benefit from being involved in load testing?

    -Developers can benefit by gaining insights into the performance of their code under load, identifying inefficiencies, and learning how to write code that scales better and is more efficient under high-load conditions.

  • What is the potential issue with relying on health checks for load testing?

    -Health checks may not accurately reflect the performance of the server under load, as they typically only verify if the server is up and responding, rather than measuring the response times and system behavior under stress.

  • Why is it challenging to scale a system with a database that is the bottleneck?

    -Databases can be difficult to scale due to their stateful nature and the complexity of managing data consistency and transactions, making it a common bottleneck in high-load scenarios.

  • What is the speaker's view on the role of developers in load testing?

    -The speaker believes that developers should be more involved in load testing, starting early in the development cycle, to better understand the performance implications of their code and to optimize it for scalability.

Outlines

00:00

🤔 Introduction to Scaling Java Web Applications

The speaker initiates the talk by addressing the common concern of scaling Java web applications. They discuss the origin of the talk and engage with the audience to identify those working on web applications. The speaker shares their past experiences with pre-launch scaling questions and introduces the concept of performance testing, mentioning microservices, Project Reactor, and reactive programming as related topics. The session aims to explore the 'rabbit hole' of performance testing and load testing, with a focus on learning how to determine the load capacity of different Amazon EC2 instances for a Java application.

05:01

🔬 Setting Up a Load Testing Environment

This paragraph outlines the process of setting up a load testing environment using Amazon EC2 instances, embedded Jetty servers, and business logic simulating a tax rate endpoint. The speaker introduces the concept of 'loaders' for generating requests and 'probes' for measuring latency, emphasizing the importance of isolating these components to avoid skewed performance data. The paragraph also describes the initial load testing setup, including the number of requests per second and the duration of the test, and mentions the use of custom code for orchestrating the test.

10:02

📈 Analyzing Load Test Results and Collecting Data

The speaker discusses the importance of analyzing load test results by collecting data from all participants, including the server, loaders, and probes. They highlight the need to gather operating system data such as CPU, memory, network, and I/O usage to get a comprehensive view of the system's performance under load. The paragraph also introduces the idea of taking detailed snapshots of system performance metrics during the test runs and the use of flame graphs and profiling data to understand the application's behavior during load testing.

15:04

📊 Interpreting Test Results with Visualizations

In this section, the speaker explains how to interpret load test results using HDR histograms for visualizing latencies and understanding the performance of the server under different load conditions. They discuss the significance of examining the histograms to identify any performance bottlenecks or unexpected behavior in the system. The speaker also mentions the importance of checking the CPU logs, memory logs, and network logs for all participants to ensure the reliability of the test results.

20:07

🔍 Deep Dive into Flame Graphs and Profiling Data

The speaker provides an in-depth look at flame graphs and profiling data, explaining how they can be used to identify where the application spends its CPU time and memory during execution. They discuss the importance of understanding the call graph and garbage collection logs to optimize the application's performance. The paragraph emphasizes the need for developers to get into the habit of analyzing these detailed performance metrics to find areas for optimization.

25:08

🚀 Moving Beyond Absolute Numbers in Load Testing

The speaker emphasizes the importance of moving beyond just looking at absolute numbers in load testing and focusing on real user journeys and infrastructure. They discuss the need for expectation management when dealing with business stakeholders who may have unrealistic expectations about the load capacity of the system. The paragraph also introduces the idea of using real-world data from websites like Stack Exchange to provide a realistic perspective on the scale of load that web applications typically handle.

30:09

🛠️ The Importance of Developer Involvement in Load Testing

In this paragraph, the speaker argues for the importance of involving developers in the load testing process early on in the development cycle. They discuss the need for developers to understand the system's performance implications and to take load testing seriously to avoid issues at launch. The speaker also touches on the challenges of load testing in a containerized environment like Kubernetes and the importance of having a clear understanding of the application's performance on different hardware configurations.

35:09

🗣️ Audience Q&A and Final Thoughts

The final part of the script includes a Q&A session where the audience asks questions about load testing, database scalability, and the role of developers in performance optimization. The speaker shares their experiences and thoughts on these topics, emphasizing the complexity of scaling databases and the need for developers to be proactive in performance testing. The session concludes with the speaker's contact information and an invitation for further discussion during the break.

Keywords

💡Scaling

Scaling refers to the ability of a system, service, or application to handle an increasing amount of work, often in terms of the number of users or transactions it can process. In the video, the speaker discusses the common concern of whether a Java web application can scale effectively, especially when anticipating high traffic or user load. The script mentions scaling in the context of handling a sudden increase in demand, such as becoming the 'next Amazon'.

💡Java Web Application

A Java Web Application is a software program that runs on a web server and is written in the Java programming language. These applications make use of Java's capabilities to create dynamic and interactive web content. The video's theme revolves around addressing concerns related to the scalability of such applications, emphasizing the importance of load testing and performance optimization.

💡Load Testing

Load testing is the process of putting demands on a system to evaluate its performance under stress. It helps determine a system's behavior under both expected and peak conditions. The script discusses a detailed approach to load testing, including how to measure and analyze the performance of a Java web application under various simulated loads.

💡Microservices

Microservices is an architectural style that structures an application as a collection of small, loosely coupled services. These services are typically modular and can be developed, deployed, and scaled independently. The speaker briefly mentions microservices as a potential solution for scaling applications but chooses to focus on the foundational aspects of load testing without delving into microservices architecture.

💡Project Reactor

Project Reactor is a library for building non-blocking applications on the JVM using a reactive programming model. It is part of the broader reactive programming paradigm that aims to handle asynchronous data streams efficiently. The script mentions Project Reactor as one of the technologies that could be discussed in relation to scaling and effective programming, but it is not the main focus of the talk.

💡Amazon EC2

Amazon EC2 (Elastic Compute Cloud) is a web service that provides resizable compute capacity in the cloud. It is a foundational AWS service that enables users to run applications on the Amazon Web Services cloud. The script uses Amazon EC2 as an example platform for deploying and testing the scalability of a Java web application by exploring different instance types.

💡Performance Testing

Performance testing is a type of testing that determines how a system performs under a specific workload. It is crucial for identifying the speed, stability, reliability, and scalability of an application. The video script delves into performance testing, particularly load testing, as a means to understand and ensure that a Java web application can handle expected and peak loads.

💡Jetty

Jetty is a lightweight, open-source Java web server and servlet container. It is often used for serving Java web applications. In the script, the speaker mentions Jetty in the context of discussing high-performance web server instances and the importance of understanding how to handle many users effectively.

💡Flame Graphs

Flame graphs are a visualization tool used to represent profiled software performance data. They provide a hierarchical view of where time is spent in an application, making it easier to identify performance bottlenecks. The script discusses flame graphs as a method for analyzing CPU usage and understanding the efficiency of a Java web application during load testing.

💡Garbage Collection

Garbage collection is the process of automatically freeing memory that is no longer being used by a program. In the context of Java, the garbage collector runs periodically to reclaim memory. The script mentions the importance of monitoring garbage collection logs to understand potential stalls or hiccups in the JVM during load testing, which can impact application performance.

💡User Journeys

User journeys refer to the paths that users take as they interact with an application or website. Understanding user journeys is essential for creating a good user experience and for load testing, as it helps to simulate realistic usage patterns. The script emphasizes the importance of considering real user journeys when planning and conducting load tests to ensure they reflect actual user behavior.

Highlights

Introduction to the topic of scaling Java web applications and the common concern of whether a new product will scale.

Engagement with the audience to identify who is working on web applications.

The dilemma of being asked about scaling capabilities just before a product launch and the common questions about handling large loads.

An overview of various AWS EC2 instance types and the challenge of determining which is suitable for a Java application.

The importance of performance and load testing to understand how many users an instance can handle.

Introduction to the concept of loaders and probes in the context of load testing.

The significance of not overloading loaders during performance testing and the role of probes in simulating user requests.

A step-by-step process for conducting load testing, starting with a simple server setup and gradually increasing complexity.

The use of custom code to orchestrate load testing across multiple machines.

The process of analyzing server performance under different load conditions by collecting data from the server, loaders, and probes.

The importance of collecting detailed data such as CPU usage, memory usage, network I/O, and business logic execution times.

The use of flame graphs for visualizing where the application spends its CPU time and identifying bottlenecks.

The concept of expectation management when dealing with business stakeholders and setting realistic load handling expectations.

A discussion on the practicality of load testing and the difference between load testing and limit testing.

The idea of not needing the most powerful instance for an application and the approach of scaling down in testing to find the optimal instance type.

The conclusion emphasizing the importance of understanding user journeys, real infrastructure, and proper load testing to ensure scalable Java web applications.

A Q&A session discussing practical aspects of load testing, such as handling unexpected loads, memory usage during testing, and the role of developers in load testing.

Transcripts

00:01

Thank you, Katya. Hi folks, good evening. My talk tonight is "Why you don't need to worry about scaling your Java web application". How did this talk come about? And by the way, who here is working on a web application, in one form or another? Right, thankfully most of you, because I once gave this talk to a couple of folks who almost all worked on text editors and the like, and they couldn't really relate to what's going on in this talk.

00:37

Now, I don't know about you, but in the past I've had these situations: you build a new product, you build a new API, and shortly before launch a product person comes in and asks you: hey, will we scale, will this thing scale? What happens if we're the next Amazon tomorrow? What happens if a hundred thousand people simultaneously call our REST API? Will we just burst up in flames? And then you can mumble something about microservices and Project Reactor and reactive programming (Yuri is actually going to talk a bit about reactive programming later on). But I wanted to go down the rabbit hole.

01:16

What is the rabbit hole? Well, I thought: let's not take the shortcuts and do something containerized, where some sysadmin says "well, my app only gets 0.5 CPUs". I wanted to have a look at the Amazon EC2 instance type web page. On the AWS page you can see, on the left, the general purpose instances, from t4g, t3, and t3a down to the A1 instance types. I can click compute optimized (C7g, C6gn, C4), memory optimized, and so on; I can click through all these categories and I'll see plenty of instance types, with basically every letter of the alphabet. And I thought: well, if I have a Java application and I put it on any instance, how would I know how many users that instance could handle? How would I know which instance type to get? Do I need r5n, or do I need z1d? And that led me down the rabbit hole of, let's say, performance testing.

02:29

A big part of this talk is about load testing. Don't worry, I know most of you have probably done some load testing at one point or another, meaning you've clicked around in JMeter or Gatling, something like that. I want to show you a process for approaching the question "which load can my instance actually handle?", the whole process, and I want to teach you a couple of concepts along the way that I learned from the Jetty team. For this talk I got together with the maintainers of the Jetty web server and asked them, because they have a lot of knowledge about high-performance Jetty instances and handling many users, and I'd just like to pass on a couple of those concepts.

03:15

That being said, what you can do now is enjoy my handwriting, because apart from this presentation I rarely handwrite. Let's see what we're going to do.

03:32

The setting. I've already drawn a couple of things. Essentially, I want a server, and for the moment I'm going to keep it very simple: we're not starting with microservices, we're not starting with a database, we're not starting with 20,000 different services. Let's keep it simple at the beginning; we can make it more complex later on. The server is just a machine on EC2; the type doesn't matter for now, you can do it with any type. It runs an embedded Jetty, because that's simple to set up, and it has some business logic (the "BL" you can see here). Remember, we're working for a boring company that calculates tax rates, so there is only one endpoint at the moment, a tax rate endpoint, which gives you back something like: for Germany, name "VAT", rate 19. One endpoint is a bit unrealistic, but we have to learn all the concepts first; we can make it more complex later on.
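
That embedded setup isn't shown on screen at this point, but a minimal bootstrap for such a one-endpoint server could look roughly like this, assuming a Jetty 11-style embedded API; the port and path are made up, and TaxRateServlet stands for the demo servlet the speaker describes later (sketched further down in this transcript):

    import org.eclipse.jetty.server.Server;
    import org.eclipse.jetty.servlet.ServletContextHandler;

    public class TaxRateServer {
        public static void main(String[] args) throws Exception {
            Server server = new Server(8080);                      // plain HTTP on port 8080
            ServletContextHandler context = new ServletContextHandler();
            context.addServlet(TaxRateServlet.class, "/taxrates"); // the single tax rate endpoint
            server.setHandler(context);
            server.start();
            server.join();                                         // block until the server is stopped
        }
    }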

04:34

What we also have is the concept of loaders. What are loaders? Essentially, these loaders shoot requests against the API, against the tax rate endpoint. You could have one loader or many loaders; I have four here. Why four, or n, loaders? Because you have to make sure the loaders themselves don't get overloaded while they are performance testing your server, which can happen relatively quickly with beefier servers: suddenly your loader is the problem, not the server.

05:09

And then, obviously, you want to isolate things. You don't want to do what a lot of developers do: put the server and the loader on their own machine, run everything on that one machine, and end up with some funky numbers which are totally unreliable.

05:23

What I also have is the concept of a probe. What's the difference between a probe and a loader? The only job the loader has is generating load against our endpoint, nothing else; it doesn't record any latencies or anything. The probe is one machine which does just one or two requests per second, something like that, simulating a user going against the server. So the probe doesn't spend much effort generating load; it just browses the website, or in this case hits the endpoints, and you get isolated measurements of the latencies the probe sees when it hits the server's endpoints.
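
The probe's code isn't shown in the talk, but a minimal sketch of the idea (roughly one request per second, recording every latency) could look like this with java.net.http.HttpClient; the server URL and endpoint are hypothetical:

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;
    import java.util.ArrayList;
    import java.util.List;

    public class Probe {
        public static void main(String[] args) throws Exception {
            HttpClient client = HttpClient.newHttpClient();
            HttpRequest request = HttpRequest.newBuilder()
                    .uri(URI.create("http://server:8080/taxrates")) // hypothetical endpoint
                    .build();
            List<Long> latenciesNanos = new ArrayList<>();
            for (int i = 0; i < 30; i++) {                          // ~1 request per second, for 30 seconds
                long start = System.nanoTime();
                client.send(request, HttpResponse.BodyHandlers.ofString());
                latenciesNanos.add(System.nanoTime() - start);      // record what the "user" experienced
                Thread.sleep(1_000);
            }
            latenciesNanos.forEach(nanos -> System.out.println(nanos / 1_000 + " µs"));
        }
    }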

06:05

So, as an overview: we have these loaders, we have the server, we have the probe, all of them machines in Amazon EC2. The loaders are going to send many requests per second against our server later on; the probe sends one or two requests per second against the server.

06:28

Zooming out a tiny bit, the process will look like this (let me just see if that works... different color... right): the loaders are going to start with 250 requests per second each against the server. That makes 1,000 requests per second in total against the server, and we're going to let the test run for 30 seconds. In real life it should run for longer than 30 seconds, but for now I want to do everything live; I don't just want to bombard you with numbers. Ideally, I want you to go home and repeat what I did here, with some live coding essentially.

07:12

So that is step number one, and as you might have guessed, I prepared a tiny project. It is not JMeter, it is not Gatling, it is a bit of custom code (just scrolling down here, which you can ignore for now, so I'll scroll up again). What it essentially does is orchestrate these six machines. Here at the top I have the 250 requests per second that every loader shoots against my endpoint, and I'm going to run this class now, because it obviously takes some time, and I want to show you the real results of this test run later on. I'll be switching back and forth between the IDE and my drawings during this talk.

08:00

While this runs, let's open the drawing up again. What's going to happen: we start with 250 requests per second, then we try 1,000 requests per second per loader, times four, so we shoot 4,000 requests per second. Then (whoops, that was not it, let's see) we do a couple more test runs, maybe with 5,000 requests per second per loader, so 20K in total, and then maybe with 10K requests per second, times four, 40K. What we want to find out is when our server breaks down, when it starts having problems handling the load. That is step number one.
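
The orchestration code itself is only shown briefly, but as a rough sketch of one loader firing "250 requests per second for 30 seconds" (note that this once-a-second burst is cruder than the even pacing a real load generator would use):

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;
    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;

    public class Loader {
        public static void main(String[] args) {
            HttpClient client = HttpClient.newHttpClient();
            HttpRequest request = HttpRequest.newBuilder()
                    .uri(URI.create("http://server:8080/taxrates")) // hypothetical endpoint
                    .build();
            int requestsPerSecond = 250;
            ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(4);
            // fire a batch of async requests every second; the loader deliberately records no latencies
            scheduler.scheduleAtFixedRate(() -> {
                for (int i = 0; i < requestsPerSecond; i++) {
                    client.sendAsync(request, HttpResponse.BodyHandlers.discarding());
                }
            }, 0, 1, TimeUnit.SECONDS);
            scheduler.schedule(scheduler::shutdownNow, 30, TimeUnit.SECONDS); // stop after 30 seconds
        }
    }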

08:47

Step number two. So far this has been a fair amount of pretty standard load testing. We want to make it a bit more complex, because just sending off requests against the server doesn't really tell us anything; it just gives us some numbers. We have to make sure these numbers are reliable. What do I mean by that? We need to collect data from the server, from the probe, from the loaders, from every participant in the load test. What kind of data? Let's collect some operating system data, for example: what is the CPU usage of my server, of my loaders? What is the memory usage of all these participants? What is my network adapter usage? What is my I/O usage? I call that the big picture: the operating system data is the big picture.

09:48

I want to find out, when I increase the load against my server, how the operating system behaves, and not just on the server: on the loaders, on everyone. I want to find out: hey, did my loader maybe have CPU problems generating the load? Was the network the problem on the server, meaning is the network bandwidth too little, or do we have a CPU problem, or a memory problem? Just a big picture of what the problem actually is once I start generating more load. Very important.

10:17

Big picture data only gives us a general glimpse of what's happening in our load test, which means we also need to get the detail picture. What is the detail picture?

10:35

(By the way, let me quickly check on my load test. It finished already, so I'm going to spawn off a new one; hopefully the machines have capacity. I just executed the test with a thousand requests per second; I'm going to do another one with 1,250 requests per second per loader, so we'll have those results in a second as well. While I keep babbling, let's rerun this. Good, that looks good.)

11:07

Going back here: the detail picture means I want to get, for example, the latencies, the execution times, of my business logic. I want to find out, on my server, excluding the Jetty web server and everything else: how long does my business logic take? How long do my SQL queries take, how long do all my workflows take? I want detailed numbers on that while I run the load test. Then, numbers alone don't help me a lot; I also need to cross-reference them with flame graphs, or profiling data, to find out where my application spends its CPU time and its memory while executing a certain call graph. I also want data on the garbage collection logs, or "hiccups", whenever the JVM stalls, because during your garbage collection cycles not much else is happening, and you want a good understanding of your garbage collection behavior. Big picture, detail picture: you need both for every test run you execute.
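
The "detail picture" for the business logic comes down to wrapping the interesting call and recording how long it took; a minimal self-contained sketch (calculateTaxRate is a stand-in for the real business logic, not the speaker's code):

    import java.util.concurrent.ThreadLocalRandom;

    public class BusinessLogicTimer {
        public static void main(String[] args) {
            for (int i = 0; i < 10; i++) {
                long start = System.nanoTime();
                calculateTaxRate("DE");                                   // the business logic under test
                long micros = (System.nanoTime() - start) / 1_000;
                System.out.println("calculateTaxRate took " + micros + " µs");
            }
        }

        // dummy stand-in mirroring the demo servlet's "random German VAT" logic
        static double calculateTaxRate(String country) {
            return 19 + ThreadLocalRandom.current().nextDouble();
        }
    }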

12:26

The good news is that the Java class I showed you earlier doesn't only set up these six machines and run the tests; during every test run it also takes snapshots, for every participant, of the CPU, memory, network, and I/O, and it captures the business logic execution times, the flame graphs and profiling data, and the garbage collection logs for the server. Which means part number two of this talk is looking at this data and trying to figure out what it actually means and what we can do with it.

13:07

Let me go back. Now some people are thinking: dude, this data is all handily available on your machines; if I'm using AWS I could use CloudWatch, I could use managed services. I wanted to make the talk vendor-independent, so all the data you're going to see in a second we basically get from Linux command line tools on an EC2 instance. And yes, there are many other places you can get this data from.

13:33

What my test does, simply because I built it like that, is produce a results folder in my Maven target directory. (By the way, I just saw that this load test finished as well; let me quickly run another one with 5,000 requests per second per loader, so now we're shooting 20,000 requests per second against my machine.)

13:57

Right, what data do we get? We get a folder called "plot" with a plot HTML file inside; let me open it up. What you see here is an HDR histogram (we can debate later whether this is really a histogram, or whether a histogram should look different; I've had that discussion many times). At the moment you see two lines, a red line and a blue line. The blue line corresponds to the thousand requests per second I shot against the server, the red line to the 5,000 requests per second. These lines are the latencies the probe measures whenever it asks: hey, tax rate? tax rate? tax rate?

14:44

On the left side you see the latency of the probe, in microseconds; confusingly, you have to divide by a thousand to get milliseconds. So all these calls the probe made against the server took around one millisecond, 1.25 milliseconds, 1.5 milliseconds, as you can see on the left. The other axis is the percentile: when you go to the 90th percentile here, it tells you that ninety percent of the requests completed faster than 1,294 microseconds, or roughly 1.3 milliseconds. The percentiles aren't that important for now; the point is that you get a nice little line for each test run, and you immediately see: do I have uniformly small latencies, or are there huge bumps, were there problems? It's a visual confirmation of what happened during your test run. At the moment, having sent a thousand and then five thousand requests per second, we don't expect huge bumps, and indeed these lines are pretty much flat.
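
The talk doesn't show how the plot file is produced, but the name strongly suggests the HdrHistogram library (org.hdrhistogram:HdrHistogram), which records latencies and prints exactly this kind of percentile distribution; a sketch with made-up latency values:

    import org.HdrHistogram.Histogram;
    import java.util.concurrent.ThreadLocalRandom;

    public class LatencyHistogram {
        public static void main(String[] args) {
            // track values from 1 µs up to 1 minute, with 3 significant digits
            Histogram histogram = new Histogram(60_000_000L, 3);
            for (int i = 0; i < 100_000; i++) {
                histogram.recordValue(1_000 + ThreadLocalRandom.current().nextInt(500)); // fake latencies in µs
            }
            System.out.println("median: " + histogram.getValueAtPercentile(50.0) + " µs");
            System.out.println("p90:    " + histogram.getValueAtPercentile(90.0) + " µs");
            System.out.println("p99.9:  " + histogram.getValueAtPercentile(99.9) + " µs");
            // prints the full percentile distribution, scaled from µs to ms
            histogram.outputPercentileDistribution(System.out, 1000.0);
        }
    }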

15:48

All right. So, step number one: generate such histograms and have a look at them. Then comes the data overload, and I can promise you, once we make it through these folders, you've done your work for tonight. For every test run we get a folder like this one, named 1822, for the 18:22 run. You see subfolders for loader number one, loader two, loader three, and loader four (because we have four loaders), a folder for the probe, and a folder for the server. Let's look at what each of these folders contains.

16:30

(By the way, one more test run first; sorry for the switching, but I just want to get some data before we talk through this. So now we're sending 30,000 requests per second against my EC2 machine. Okay.)

16:49

Back to our loader. We have, as promised, data for the CPU usage, for the memory usage, and for the network usage, and we need to make sure (this is the part that always gets skipped) to check these files, the data, the visualizations, for every participant of the load test. It's quite boring, but it's something you have to do. My CPU log: this is quite a beefy loader, by the way, because it has eight CPUs, which you can see here. The way this works: the load test runs for 30 seconds, and every 10 seconds I take a snapshot, so you'll find three tables in here with CPU usage data. To not overload you, we'll just look at the first row, the "all" row, which is the average across CPUs. When you go to the right, you can see that on average the CPUs were idle, let's round up, 99% of the time. So they only did work one percent of the time, which is expected: it's a pretty beefy machine, and generating 250 requests per second is literally nothing for such a machine.

17:56

Right, you double-check that, make sure everything is good, and you go on to the memory log. It's very similar: snapshots in a table, every 10 seconds a new one. We can see the loaders have 30 gigabytes of memory and use 450 megabytes of it, so they still have plenty of free memory. Again, the big picture doesn't tell us much here, but for this simple test nothing else was to be expected.

18:25

The network log: it's my favorite log, because it has so many columns and tables. What you have to understand is that two tables together represent one snapshot, and the only numbers you're really interested in are the received kilobytes per second and the transmitted kilobytes per second. You then divide these numbers by 125 to come up with megabits, like 100 megabits, or a gigabit. And again, sending a bit of JSON around for this load test doesn't mean our network adapter is the problem. By the way, something I learned while preparing this talk: sooner or later you will max out the network adapter; for the server, the network adapter becomes the problem and nothing else, if you just add enough loaders generating load. What I didn't know is that Amazon EC2 doesn't only cap the total bandwidth, like 100 Mbit or a gigabit; there is also a limit on the number of TCP packets per second, and funnily enough you will usually hit that limit earlier than the bandwidth limit. It's quite funky, but keep it in mind if you ever have an instance where network packets suddenly get dropped.
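
A quick worked example for that divide-by-125 rule: the logs report kilobytes per second, and 1 kB/s = 8 kbit/s = 8/1000 Mbit/s, so Mbit/s = kB/s × 8 / 1000 = kB/s / 125. A machine transmitting 12,500 kB/s is therefore pushing 12,500 / 125 = 100 Mbit/s, which would saturate a 100 Mbit adapter.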

19:44

All right, we've had a look at the operating system data. Meanwhile another test run has completed, so let me switch back to the plot and see what it looks like. The plot now has four lines; it got updated because we've had four test runs. The new green line is the 30,000 requests per second run, and you can see that at 30,000 requests per second, 99% of the probe's requests still took at most about 1.5 milliseconds, so again, nothing. Then there's the odd request that starts taking three milliseconds, which could tell us the server has some issues, or maybe not; keep in mind we're still talking one, two, three milliseconds. We'd need to figure out whether that is actually a problem or just normal.

20:42

While this is up, let me shoot off another test run with 10,000 requests per second per loader, so 40,000 requests per second in total. Let's see what happens in that case.

20:59

While that test run is running, some more data. So far: look at CPU, memory, network. The HTTP client status log is also something to check; it almost goes without saying. When you hit REST APIs, you want to find out, during a load test, whether all your calls were successful: did they return, for example, a 200? This file is just a very simple visualization of the fact that my loader, at second zero of the test, at second one, at second two, sent 250 requests, and they all came back with a 200 status code. It's another quick way of verifying: did all the requests my HTTP client sent come back with an OK status code? This is also easily skipped, but it can skew your results: you think "wow, everything was super fast", and then you find out all your calls actually failed, something like that.
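
Such a status check is easy to build yourself; a minimal sketch that fires a batch of requests and tallies the status codes (URL again hypothetical):

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.atomic.AtomicLong;

    public class StatusCodeTally {
        public static void main(String[] args) throws Exception {
            HttpClient client = HttpClient.newHttpClient();
            HttpRequest request = HttpRequest.newBuilder()
                    .uri(URI.create("http://server:8080/taxrates")) // hypothetical endpoint
                    .build();
            Map<Integer, AtomicLong> countsByStatus = new ConcurrentHashMap<>();
            for (int i = 0; i < 250; i++) {
                HttpResponse<Void> response = client.send(request, HttpResponse.BodyHandlers.discarding());
                countsByStatus.computeIfAbsent(response.statusCode(), s -> new AtomicLong()).incrementAndGet();
            }
            // e.g. "HTTP 200: 250" when everything succeeded
            countsByStatus.forEach((status, count) -> System.out.println("HTTP " + status + ": " + count));
        }
    }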

21:57

Something else you want, to get even more data (please ignore the upper part of the screen here): with these load tests it's very nice to have throughput lines. Throughput lines, meaning you want to see pretty straight lines confirming that my loader really did send 250 requests to the server every second, in a nice little straight line, without big ups and downs (which you will see later on with the server, once it handles too many requests). You just want to verify visually, at the beginning: was everything a straight line, was everything as I expected it to be?

22:36

Okay, so we have that, and we'll keep it really short: we're not going to double-check the other loaders here, and we're not going to check the probe either, because the probe only sent one or two requests per second against the server. Let's have a look at the server.

23:00

The server (by the way, let me quickly check whether my new test run finished; no, it's still running, that's fine) has the same operating system, network, and memory logs, which we can ignore for now. It also has a nice little file: the profiler CPU flame graph. Let me open that up. Quick question: who has worked with flame graphs before, who knows how to handle flame graphs? Yes? Okay, do you mind explaining how this works? I'm just kidding.

23:24

To give you a very 101 overview of what you see here: you see a couple of spikes, and you see different colors. Green is our application code: our Java code, for example the servlet we wrote for our embedded Jetty. The red, yellow, and orange colors are native code, kernel code, and JVM code. This is a CPU flame graph, so we want to find out where the time was actually spent, CPU-wise, handling our HTTP requests. JVM code (which is actually C++) is yellow, kernel code is orange, and native operating system code is red. We can ignore those for now, but note that with these flame graphs you can even see where the operating system spends its time, or where time goes inside the JVM.

24:21

We want to look at the green bars, our application code. The Jetty server, to put it really simply, is just a thread pool with many threads handling requests, so most of the time our application is inside Thread.run; you can see at the very bottom that Thread.run takes up most of the CPU time here. Then we need to find our servlet, and I know where it is because I've done this a couple of times (you can also search inside the flame graph): this tiny bar up here should be my servlet, the one actually handling the request. So I can literally see that the time spent in my own business logic is very, very little compared to everything else that's going on.

25:11

When I click into the methods, I can even see, inside my plain servlet's doGet method, where the time is being spent within my own code. And by the way, I should probably show you the servlet, because it's so simple. My servlet does nothing at the moment: it has a doGet method, and it creates a little POJO, a tax rate object. The tax rate object always has the name "German VAT" with a random number as the rate; every time you call the endpoint, it generates a random double and writes the thing directly to the HTTP servlet response. That's all it does: create an object, generate a random number, write it to the response.
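
Reconstructed from that description (this is not the speaker's exact code), the servlet could look roughly like this, assuming the jakarta.servlet API that Jetty 11 uses:

    import jakarta.servlet.http.HttpServlet;
    import jakarta.servlet.http.HttpServletRequest;
    import jakarta.servlet.http.HttpServletResponse;
    import java.io.IOException;
    import java.util.Random;

    public class TaxRateServlet extends HttpServlet {

        private static final Random RANDOM = new Random(); // shows up as Random.nextDouble in the flame graph

        // a tiny POJO-style carrier for the response data
        record TaxRate(String name, double rate) { }

        @Override
        protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws IOException {
            // always "German VAT", always a fresh random rate, as described in the talk
            TaxRate taxRate = new TaxRate("German VAT", RANDOM.nextDouble());
            resp.setContentType("application/json");
            // ...and write it straight to the HTTP servlet response
            resp.getWriter().write("{\"name\":\"" + taxRate.name() + "\",\"rate\":" + taxRate.rate() + "}");
        }
    }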

26:01

Looking at my flame graph, the bars tell me that of the time spent in the servlet's doGet method, roughly half goes into generating the random number (that's the bar where you see Random.nextDouble), and the other half, or maybe 40%, is spent actually writing to the HTTP servlet response. Now, I know this is a very contrived, simple, artificial example, but this is the habit you need to get into: when you do these load tests, generate these flame graphs for your CPU and your memory usage, dig down, find out which methods of your program spend the most CPU time and the most memory, and then optimize accordingly, if you want to. That is the quick 101 on flame graphs and how to handle them.

26:52

Now let me quickly go back and see whether my last load test finished. Right, it finished. Let me look at the plot again, because I want to see what happened with 40,000 requests... oh. All right, so what happened is: we sent 40,000 requests a second against the server, and now the response times look a bit different. There are a couple of very important notes to make here. First of all, scrolling down, I can see that 40 percent of the requests, the 40th percentile, still completed in 1.5 milliseconds. But going up, the 95th percentile sits at 120 milliseconds. So the calls all still go through and complete successfully, but suddenly a call takes 120 milliseconds instead of the regular one or two. Note also: a second ago we thought three milliseconds was a lot; in this visualization the lines we had earlier are now all pretty much the same flat line, because at this scale there's essentially no difference between one, two, or three milliseconds. But with 40,000 requests a second things look different, and we suspect there might be a problem somewhere along the line. We'll have to double-check with the data to find out whether the data matches this picture.

28:20

To do that, by the way, let me quickly show something I forgot: jHiccup. This relates to what I said about garbage collection logs. jHiccup is a Java agent which lets you find out when your JVM stalls and where there were pauses in your JVM. I'm going to cut this short, because analyzing garbage collection logs and stalls is another huge topic, but keep it in mind as something to look at: when did my GC pauses happen, and that sort of thing.

28:53

And last but not least, for the server we also generate these visualizations with the throughput line, which you can see down here. This is our very first test run: a nice little straight throughput line, except at the end, when the test stopped, but that has nothing to do with handling requests.

29:13

All right, so we want straight lines, and we want to look at all of this data. Let's find out what the data looks like for our last test run; this should be the 1835 folder, the run from just now. Let's look at the server straight away: the CPU log. When I scroll to the right in my server's CPU log, I see a couple of snapshots. In the very first snapshot it all still looked okay-ish: the CPUs worked about 50% of the time. In the last snapshot they worked 93% of the time, meaning my CPU was basically at its maximum, and 90-plus percent CPU usage is not a good number. That corresponds with the plot, although really I should establish it from the data before looking at the plot, and not put the cart before the horse, or the other way around... okay, forget that.

30:19

Memory-wise, we can see our server has 15 gigabytes of memory, and the snapshots show roughly 1.1, 1.6, and 2 gigabytes in use. What's actually interesting is that the usage grows by about 500 megabytes per snapshot, and that is also something to look into: why does the memory usage increase so much?

30:43

The network log: yes, these look like real numbers now. The received and transmitted kilobytes per second are suddenly substantial; divided by 125, call it roughly 20 Mbit/s going through the network adapter, which is still far away from 100 Mbit or a gigabit, but the traffic is certainly picking up.

31:10

The profiler... right, and then let's have a look at our throughput line, because hopefully the throughput line also looks different now (whoops, that was the wrong button, let's see). Right. And that's what I meant before: when your throughput lines for the server suddenly start looking like this, you no longer have a straight line but something that looks completely random, uncontrolled; it goes down, it goes up again, and the server just handles whatever amount of requests it can. That also tells you that you have a problem.

31:43

And the problem is what we've been doing so far, which is one of the points I wanted to hint at: we've always been trying to find out when our server breaks down, and that is not load testing, it's actually limit testing. It's almost like taking someone who wants to jog and asking them: hey, how far can you jog before you just fall over and die? Usually you don't want anybody to die; you want them to jog at their own pace, keep jogging, stop, and be able to walk and run again the next day, without falling over.

32:18

The big revelation here is that you don't want to find out "hey, 50,000 requests, 100,000 requests, 10,000 requests"; you want to find out how your server handles the load your application is actually expected to get, not just some vague big numbers.

play32:38

which means what we need to do is we

play32:40

need to complete our picture a tiny bit

play32:42

here
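To illustrate the difference, here is a minimal sketch of a loader that does load testing rather than limit testing, assuming a hypothetical /api endpoint. Instead of firing requests as fast as the machine allows, it sends them at the fixed rate you actually expect in production:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ConstantRateLoader {
    public static void main(String[] args) {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(
                URI.create("http://localhost:8080/api")) // hypothetical endpoint
                .build();
        // The load you expect, not the load that kills the server
        int requestsPerSecond = 50;
        Executors.newScheduledThreadPool(4).scheduleAtFixedRate(
                () -> client.sendAsync(request, HttpResponse.BodyHandlers.discarding()),
                0, 1_000_000 / requestsPerSecond, TimeUnit.MICROSECONDS);
    }
}
```

Dial requestsPerSecond up or down to match the traffic you realistically expect, and keep the rate constant for the whole run so the resulting graphs stay interpretable.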

Let me see, because so far we've been simplifying things. One, two, three: what are the problems with what we just did? The first one I already mentioned: we had a look at these absolute numbers. We said, hey, 50,000 requests, that is good.

The funny thing is that in the real world, what matters, obviously, are user journeys. You want to find out what your users actually do on your website. For example, if you have an e-commerce website, you don't just want to load test the password-reset API. You want to find out: sure, my user browses the catalog and then buys something, hopefully buying a bit less often than just browsing around, and you need to work out what these real user journeys are; see the sketch below. Most of the time you'll need to spend figuring this out. If you have historic data, that's obviously great, and you know that on Black Friday you get 10,000 users coming in and using the software. If not, you have to take a best estimated guess.
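A minimal sketch of such a journey with plain java.net.http, assuming hypothetical /catalog, /product/{id} and /checkout endpoints. In practice you would express this in a load-testing tool such as Gatling or JMeter, but the shape is the same:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class BrowseAndBuyJourney {
    private static final HttpClient CLIENT = HttpClient.newHttpClient();

    public static void main(String[] args) throws Exception {
        // A typical journey: lots of browsing, only occasionally a purchase
        get("/catalog");
        Thread.sleep(2_000); // "think time" between clicks
        get("/product/42");
        Thread.sleep(3_000);
        if (Math.random() < 0.1) { // assume only ~10% of visitors actually buy
            get("/checkout");
        }
    }

    private static void get(String path) throws Exception {
        HttpRequest req = HttpRequest.newBuilder(
                URI.create("http://localhost:8080" + path)).build();
        CLIENT.send(req, HttpResponse.BodyHandlers.discarding());
    }
}
```

The 10% buy rate and the think times are made up; the whole point is to replace them with ratios taken from your own historic data or your best estimate.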

The second part, besides real user journeys: we also have to have real infrastructure. So far we tested against our server alone, no database involved. Obviously you also need to get a database involved; you want to do this against single machines, and you want to do it against your entire microservice landscape. And I can tell you from experience, you just saw how dreadful it is to work through the data for one single machine; doing it for an entire microservice landscape is quite another thing. But yes, that's what you have to do. I'm not going to show it here, because the process is pretty much the same: collect all the data for all the participants, make sure the data makes sense and the visualizations make sense, and only then increase the load or change something about your system.

Number three: we have real user journeys and real infrastructure, but I've been talking about absolute numbers, and they don't really tell us anything. There is, however, one more thing I want to cover; let's call it expectation management.

So your business people come in and obviously they say: hey, tomorrow we're going to be the next Amazon and we need to be able to handle 10,000 requests. Or, if you have some upload service where users usually upload two-megabyte files, they ask you: what happens if suddenly every user uploads 20 megabytes? And that's because they always have this limit testing in their heads, which, as you already saw, maybe we shouldn't do.

Still, expectation management: I can tell you all day long, Marco can tell you, oh, you don't need to worry about that, but people want some hard numbers. Let me show you something I personally found interesting: the Stack Exchange performance page, because they publish their performance numbers. Stack Exchange is basically the company, or the site, hosting Stack Overflow and everything like that. I think on the old Alexa ranking, before it went down, it was one of the top hundred most popular pages on the internet. When you scroll down here, you can see what Stack Exchange has to host: they have nine web servers, maybe geographically distributed, I don't know.

And they say that they usually handle around 300 requests per second, with maybe a peak of 450 requests per second. Now, I'm not saying that the servlet you just saw could handle all of Stack Overflow. What I want you to do is just relativize the numbers you saw earlier: if one of the biggest websites in the world handles maybe 300 requests per second, then no matter how cool your startup or your company or your product is, it's unlikely that you're going to launch the thing and immediately see 10,000 requests per second, sustained, second after second. Because then you'd be working for Google, or for Alipay with their crazy numbers. Usually that's not what's going to happen for you. Most of the time, I would say, people could be happy with having tens of requests every second, consistently. I know there are some IoT projects and the like where you have crazy numbers, but still: 10, 15, 50, 100 requests per second is already a whole lot. Just to put into perspective what you saw earlier, and what I did with the "let's impress people, take some random EC2 instance and shoot off 50,000 requests" demo: it's nice, but it's not exactly realistic. What kind of project is that, hitting one endpoint, with no user journeys and no real infrastructure? 50,000 requests is a fair amount, though, and it's much more than you'll likely ever need to handle with your application.

And just to summarize all of this: I didn't give you the perfect instance type to choose for your own project. In fact, I don't have a great number. I ran this scenario with many, many different instances, hitting the network packet limits, for example, as I told you earlier. And for my own web applications, most of the time you can basically always choose the cheapest option, not the biggest or beefiest one. You're almost never going to need that, because you'll usually find out that, well, maybe your application code just has ten-second pauses somewhere, and you can always optimize. It doesn't make sense to always go for the beefiest option. It makes sense to run your tests, and if, say, your ten expected users are happy, re-run the test against different instance types, choosing the cheaper option every time: go cheaper and cheaper and cheaper with every test run. And then you can tell your boss, by the way, that he can just give you the difference from the EC2 bill and you can put it into your own pocket at the end of the month, because you saved the whole company so much money.

If you do the load-testing process, collect the data, interpret the data, and also do some expectation management and push back a bit against product management, which is hard at times, then hopefully you won't need to worry about scaling your Java web application anymore in the future. And that is, I think, all I wanted to show you today. Thank you.

I forgot: any questions? Yes, please.

[Audience question, partly inaudible; roughly: what happens if the expected numbers turn out to be wrong, and how do you test for a short-term spike lasting some minutes or some hours?]

So, did everyone hear the question? Okay. It's actually a good question, and there's one thing I don't want to get lost: it does make sense, with your hardware, to just once try out what happens when 10,000 people come to my restaurant tomorrow when I only have places for ten. Then obviously you will have a problem one way or another; you're absolutely right. So you can also run these experiments with some crazy numbers. Usually I would hope that it's not just a misconfiguration, but it might be Black Friday, for example, where you know you get a hundred times the load you see throughout the rest of the year, so it's a kind of pre-planning. If it's really unexpected and you suddenly have a hundred thousand users, then just good luck; I can't help you with that in this talk.

Yes, please.

[Question:] The memory usage of the server increased with the number of requests. What's the reason, given that the server was not really doing anything, just running a random-number generator?

Yeah, it's a good question. I guess some objects have been created; it could just be that Jetty churns through all this memory with the request handling. But I can't tell you; you'd actually have to figure it out yourself. The thing is, this talk is really just the beginning of a huge rabbit hole, because for all of these things, for example where the memory gets wasted and what specifically wastes it, you can spend hours, days, weeks to figure out why that was the case. So for that specific case, I don't know; I would have to check it. And I can tell you from experience that at JetBrains we have performance engineers in our teams whose only job is to figure out this kind of stuff. It's not something that takes you two seconds; you actually have to do a lot of digging to find out what is going on and why. So this is just the beginning of the rabbit hole, and maybe the answer to your question is the next step into it.

Yes. Yes, please.

[Question, partly inaudible:] What about backend communication with other applications, so pure backends that don't have a frontend and communicate with other systems? When you check their performance, is it the same approach, or is there anything different?

Yes. So when I said "web application" for this talk, I didn't just mean websites; it could just as well be system-to-system backends. As I said, I gave this talk to some people here working mainly on desktop editors, and they said: well, that is nice, but how does it correlate to what we do? So I just wanted to say: single-user applications are not the target of this talk, but everything that runs somewhere on a server, whether it's a web server, system-to-system communication, or a backend of any kind, it's the same approach across all these systems. Yeah, same approach.

Yes, please.

[Question, partly inaudible, about what exactly the probe measures.]

I mean, the thing is, what you saw with the probe essentially gives you the latencies, and what we saw is the whole round trip: from the probe to the server and back again. And then, as additional data, you'll ideally want to find out how long just your business logic takes, how long one workflow takes, for example. So it's not either/or, it's actually both: you want the full round-trip latency and also a good picture of what the business logic does.
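A minimal sketch of such a probe, assuming a hypothetical /api endpoint and the HdrHistogram library (org.hdrhistogram:HdrHistogram), which is commonly used for exactly this kind of latency recording. One low-frequency request at a time, recording the full round trip:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import org.HdrHistogram.Histogram;

public class Probe {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(
                URI.create("http://localhost:8080/api")).build(); // hypothetical endpoint
        // Track latencies up to 1 minute, with 3 significant digits
        Histogram histogram = new Histogram(60_000_000_000L, 3);
        for (int i = 0; i < 600; i++) {
            long start = System.nanoTime();
            client.send(request, HttpResponse.BodyHandlers.discarding());
            histogram.recordValue(System.nanoTime() - start); // full round trip
            Thread.sleep(1_000); // one probe request per second, far below the loaders' rate
        }
        System.out.printf("p50: %d µs, p99: %d µs%n",
                histogram.getValueAtPercentile(50.0) / 1_000,
                histogram.getValueAtPercentile(99.0) / 1_000);
    }
}
```

For the business-logic side you would do the same nanoTime bracketing inside the server, around the workflow itself, so you can tell how much of the round trip is your code versus everything around it.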

Yes, yeah. Yes.

[Question:] So can we say that the probe is like a health check? Let's suppose that at some moment the servers behind the load balancer are taking up more memory because more requests are coming in, but the health check itself is just a request or two, so it will always say that your server is actually fine.

So, hopefully a health check does say the server is fine, because a health check implies you just check for a 200 coming back. What the probe does is tell you how your latencies change on the server. What you actually saw in the graphs is the probe telling you: hey, suddenly my requests take 100 milliseconds, 120 milliseconds. So it's not just a health check; it actually has to record the request data. So yes, there is a difference.

The loaders, on the other hand, just generate load. With the loaders you have to make sure, and this is by the way also something you need to double-check, that when you tell them to send 5,000 requests every second to the server, they really do that consistently. Not 4,500 in one second and then 5,500 in the next, but 5,000, 5,000, 5,000, and so on. The probe sends just one request at a time but records all the data; that is essentially the idea behind it. And there are some arguments online that I've read about whether you need to split the two at all, because in the past I also used setups where the loaders did the latency recording themselves. There's a huge discussion going on that I don't want to get into, but this is just something I found useful: splitting the probe off from the loaders.
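A minimal sketch of that double-check, assuming your loader calls recordSend() once per fired request (the method name is made up). After the run, print the map and verify that every second really saw about 5,000 sends rather than oscillating around it:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

public class RateCheck {
    // Wall-clock second -> number of requests fired in that second
    private static final Map<Long, LongAdder> SENDS_PER_SECOND = new ConcurrentHashMap<>();

    public static void recordSend() {
        long second = System.currentTimeMillis() / 1000;
        SENDS_PER_SECOND.computeIfAbsent(second, s -> new LongAdder()).increment();
    }

    public static void dump() {
        SENDS_PER_SECOND.forEach((second, count) ->
                System.out.println(second + ": " + count.sum() + " requests"));
    }
}
```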

Yes. Yes, please.

[Question, partly inaudible, about Kubernetes and auto-scaling.]

So the question was about Kubernetes and a living system where the platform auto-scales the application in the cloud. I don't have a good answer to that at the moment. What I tried to do is what this talk is about, because, frankly, what I've seen in the past is that you just offload everything: you say "Kubernetes", you don't know what hardware it runs on, and then some sysadmins pick some weird numbers for how much memory and CPU it should get. What I wanted to find out with this talk is: if you stripped away the containerization, how much could your application actually handle? That was the focus of this talk. How you do it with auto-scaling, with all the caveats and all the edge cases, I can't give you a good answer on tonight.

[Follow-up:] But you have some experience with it?

I have some experience with that, but not good enough to put it into the next talk at the moment. I would basically have to reorder my thoughts and get a couple more opinions to make it really presentable.

Yes, please, in the far-left corner.

[Audience comment, partly inaudible; roughly:] You could load test one or two instances to get your base numbers, and based on those you then have the numbers to set up the auto-scaling: how fast you need to scale, and how many pods in which case, which depends on the user behavior you're seeing. And then you can retest once these HPAs are up, and see whether it actually holds the load you were expecting on the website.

Yes, yes. We can maybe take the discussion of the whole thing in the break.

[Moderator:] I'm sorry, I'm sorry, we are recording the event; please ask questions into the microphone. Yes.

[Question:] Do you see this kind of load-testing task inside the development team, or should it rather sit in another team? Where do you see that?

So, the way I see it, and historically I can only talk about the projects I've done in the past, is that load testing is something that comes in five minutes before launch. Then someone clicks some buttons in JMeter or whatever, and that's it, that's your load testing. And it kind of works, in the sense that it's most often about checking off some checkboxes, done at the last minute. But when you want to understand the flame graphs, for example, or the profiling data, whatever you need, you need the developers who actually built the system to understand what's going on there. So at some point, sooner or later, you need developers involved. And, depending again on the organization and how you handle it, I would love for it to have more of a developer focus, early on. I mean, the whole topic starts when you build something against a database: turn on the SQL logs and find out how many SQL queries you spawn and how long they take, for example. How many developers do that? Sometimes they do, sometimes they don't, and then you go live and find out: oops, we have far too many SQL queries. So yes, the answer is: it would be better if developers took it more seriously and did it early on in the whole cycle, if that is somehow allowed. Yeah.
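As a minimal sketch of what "turn on the SQL logs" can look like with Hibernate as the JPA provider, assuming a hypothetical persistence unit named "shop". The three property names are standard Hibernate settings:

```java
import jakarta.persistence.EntityManagerFactory;
import jakarta.persistence.Persistence;
import java.util.Map;

public class SqlLoggingDemo {
    public static void main(String[] args) {
        // Print every SQL statement, pretty-print it, and collect statistics
        // (query counts, cache hits, execution times)
        EntityManagerFactory emf = Persistence.createEntityManagerFactory("shop", Map.of(
                "hibernate.show_sql", "true",
                "hibernate.format_sql", "true",
                "hibernate.generate_statistics", "true"));
        // ... run a single user journey here, then count the statements it spawned
        emf.close();
    }
}
```

Running one user journey with this switched on tells you immediately whether a single click spawns three queries or three hundred.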

Yes, please.

[Question:] Let's assume you work with signed-up users and you have to authenticate their passwords. I've had the experience that this also takes a lot of CPU time. Do you have a recommendation for a good way of doing this authentication? Maybe you can't give users tokens, because it's too cumbersome for them. Do you have some recommendations for that?

So, a recommendation for when you want to massively log in users in your load test, so to speak? Not off the top of my head, but give me a couple of minutes and let's talk in the break; let's see what I can come up with. Yes.

Any more questions? Yes, please.

[Question:] I guess in real systems, very often the bottleneck is the database. And I guess almost all e-commerce systems generate an enormous number of SQL queries per end-user request; I've had experience with more than five or six thousand SQL queries for one HTTP request. And in this case it's not so easy to scale, because a database is really difficult to scale. So I guess we should worry about the scalability of the bottlenecks, about how to solve that real problem, and in that case I don't have a good idea how it could be done.

It is a very good question. What I forgot to mention in the talk is exactly what you describe: the search for the weakest link in the system, so to speak. And obviously it doesn't help if you auto-scale your web application servers into oblivion when your database is only able to handle, whatever, 500 requests per second. The thing is, you're absolutely right, and I don't have a great answer for you here either, because I've also been part of many projects where, in the end, the database was overloaded. You had many, many systems spawning way, way too many queries, because throughout the development process people never turned on the SQL logs; they didn't care about the database. And then you're suddenly left with, as you said, thousands of requests against a database that you can't easily scale. If I had the superb answer for how to solve that, I guess every company in Munich and beyond would be happy. You're totally right: it's a difficult problem.

Okay, thanks. Yeah. I mean, what we did in the past: you need to start digging. I had this one company where the login process spawned, for whatever stupid reasons, 400 SQL queries. And when you have that, you log in one user and you suddenly get 400 queries against the database; with one login per second, that's 400 requests per second, which is a lot. So what do you do? Do people touch it, try to refactor it, get it down to seven SQL queries? Is that even possible in a huge legacy project where you have all kinds of crazy database structures and events flying around, and you never know who spawned which SQL query where? It's tricky, tricky.

Yes, please.

[Audience member:] I'd just like to share my experience regarding this situation with the scale of queries; there are perhaps three things I would do. Very often it happens because we use some ORM system, like Hibernate, and just blindly write the code without caring what actually goes over the network. So at the very least you should switch on SQL logging and look at what happens. Besides that, sometimes it's really better to use JDBC directly instead; sometimes that's more effective. And perhaps it's also a sign that you're using the wrong type of database: you could analyze whether a document-oriented database fits your use cases better than a relational one. Those are the three things I would recommend.

Yeah.

All right. If you like, you can still hit me up later in the break; I don't know what time it is, I've completely lost track of time. I just want to say a big thank you. You can always hit me up, I'm always talkative: Twitter @MarcoBehler, YouTube "Marco Codes", where you can find some interesting episodes, for example on how to build your own text editor, Java-based stuff. Have a look at that, or otherwise come have a chat. Thank you again for listening, and thank you.

[Applause]

Thank you, Marco. Fifteen minutes break.


Related tags: Java Web · Scaling · Load Testing · Performance · Optimization · EC2 Instances · User Journey · Server Management · DevOps · Software Engineering