Apache NiFi Anti-Patterns Part 4 - Scheduling

NiFi Notes
24 Sept 2020, 22:42

Summary

TL;DR: This video outlines the importance of scheduling, threading, and concurrent tasks in Apache NiFi. Mark Payne, the speaker, stresses that many users do not fully understand these concepts, which can hurt performance. He explains how to set the thread pool size to optimize dataflow performance and shows how to configure processors to avoid bottlenecks. He also discusses the impact of hardware and flow design on NiFi performance and advises making changes incrementally to reach the best possible configuration.

Takeaways

  • 🔧 Configuring the thread pool is critical to Apache NiFi performance and can raise throughput from a thousand to a hundred thousand events per second per node.
  • 🛠️ The thread pool size should generally be between two and four times the number of CPU cores, depending on the system.
  • ⚠️ Setting the thread pool too low leaves resources underutilized, while setting it too high can overwhelm the system.
  • 🔄 The thread pool size is a per-node setting, not a figure for the cluster as a whole.
  • 📈 To improve performance, start by configuring individual processors rather than immediately increasing the thread pool size.
  • 🔧 Raising a processor's Concurrent Tasks setting can increase processing capacity when that processor is a bottleneck.
  • 🕒 A processor's Run Duration can be set to a few milliseconds to optimize the processing of many small flow files (micro-batching).
  • 📊 Watch CPU utilization carefully when deciding whether to increase the thread pool size.
  • 🚫 Too many concurrent tasks causes contention and performance problems, so the value should rarely exceed 12.
  • 💾 If the CPU is not the bottleneck, disk I/O or the flow design may be limiting performance.
  • 📝 Flow design is critical to performance: use large flow files containing many records rather than a multitude of small files.
  • 🛑 If increasing concurrent tasks and other settings does not improve performance, this may point to a disk bottleneck.

Q & A

  • What is the central topic of the video?

    -The central topic is understanding scheduling, threading, and concurrent tasks in Apache NiFi in order to optimize the performance and stability of a NiFi cluster.

  • What role does the size of the thread pool play in Apache NiFi?

    -The thread pool size is critical because it determines how many processors can run at the same time, which directly affects performance and resource utilization.

  • What happens if the thread pool size is set too low?

    -If the thread pool size is too low, resources are underutilized because not enough tasks can run concurrently.

  • What is the problem if the thread pool size is set too high?

    -A thread pool that is too large can overwhelm the system, potentially leaving it unable to complete all of its background tasks, which can lead to instability.

  • How should the thread pool size be set?

    -The thread pool size should normally be set between two and four times the number of CPU cores in the system.

  • What is the difference between the event-driven and the timer-driven thread pool in NiFi?

    -The event-driven thread pool is essentially unused and should probably be set to one thread, while the timer-driven thread pool is used to execute processors and is therefore the one that matters.

  • What is the purpose of a NiFi processor's Run Duration setting?

    -Run Duration determines how long a processor runs before it yields to give other processors a chance. It can help improve performance by processing small flow files in batches.

  • What does a red connection in a NiFi flow indicate?

    -A red connection indicates a bottleneck: data is queuing up because the processor the connection feeds into cannot keep up with the data rate.

  • How can you tell whether CPU utilization leaves room to increase the thread pool size?

    -You can check CPU load with tools such as top on Linux, Task Manager on Windows, or Activity Monitor on macOS. A load below the number of available cores suggests there may be room to increase the thread pool size.

  • What can cause a performance shortfall when CPU utilization is low and the thread pool size is already set appropriately?

    -If CPU utilization is low and the thread pool is sized correctly, the disk may be the bottleneck. Using multiple disks, or faster storage such as SSDs or NVMe drives, can help improve performance.

  • What role does flow design play in NiFi performance?

    -Flow design is critical to performance. A flow that processes a huge number of small flow files can suffer from heavy garbage collection, swapping, and lock contention. It is better to design flows around larger flow files containing many records.

Outlines

00:00

🔧 Thread pool settings in Apache NiFi

This section covers why correctly configuring NiFi's thread pools matters for system performance and stability. Mark Payne, the speaker, notes that even experienced users often misconfigure these settings. He explains that the thread pool size, set via the Controller Settings menu, has a major impact on dataflow performance. He recommends setting the thread pool size to between two and four times the number of CPU cores and stresses that this is a per-node setting, not one for the whole cluster. He also highlights the timer-driven thread pool, which determines how many processors can run at the same time.

05:01

🚀 Improving dataflow performance through concurrency

This section shows how a dataflow's performance can be increased by adjusting the number of parallel tasks. The speaker demonstrates that adding threads does not always bring the desired improvement when a processor cannot keep up with the flow; instead, the processor's own configuration should be adjusted, in particular its Concurrent Tasks and Run Duration settings. Setting the run duration to 25 milliseconds optimizes the handling of many small flow files, yielding better throughput and lower latency.

10:02

🚨 Avoiding overload by limiting concurrent tasks

The third section warns against setting the number of concurrent tasks too high, since this can overload resources and cause performance problems. Too many threads end up competing for access to the queue, which degrades performance. The recommendation is to start with a small number of concurrent tasks and increase it only gradually to find the best performance. The section also discusses using system diagnostics tools and monitoring CPU load when deciding on the thread pool size.

15:03

⚠️ How CPU load informs thread pool size

This section explains how CPU load should influence the thread pool size. The load average should not reach the number of available CPU cores, or the system risks being overwhelmed; keeping the one-minute load average at around 70% of the available cores is recommended. The section also shows how the system diagnostics view in the NiFi UI can be used to monitor CPU load and adjust the thread pool size accordingly.

20:05

🛠️ Optimizing dataflow design for better performance

The final section stresses the importance of flow design for NiFi performance. Using large flow files and record-oriented processors can perform far better than using a huge number of small flow files. The speaker warns about the problems that come with many small flow files, such as increased garbage collection, back pressure, and swapping, gives tips for avoiding them, and emphasizes that careful flow design matters more than scaling hardware.

Keywords

💡Scheduling

Scheduling refers to planning and controlling when tasks or processes run. It is the video's central topic because it affects the performance and stability of the NiFi cluster; for example, the thread pool size can be configured to control how many processors run at the same time.

💡Thread Pool

A thread pool is a group of threads used to execute a system's tasks. In the context of the video, the size of the thread pool is key to optimizing NiFi's processing capacity and performance; a pool that is too small or too large leads to under- or over-utilization.

💡Concurrent Tasks

Concurrent tasks refers to the ability to execute several tasks at once. The video highlights this as an important factor in NiFi performance, particularly with respect to sizing the thread pool and configuring processors.

💡Apache NiFi

Apache NiFi is an open-source platform for data integration and processing. The video is part of a series on NiFi anti-patterns and explains how to control NiFi's performance by tuning the dataflow's settings.

💡Event-Driven Thread Pool

The event-driven thread pool is a thread pool that the video describes as obsolete and rarely used. It is recommended to leave it at one thread, as it may be removed in a future version.

💡Timer-Driven Thread Pool

The timer-driven thread pool is critical to task execution in NiFi, since it tells NiFi how many different processors to run at the same time. Misconfiguring it can cause performance problems or instability.

💡Processor

A processor in NiFi is a component that processes data. The video shows how processor configuration, such as the number of concurrent tasks or the run duration, can affect performance.

💡Run Duration

Run Duration is a setting that specifies how long a processor spends processing data before yielding. The video explains that increasing the run duration can raise performance and throughput by enabling micro-batching.

💡Back Pressure

Back pressure occurs when a system or component cannot process data as fast as it arrives. The video describes it as a problem that can be avoided by tuning the processors and the thread pool.

💡System Stability

System stability refers to a system's ability to operate efficiently and without failures. The video stresses that a badly configured thread pool or overloaded processors can undermine NiFi's stability.

💡Performance Tuning

Performance tuning is the process of optimizing a system's performance by adjusting various parameters. The video explains how the different knobs in NiFi, such as the thread pool size and run duration, can be set to improve performance.

Highlights

Scheduling and threading are critical for Apache NiFi performance and stability.

Misconfiguration can lead to poor performance or even cluster instability.

Correct configuration can drastically increase event processing rates.

Mark Payne presents part four of his series on Apache NiFi anti-patterns.

Thread pool size is a key setting for tuning data flow in NiFi.

Special permissions may be required to change system-wide settings.

Two types of thread pools in NiFi: event-driven and timer-driven.

The timer-driven thread pool is crucial for processor concurrency.

Under- or over-allocating threads can leave resources underutilized or overwhelm the system.

A recommended thread pool size is 2-4 times the number of CPU cores.

Thread pool settings are per node and not multiplied by the number of nodes in a cluster.

Processor configuration, not thread pool size, often addresses bottlenecks.

Adjusting the number of concurrent tasks can help with processor bottlenecks.

Run duration settings can improve throughput and decrease latency for small flow files.

Batch processing with non-zero run duration can prevent queue backlogs.

CPU load average should ideally be kept below the total number of cores to avoid overwhelming the system.

Disk I/O can become a bottleneck if not properly managed with separate disks or SSDs.

Flow design is crucial for performance, favoring record-oriented processors over many small flow files.

Incremental adjustments to settings are recommended over drastic changes.

Transcripts

[00:00] Scheduling, threading, and concurrent tasks: that's what we're talking about today. This is one of the most critical topics for a NiFi user to understand, but a lot of people don't fully grasp all the concepts, even people who have been using NiFi for years. Get this wrong and your performance will suffer; get it very wrong and even the stability of your cluster can suffer. But if you nail this, it can make the difference between processing a thousand events per second and processing a hundred thousand, even a million, events per second on each node in your cluster. I'm Mark Payne, and this is part four of my series on Apache NiFi anti-patterns.

[00:55] NiFi offers quite a few different knobs that you can turn to tune the settings of your dataflow, but one of the most important is the size of the thread pool. To configure this, go to the hamburger menu in the top right corner and then down to Controller Settings. The settings on this screen affect all of the users on the system and all the different flows running in the system, so they require special permissions; if you're running in a secure, productionized environment, you may need help from an administrator to configure them.

[01:38] On this screen you're given two different thread pools that you can configure: the event-driven thread pool and the timer-driven thread pool. The event-driven thread pool is pretty much not even used at this point, so we're not going to cover it other than to say you should probably leave it at one thread; it will probably be removed in a future version. But the timer-driven thread pool is really, really important. It is basically telling NiFi how many different processors you want it to run at any given time. If you set this value way too low, you end up underutilizing your resources because you're not scheduling enough things to happen concurrently. On the other hand, if you set it far too high, you're trying to do too many things at once and you overwhelm the system, even to the point that it may not be able to handle all the background tasks it needs to complete, which can lead to system instability.

[02:50] That begs the obvious question: what value should we set it to? In general, we typically recommend setting that thread pool to somewhere between two times and four times the number of CPU cores you have on the system. On a large system that can be quite a wide range: if you've got 64 cores, we're talking anywhere from 128 to 256 threads, so there's room to narrow down what's best for your system, and we'll walk through that over the next few minutes. It's also important to note that this is a per-node setting, meaning that if you have a 10-node cluster, each with 64 cores, you want to set it to somewhere between 128 and 256, not 10 times that number. To really understand what makes the most sense for your particular system, I typically say start with two times the number of cores you have and then increase the size of the thread pool gradually as necessary. So let's talk about when we actually need to increase it.
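
As a quick aid for that starting point, the two-to-four-times guideline is easy to compute from the JVM itself. A minimal sketch (not part of NiFi; the pool size itself is set per node in the UI under Controller Settings):

    // Sketch: derive the suggested Timer Driven thread pool range (2x-4x cores).
    public class ThreadPoolSizing {
        public static void main(String[] args) {
            int cores = Runtime.getRuntime().availableProcessors();
            int start = 2 * cores; // recommended starting value
            int limit = 4 * cores; // upper end of the guideline
            System.out.printf("Cores: %d -> start at %d threads, grow gradually toward %d%n",
                    cores, start, limit);
        }
    }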

[04:30] I've got a simple dataflow here: I generate some random data, I compress that data, and then I just discard it. It's a really simple, straightforward flow, and if I start it, we see pretty much right away that the CompressContent processor is our bottleneck. We see this because its incoming connection has turned red and a lot of data is queued up, while the output of the CompressContent processor has no backlog at all. So there's nothing in the flow causing back pressure to prevent the processor from running; it's simply not able to keep up with the data rate.

play05:16

and so what we'll sometimes see is that

play05:18

a user will look at this and they'll say

play05:20

okay well i need to give it more threads

play05:22

then

play05:22

that'll give me better throughput so

play05:25

they'll go in and start adjusting the

play05:26

size of their

play05:27

thread pull but the reality is that

play05:31

that's really not going to help us

play05:33

very much here because if we

play05:36

look in this corner here we can see the

play05:39

number of active threads in our data

play05:41

flow is only

play05:42

two and remember so if we come over here

play05:45

to

play05:46

our controller settings we can see that

play05:49

our thread pull actually has

play05:50

10 threads in it

play05:54

so we can use up to 10 threads we're

play05:57

only using

play05:57

one or two threads at any given time so

play06:00

increasing the size of that thread pull

play06:01

is really not going to buy as much

play06:05

so instead of adjusting the size of the

play06:07

thread pull let's see what we can do

play06:09

with the configuration of the compressed

play06:11

content processor

play06:17

so if we come in here to configure and

play06:18

we go to the scheduling tab

play06:22

we have this setting right here for the

play06:24

number of concurrent tasks

play06:26

so this is basically what is the maximum

play06:28

number of

play06:29

threads in that thread pool that this

play06:31

processor is allowed to use

play06:34

and the default for that is set to one

play06:37

and that's what it's configured

play06:38

to here so let's go ahead and set that

play06:39

to two and see if that makes a

play06:41

difference for us

play06:45

so if we start this we'll give it a

play06:47

little bit and see if that backlog

play06:49

starts to work off or not

play06:54

so we can see here that we've now got

play06:56

two threads that are running

play06:58

but we still got quite a backlog

play07:04

and staying pretty steady here we're

play07:06

back up to ten thousand

play07:08

so at this point we could go ahead and

play07:12

give it three threads so we can say use

play07:14

three of the threads in that thread pool

play07:16

and we can continue to increase the

play07:18

number of threads

play07:19

until we're able to make sure that this

play07:22

processor is able to keep up with the

play07:23

data flow

[07:25] But notice that this processor is also handling a lot of really small flow files. If we configure the processor again, we have a setting called Run Duration. The default is zero milliseconds, which gives us the lowest latency, but in this case we know we're going to process a lot of flow files rather than a small volume of them, so we can increase it to 25 milliseconds. For those of you familiar with the concept, what we're configuring here is micro-batching: do we want to ensure that every flow file is processed and transferred out of this processor as soon as possible, or do we want to batch together some of that processing for some period of time before we transfer it on? In this case I'm saying, okay, let's change the batch duration, essentially, to 25 milliseconds and see if that helps. We'll click Apply and start the processor.

[08:35] And just like that, we can see that the entire backlog has been processed, and this processor has no problem keeping up with the throughput at this point. Now, if it were still a bottleneck, we would go ahead and give it three threads, or maybe four, and see if that helped; we can slowly step that number up. But in this case, a 25-millisecond run duration was all we really needed. That's going to be a common pattern: whenever you're configuring processors that run over a very large number of small flow files, you really want to use a run duration larger than zero milliseconds whenever it's available (not all processors support it, for different reasons based on the underlying implementation). It's also worth noting that now that we have no backlog, we're not only increasing the throughput, we're also decreasing the latency significantly. The slider's left-hand side shows lower latency and the right-hand side shows higher throughput, but that's assuming the processor is able to keep up with the rate of data coming into it. If it's not able to keep up, then adding this batch duration will actually lower your latency as well, because it prevents the data from sitting in the queue for a long period of time waiting to be processed.
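
For readers who want to see what that run duration amounts to, here is a minimal sketch of the micro-batching idea: keep pulling items until the run duration elapses, then commit the whole batch at once. The names are hypothetical; NiFi's actual run-duration handling lives inside the framework.

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.LinkedBlockingQueue;
    import java.util.concurrent.TimeUnit;

    // Sketch: drain a queue for up to runDurationMs, then hand back the whole
    // batch so it can be committed in one step instead of one commit per item.
    public class MicroBatch {
        static List<String> pollBatch(BlockingQueue<String> queue, long runDurationMs)
                throws InterruptedException {
            List<String> batch = new ArrayList<>();
            long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(runDurationMs);
            while (true) {
                long remaining = deadline - System.nanoTime();
                if (remaining <= 0) break;                      // run duration elapsed
                String item = queue.poll(remaining, TimeUnit.NANOSECONDS);
                if (item == null) break;                        // queue drained
                batch.add(item);                                // process now, commit later
            }
            return batch;                                       // one commit for N items
        }

        public static void main(String[] args) throws InterruptedException {
            BlockingQueue<String> queue = new LinkedBlockingQueue<>();
            for (int i = 0; i < 10_000; i++) queue.add("flowfile-" + i);
            System.out.println("Batched " + pollBatch(queue, 25).size() + " items in one pass");
        }
    }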

[10:06] Now, if that processor weren't able to keep up with two concurrent tasks, of course I could set it to three or four. But it's very important that we don't go overboard with the number of concurrent tasks either; in fact, it's really rare that you want to go above, say, 12 concurrent tasks. Once we start using a lot of concurrent tasks, all those different threads end up vying for write access to the queue. If you have a lot of threads constantly trying to pull from the queue at the same time that the upstream processor is using its threads to write to it, you get a lot of lock contention. It's just like having a door and suddenly trying to squeeze 200 people through it: nobody actually gets through. The same thing happens if we come in here, schedule this processor, and say we want 200 concurrent tasks. But this is what I see happen all the time: I'll look at a user's flow and they'll have one processor with 100 concurrent tasks, the next with 200, and the next with 200, and the processors end up performing really, really poorly because they're just vying over a few nanoseconds of CPU time to actually run their tasks. To get the best performance, we want to start with a small value, typically one concurrent task, and then increase it to two or three as we need.
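
The door analogy is easy to reproduce on any machine. The sketch below (illustrative only, not NiFi code) hammers a single shared queue from a growing number of threads; well before 200 threads, total operations per second typically flatten or drop, because the threads spend their time fighting over the lock rather than doing work.

    import java.util.ArrayDeque;
    import java.util.Queue;

    // Sketch: measure offer/poll throughput on one synchronized queue (the "door")
    // as the thread count grows. Expect diminishing or negative returns early.
    public class QueueContention {
        public static void main(String[] args) throws InterruptedException {
            for (int threads : new int[] {1, 2, 4, 12, 50, 200}) {
                System.out.printf("%3d threads -> %,d ops/sec%n", threads, run(threads));
            }
        }

        static long run(int threadCount) throws InterruptedException {
            Queue<Integer> queue = new ArrayDeque<>();
            long[] counts = new long[threadCount];
            Thread[] workers = new Thread[threadCount];
            long endAt = System.currentTimeMillis() + 1000;   // 1-second trial
            for (int i = 0; i < threadCount; i++) {
                final int id = i;
                workers[i] = new Thread(() -> {
                    while (System.currentTimeMillis() < endAt) {
                        synchronized (queue) {                 // every thread needs this lock
                            queue.offer(1);
                            queue.poll();
                        }
                        counts[id]++;
                    }
                });
                workers[i].start();
            }
            long total = 0;
            for (Thread w : workers) w.join();
            for (long c : counts) total += c;
            return total;
        }
    }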

[12:00] We'll go ahead and set that back down to two concurrent tasks, and again we'll see that it has absolutely no problem keeping up.

[12:24] Now, we've seen a pretty huge performance improvement without changing the size of the thread pool at all. But what if we were using all the threads in our thread pool? What if we kept seeing 10 active threads here? Should we then increase the size of the thread pool? Well, it depends. If the CPU is already over-utilized, adding threads isn't going to help; in fact, it will probably hurt. So we need to check the utilization, or the CPU load. There are different tools for this: in Linux you can look at the top command, in Windows there's Task Manager, in macOS there's Activity Monitor. But the NiFi UI actually provides this information as well.

play13:16

so if we come over to the hamburger menu

play13:19

in a clustered environment you can go to

play13:21

the cluster

play13:24

dialog and that will give you

play13:26

information about

play13:27

each node in the cluster in this case

play13:30

i'm not using a cluster so i'll go to

play13:31

the summary table and come down here to

play13:34

system diagnostics

play13:38

then i can come over here to the system

play13:39

tab

play13:42

now we can see here that we have 12

play13:45

cores available to us

play13:48

and on this side we see that the core

play13:51

load average is 6.13

play13:53

that's the one minute load average

play13:56

so basically what this is telling us is

play13:59

that over the last minute

play14:01

on average we asked the cpu to do

play14:04

six things at a time and we could have

play14:06

asked it to do

play14:07

up to 12 things

play14:10

so we can in fact increase the size of

play14:13

our thread pull

play14:14

because we do actually have more

play14:18

cpu cycles available to us but we want

play14:21

to be careful here

play14:23

we only want to increase the size

play14:26

by a small amount let's say 20 or 30

play14:29

percent

play14:30

we don't want to say well we've got a

play14:31

core load average of 6 and we can go up

play14:33

to 12 so let's double it

play14:35

we want to make sure that we're

play14:36

increasing the size of the thread pull

play14:38

slowly

play14:39

because when if we give it more threads

play14:41

it may be that the processors that will

play14:43

use those

play14:44

are actually much more cpu intensive

play14:46

than the ones that happen to be used

play14:47

over the last

play14:48

minute predominantly so we want to make

play14:51

sure that we're not

play14:52

immediately jumping to a huge uh

play14:54

increase in the size of that thread pole

play14:56

but we do it

play14:56

in a little bit more of a controlled

play14:58

manner

play15:00

and it's also important to note that we

play15:02

typically don't want to actually have

play15:04

our one-minute load average reaching

play15:07

the same value as the number of

play15:09

available cores

play15:11

now if the one-minute load average is

play15:13

equal to the number of cores that we

play15:15

have

play15:15

that means that we're basically asking

play15:19

the cpu to do

play15:20

exactly the the amount that it's able to

play15:22

handle

play15:23

which might sound like a good thing but

play15:25

that means

play15:26

that the cpu is doing absolutely as much

play15:29

as it possibly can

play15:31

so if we have a cluster of say 10 nodes

play15:35

and we lose one node in that cluster

play15:37

that means the other nine

play15:39

are now going to have more data that

play15:41

they have to process

play15:42

which means they're probably going to be

play15:44

using more cpu cycles

play15:46

so we want to try to avoid getting to

play15:49

the point

play15:50

that losing a node in the cluster would

play15:53

overwhelm

play15:54

all of the other nodes in our cluster

play15:57

so to do that we want to keep the load

play16:00

average

play16:00

to some value less than the number of

play16:03

available cores typically i would say

play16:05

you want the one minute load average to

play16:07

be somewhere around 70 percent

play16:09

of the number of available cores now you

play16:12

might say that uh

play16:13

for your situation 50 of the available

play16:16

course

play16:17

is the most that you're willing to use

play16:19

or you might say that you're willing to

play16:20

go

play16:21

80 or 90 of the number of cores that you

play16:24

have available

play16:26

really just kind of depending on your

play16:28

tolerance for risk

play16:30

in that situation
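
The JDK exposes the same one-minute load average that the System Diagnostics tab shows, so the 70-percent rule of thumb can be checked programmatically. A minimal sketch (the 0.70 factor is the speaker's rule of thumb, not a hard limit; the call returns -1 on platforms that don't report a load average, such as Windows):

    import java.lang.management.ManagementFactory;
    import java.lang.management.OperatingSystemMXBean;

    // Sketch: compare the 1-minute load average against ~70% of available cores.
    public class LoadHeadroom {
        public static void main(String[] args) {
            OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
            int cores = os.getAvailableProcessors();
            double load = os.getSystemLoadAverage();   // 1-minute load average, or -1
            double budget = 0.70 * cores;              // rule-of-thumb ceiling
            if (load < 0) {
                System.out.println("Load average not available on this platform");
            } else if (load < budget) {
                System.out.printf("Load %.2f of %d cores: headroom to grow the pool slowly%n",
                        load, cores);
            } else {
                System.out.printf("Load %.2f of %d cores: do not add threads%n", load, cores);
            }
        }
    }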

[16:39] Situations can and sometimes do arise in which we've set the size of the thread pool effectively, our CPU is far from fully utilized, and yet we have a processor that can't keep up with the data rate. We increase the number of concurrent tasks, we increase the run duration, and nothing really changes: it uses the same amount of CPU, and the throughput stays the same even though we've gone from six to maybe eight or ten concurrent tasks. When this happens, a lot of users are very quick to say, "Okay, 10 concurrent tasks is clearly not enough; let's use 100 or 200 concurrent tasks, and of course we have to make sure our thread pool can handle that, so let's set the thread pool to 2,000 threads." Please don't do this. We've already talked about some of the problems it can cause, and I can guarantee you're not going to get the results you want.

[17:50] When we end up in a situation like this, what it's really indicating is that our bottleneck is not the CPU. Oftentimes the bottleneck is actually the disk. If the content, provenance, and flow file repositories are all writing to a single spinning disk, you can reach the point where that disk is the bottleneck pretty quickly. So if you're using spinning disks, I would certainly recommend using a separate disk for each of the content, flow file, and provenance repositories; ideally, use more than one disk for the content and provenance repositories, or better yet, use an SSD or even an NVMe drive if you have really high volumes of data. The NiFi Admin Guide walks through how to configure those repositories to use separate disks and to use multiple disks.
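
As a hedged illustration of what that looks like in nifi.properties: the directory property names follow the Admin Guide's pattern, while the mount points below are hypothetical.

    # Hypothetical mount points: one disk for the flow file repository, and
    # multiple disks each for the content and provenance repositories.
    nifi.flowfile.repository.directory=/disk1/flowfile_repository

    # Additional content repository directories use unique property suffixes.
    nifi.content.repository.directory.content1=/disk2/content_repository
    nifi.content.repository.directory.content2=/disk3/content_repository

    nifi.provenance.repository.directory.provenance1=/disk4/provenance_repository
    nifi.provenance.repository.directory.provenance2=/disk5/provenance_repository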

[18:49] The other thing we need to consider is the design of the flow itself. It's really important to think about the design as we're building out our flow. If we're building a flow that takes a huge number of really small flow files, we're going to reach a point where a lot of garbage collection is occurring. If back pressure is not configured perfectly, we can certainly get into a case where there's a lot of swapping, writing those flow files to disk and then deserializing and serializing them over and over again, and that can really cause performance problems. We can even get to the point where the provenance repository starts to apply back pressure because it can't keep up with the sheer number of provenance events. We could certainly add more hardware, more nodes to our cluster, but the design of the flow is far more important to performance than the hardware we run on.

[20:01] We saw in part one of this series that going from a flow that uses a large number of small flow files to a flow that uses record-oriented processors can often yield at least an order of magnitude better performance, or better throughput, and that's not at all uncommon. We'll very often see at least an order of magnitude better performance when we design our flow to use larger flow files with many records than when we use a lot of really small flow files. This is really key, because for most people it's really hard to increase their hardware by an order of magnitude; users who have a five-node NiFi cluster typically don't have the resources to scale out to 50 nodes. If we're just careful when designing our flows, we can avoid a lot of the pitfalls we run into with garbage collection, back pressure from the provenance repository, swapping, lock contention, and a lot of the problems that we inherently see with really large numbers of small flow files.
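
To make the "larger flow files with many records" point concrete, here is a small sketch of the difference in framework overhead (purely illustrative; in NiFi the record readers and writers do this bundling for you): the same data carried as one payload per record versus one payload holding every record.

    import java.nio.charset.StandardCharsets;

    // Sketch: 100,000 records as 100,000 one-record flow files vs. one NDJSON
    // payload. Same data; vastly different per-record overhead (flow file
    // objects, provenance events, repository updates).
    public class RecordBundling {
        public static void main(String[] args) {
            int records = 100_000;
            StringBuilder ndjson = new StringBuilder();
            for (int i = 0; i < records; i++) {
                ndjson.append("{\"id\":").append(i).append("}\n");
            }
            byte[] onePayload = ndjson.toString().getBytes(StandardCharsets.UTF_8);
            System.out.printf("Per-record: %,d flow files; record-oriented: 1 flow file (%,d bytes)%n",
                    records, onePayload.length);
        }
    }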

[21:34] As I said in the beginning, NiFi offers quite a few different knobs that you can turn, and today we discussed some of the most important ones. But please keep in mind that as you start to adjust these settings, start small and adjust them gradually. If making a small change helped but wasn't enough, try making another small change, maybe slightly bigger. If going from one concurrent task to two improved performance but not quite enough, try three or four, or look at the run schedule. We want to avoid going from one or two concurrent tasks to ten or a hundred; it's rarely going to give us what we're looking for. But time and time again I've seen users running on VMs that have four or eight cores set the size of the thread pool to 2,500. If your system can do four or eight things at a time, I promise it's not going to behave the way you want when you ask it to do 2,500 things at a time. So just start small, make slow, small adjustments, and you'll get to where you need to be. Thanks a lot for watching, guys. Take care.


Related Tags
Apache NiFi · Dataflow · Performance · Configuration · Thread Pool · System Load · Processor · Contention · Data Rate · Cluster · Optimization