Apache NiFi Anti-Patterns Part 4 - Scheduling
Summary
TL;DR: This video outlines the importance of scheduling, threading, and concurrent tasks in Apache NiFi. Mark Payne, the speaker, emphasizes that many users do not fully understand these concepts, which can hurt performance. He explains how to set the thread pool size to optimize data flow performance and shows how to configure processors to avoid bottlenecks. He also discusses how hardware and flow design affect NiFi performance, and advises making changes gradually to arrive at the best possible configuration.
Takeaways
- 🔧 Configuring the thread pool is critical to Apache NiFi performance and can raise throughput from a thousand to a hundred thousand events per second per node.
- 🛠️ The thread pool size should generally be between two and four times the number of CPU cores, depending on the system.
- ⚠️ Setting the thread pool too low under-utilizes resources, while setting it too high can overwhelm the system.
- 🔄 The thread pool size is a per-node setting, not a value for the whole cluster.
- 📈 To improve performance, start with processor configuration rather than immediately increasing the thread pool size.
- 🔧 A processor's 'concurrent tasks' setting can increase its processing capacity when it is a bottleneck.
- 🕒 A processor's 'run duration' can be set to a non-zero number of milliseconds to optimize the processing of many small flow files (micro-batching).
- 📊 Watch CPU utilization carefully when deciding whether to increase the thread pool size.
- 🚫 Too many concurrent tasks lead to contention and performance problems; it rarely makes sense to go above about 12.
- 💾 If the CPU is not the bottleneck, disk I/O or the flow design may be limiting performance.
- 📝 Flow design is critical to performance; prefer large flow files containing many records over a multitude of small files.
- 🛑 If increasing concurrent tasks and other settings does not improve performance, that can point to a disk bottleneck.
Q & A
What is the central topic of the video?
-The central topic is understanding scheduling, threading, and concurrent tasks in Apache NiFi in order to optimize the performance and stability of a NiFi cluster.
What role does the thread pool size play in Apache NiFi?
-The thread pool size is critical because it determines how many processors can run concurrently, which directly affects performance and resource utilization.
What happens if the thread pool size is set too low?
-If the thread pool size is too low, resource utilization becomes suboptimal because not enough tasks can run concurrently.
What is the problem if the thread pool size is set too high?
-A thread pool that is too large can overwhelm the system, possibly to the point that it can no longer keep up with its background tasks, which can lead to instability.
How should the thread pool size be set?
-The thread pool size should normally be set to between two and four times the number of CPU cores on the system.
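The sizing rule of thumb above can be sketched in a few lines. This is an illustration only, not NiFi code; the 2x-4x range and the advice to start at the low end come straight from the video.

```python
import os

def thread_pool_range(cpu_cores: int) -> tuple[int, int]:
    """Recommended timer-driven thread pool size: 2x to 4x the core count, per node."""
    return 2 * cpu_cores, 4 * cpu_cores

def starting_pool_size(cpu_cores: int) -> int:
    """Start at the low end (2x cores) and grow gradually as needed."""
    return 2 * cpu_cores

# Example: a 64-core node, as in the video
low, high = thread_pool_range(64)
print(low, high)  # 128 256
print(starting_pool_size(os.cpu_count() or 1))  # depends on this machine
```

Remember that in a cluster this value applies to each node individually, not to the cluster as a whole.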
What is the difference between the event-driven and the timer-driven thread pool in NiFi?
-The event-driven thread pool is generally not used and should probably be left at one thread, while the timer-driven thread pool is used to run processors and is therefore the important one.
What is the purpose of the 'Run Duration' setting on a NiFi processor?
-'Run Duration' determines how long a processor runs before yielding so that other tasks get a chance. It can improve performance by processing many small flow files in batches (micro-batching).
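Why micro-batching helps can be sketched with a toy model. The per-file work and per-invocation overhead numbers below are made up for illustration; the point is that batching many small flow files into one scheduling pass amortizes the fixed cost of each invocation.

```python
def time_to_process(flow_files: int, per_file_ms: float,
                    per_invocation_overhead_ms: float, batch_size: int) -> float:
    """Total time when the processor handles `batch_size` flow files per invocation."""
    invocations = -(-flow_files // batch_size)  # ceiling division
    return invocations * per_invocation_overhead_ms + flow_files * per_file_ms

# 10,000 small flow files, 0.01 ms of work each, 0.5 ms fixed cost per invocation
one_at_a_time = time_to_process(10_000, 0.01, 0.5, batch_size=1)
micro_batched = time_to_process(10_000, 0.01, 0.5, batch_size=250)
print(one_at_a_time, micro_batched)  # 5100.0 vs 120.0 ms
```

With a run duration of zero, the processor pays the overhead for every flow file; with a small batching window, the same work completes far sooner, which is why the backlog in the video drains once Run Duration is set to 25 ms.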
What does a red connection in a NiFi flow indicate?
-A red connection indicates a bottleneck: data is queuing up because the processor the connection feeds cannot keep up with the data rate.
How can you tell whether CPU utilization leaves room to increase the thread pool size?
-You can check CPU load with tools such as 'top' on Linux, Task Manager on Windows, or Activity Monitor on macOS. A load average below the number of available cores suggests there is room to increase the thread pool size.
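The same check can be scripted. `os.getloadavg()` returns the 1-, 5-, and 15-minute load averages on Linux and macOS (it is not available on Windows), and comparing the 1-minute value against the core count mirrors what NiFi's System Diagnostics tab shows.

```python
import os

def has_cpu_headroom(load_1min: float, cores: int) -> bool:
    """True if the 1-minute load average is below the number of cores."""
    return load_1min < cores

cores = os.cpu_count() or 1
try:
    load_1min, _, _ = os.getloadavg()  # raises OSError on Windows
    print(f"load {load_1min:.2f} over {cores} cores, "
          f"headroom: {has_cpu_headroom(load_1min, cores)}")
except OSError:
    print("load average not available on this platform")
```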
What are possible causes of poor performance when CPU utilization is low and the thread pool size is already set appropriately?
-If the CPU is not highly utilized and the thread pool is sized well, the disk can be the bottleneck. Using multiple disks, or faster storage technologies such as SSDs or NVMe drives, can help improve performance.
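In `nifi.properties` the repositories can be pointed at separate disks, as the video and the NiFi admin guide recommend. The property names below are NiFi's standard repository settings; the mount paths are hypothetical examples.

```properties
# FlowFile repository on its own disk
nifi.flowfile.repository.directory=/disk1/flowfile_repository

# Content repository spread across two disks (arbitrary partition names)
nifi.content.repository.directory.content1=/disk2/content_repository
nifi.content.repository.directory.content2=/disk3/content_repository

# Provenance repository on a separate disk
nifi.provenance.repository.directory.default=/disk4/provenance_repository
```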
What role does flow design play in NiFi performance?
-Flow design is critical to performance. A flow that processes a huge number of small flow files can run into problems such as heavy garbage collection, swapping, and lock contention. It is better to design flows to use larger flow files containing many records.
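The difference between one-record-per-flow-file and record-oriented design can be illustrated outside NiFi with a trivial sketch. The record payloads are invented; in a real flow you would use NiFi's record-oriented processors rather than hand-rolled bundling like this.

```python
import json

# One record per flow file: 10,000 separate payloads, each with its own
# flow file overhead (attributes, provenance events, repository writes)
records = [{"id": i, "value": i * 2} for i in range(10_000)]
many_small = [json.dumps(r).encode() for r in records]

# Record-oriented: one flow file holding all records as newline-delimited JSON,
# so NiFi tracks a single flow file instead of 10,000
one_large = "\n".join(json.dumps(r) for r in records).encode()

print(len(many_small), "flow files vs 1 flow file of", len(one_large), "bytes")
```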
Outlines
🔧 Configuring thread pools in Apache NiFi
This paragraph covers the importance of correctly configuring thread pools in Apache NiFi for system performance and stability. Mark Payne, the speaker, stresses that even experienced users often misconfigure these settings. He explains that the thread pool size, set via the 'Controller Settings' menu, has a major impact on data flow performance. He recommends setting the thread pool size to between two and four times the number of CPU cores and emphasizes that this applies per node, not to the whole cluster. He also highlights the importance of the timer-driven thread pool, which determines how many processors can run concurrently.
🚀 Improving data flow performance through parallelism
This paragraph shows how a data flow's performance can be increased by adjusting the number of concurrent tasks. The speaker demonstrates that adding threads does not always bring the desired improvement when a processor cannot keep up with the flow. Instead, the processor's own configuration should be adjusted, in particular its number of concurrent tasks and its run duration. Setting the run duration to 25 milliseconds optimizes the processing of many small flow files, yielding better throughput and lower latency.
🚨 Avoiding overload by limiting concurrent tasks
The third paragraph warns against setting the number of concurrent tasks too high, as this can lead to resource contention and performance problems. Too many threads end up competing for access to the queue, which hurts performance. The recommendation is to start with a small number of concurrent tasks and increase it only gradually to find the best performance. The use of system diagnostics tools and monitoring of CPU utilization to guide thread pool sizing is also discussed.
⚠️ How CPU load affects thread pool size
This paragraph explains how CPU load should drive decisions about thread pool size. It stresses that the one-minute load average should not reach the number of available CPU cores, to avoid overwhelming the system, and recommends keeping it around 70% of the available cores. It also shows how the System Diagnostics view in the NiFi UI can be used to monitor CPU load and adjust the thread pool size accordingly.
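The headroom rule described above can be sketched as a small helper. The 70% target is the video's rule of thumb (the speaker notes some teams choose 50% or 90% depending on risk tolerance); the functions are illustrative, not a NiFi API.

```python
def target_load(cores: int, headroom_fraction: float = 0.70) -> float:
    """Target 1-minute load average: a fraction of the available cores,
    leaving spare capacity for node failures and background work."""
    return cores * headroom_fraction

def should_grow_pool(load_1min: float, cores: int,
                     headroom_fraction: float = 0.70) -> bool:
    """Only consider enlarging the thread pool while load stays under target."""
    return load_1min < target_load(cores, headroom_fraction)

print(round(target_load(12), 2))   # 8.4
print(should_grow_pool(6.13, 12))  # True: room to grow, in small steps
print(should_grow_pool(11.0, 12))  # False: too close to saturation
```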
🛠️ Optimizing flow design for better performance
The final paragraph stresses the importance of flow design for NiFi performance. Using large flow files and record-oriented processors can perform dramatically better than processing a large number of small flow files. The speaker warns about the problems that come with many small flow files, such as increased garbage collection, back pressure, and swapping. He offers tips to avoid these problems and emphasizes that careful flow design matters more than scaling out hardware.
Keywords
💡Scheduling
💡Thread Pool
💡Concurrent Tasks
💡Apache NiFi
💡Event-Driven Thread Pool
💡Timer-Driven Thread Pool
💡Processor
💡Run Duration
💡Back Pressure
💡System Stability
💡Performance Tuning
Highlights
Scheduling and threading are critical for Apache NiFi performance and stability.
Misconfiguration can lead to poor performance or even cluster instability.
Correct configuration can drastically increase event processing rates.
Mark Payne discusses Apache NiFi anti-patterns in a series.
Thread pool size is a key setting for tuning data flow in NiFi.
Special permissions may be required to change system-wide settings.
Two types of thread pools in NiFi: event-driven and timer-driven.
The timer-driven thread pool is crucial for processor concurrency.
Under or over-allocating threads can lead to underutilization or system overwhelm.
A recommended thread pool size is 2-4 times the number of CPU cores.
Thread pool settings are per node and not multiplied by the number of nodes in a cluster.
Processor configuration, not thread pool size, often addresses bottlenecks.
Adjusting the number of concurrent tasks can help with processor bottlenecks.
Run duration settings can improve throughput and decrease latency for small flow files.
Batch processing with non-zero run duration can prevent queue backlogs.
CPU load average should ideally be kept below the total number of cores to prevent system overwhelm.
Disk I/O can become a bottleneck if not properly managed with separate disks or SSDs.
Flow design is crucial for performance, favoring record-oriented processors over many small flow files.
Incremental adjustments to settings are recommended over drastic changes.
Transcripts
Scheduling: scheduling, threading, and concurrent tasks, that's what we're talking about today. This is one of the most critical topics for a NiFi user to understand, but a lot of people don't fully grasp all the concepts, even people who have been using NiFi for years. Get this wrong and your performance will suffer; get it very wrong and even the stability of your cluster can suffer. But if you nail this, it can make the difference between processing a thousand events per second and processing a hundred thousand, even a million, events per second on each node in your cluster. I'm Mark Payne, and this is part four of my series on Apache NiFi anti-patterns.
NiFi offers quite a few different knobs that you can turn to really tune the settings of your data flow, but one of the most important knobs is the size of the thread pool. To configure this, you can go to the hamburger menu in the top right corner and then go down to Controller Settings. Now, the settings on this screen affect all of the users on the system and all the different flows running in the system, so they require some special permissions. If you're running in a secure, productionized environment, you may need help from an administrator in order to configure these settings.
On this screen you're given two different thread pools that you can configure: the event-driven thread pool and the timer-driven thread pool. The event-driven thread pool is pretty much not even used at this point, so we're not really going to cover that, aside from saying you should probably leave it at one thread; it's probably going to be removed in a future version. But the timer-driven thread pool, that is really, really important. This is basically telling NiFi how many different processors you want NiFi to run at any given time. If you set this value way too low, you end up in a situation where you're really under-utilizing your resources, because you're not scheduling enough things to happen concurrently. On the other hand, if you set this value far too high, you end up trying to do too many different things at once, and you really overwhelm the system, even to the point that it may not be able to handle all the different background tasks it needs to complete, and that can even lead to system instability.
That begs the obvious question: what value should we set it to? In general, we would typically recommend that you set that thread pool to somewhere between two times and four times the number of CPU cores you have on the system. On a large system, though, that can be quite a large range: if you've got 64 cores, we're talking about anywhere from 128 to 256 threads, so there's a pretty large range in which we might want to narrow down what's best for the system, and we'll walk through that over the next few minutes. It's also important to note that this is a per-node setting, meaning that if you have a 10-node cluster, each node with 64 cores, you want to set that to somewhere between 128 and 256, not 10 times that number. To really understand what makes the most sense for your particular system, I would typically say you want to start with two times the number of cores you have, and then increase the size of the thread pool gradually as necessary. So let's talk about when we actually need to increase it.
I've got a simple data flow here: I generate some random data, I compress that data, and then I just destroy the data. It's a really simple, straightforward flow, and if I start it, we see pretty much right away that CompressContent is our bottleneck. We see this because its incoming connection has turned red, a lot of data is queued up, and the output of the CompressContent processor has no backlog; so there's nothing in the flow causing back pressure that would prevent the processor from running, it's simply not able to keep up with the data rate. What we'll sometimes see is that a user looks at this and says, okay, I need to give it more threads, that'll give me better throughput, and they go in and start adjusting the size of their thread pool. But the reality is that's really not going to help us much here, because if we look in this corner, we can see the number of active threads in our data flow is only two. And remember, if we come over to Controller Settings, we can see that our thread pool actually has 10 threads in it. So we can use up to 10 threads, but we're only using one or two at any given time, so increasing the size of that thread pool is really not going to buy us much. Instead of adjusting the size of the thread pool, let's see what we can do with the configuration of the CompressContent processor. If we open Configure and go to the Scheduling tab, we have this setting for the number of concurrent tasks. This is basically the maximum number of threads in that thread pool that this processor is allowed to use. The default is one, and that's what it's configured to here, so let's go ahead and set it to two and see if that makes a difference for us.
If we start this, we'll give it a little bit and see if that backlog starts to work off. We can see here that we've now got two threads running, but we still have quite a backlog, staying pretty steady back up around ten thousand. At this point we could give it three threads, telling it to use three of the threads in that thread pool, and we could continue to increase the number until this processor is able to keep up with the data flow. But notice that this processor is also handling a lot of really small flow files. So if we configure the processor again, we have this setting called Run Duration. The default is zero milliseconds, which gives us the lowest latency, but in this case we know we're going to process a lot of different flow files rather than a small volume of them, so we can increase this to 25 milliseconds. For those of you who are familiar with the concept, what we're configuring here is micro-batching: do we want to ensure that every flow file is processed and transferred out of this processor as soon as possible, or do we want to batch together some of that processing for some period of time before we transfer it on? In this case I'm saying, okay, let's change the batch duration to essentially 25 milliseconds and see if that helps. We'll click Apply and start the processor, and just like that, the entire backlog has been processed and this processor has no problem keeping up with the throughput at this point.
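Scheduling settings like these can also be changed programmatically through NiFi's REST API (`PUT /nifi-api/processors/{id}`). The sketch below only builds the request body; the processor id, revision version, and field values are placeholder examples, the field names are taken from NiFi's processor configuration DTO, and the actual network call is left commented out since it needs a live instance.

```python
import json

def build_update_payload(processor_id: str, version: int,
                         concurrent_tasks: int, run_duration_ms: int) -> dict:
    """Body for PUT /nifi-api/processors/{id} updating scheduling settings."""
    return {
        "revision": {"version": version},
        "component": {
            "id": processor_id,
            "config": {
                # Max threads from the timer-driven pool this processor may use
                "concurrentlySchedulableTaskCount": concurrent_tasks,
                # Micro-batching window; 0 means lowest latency
                "runDurationMillis": run_duration_ms,
            },
        },
    }

payload = build_update_payload("example-processor-id", 3, 2, 25)
print(json.dumps(payload, indent=2))
# To apply it against a live NiFi (base_url is hypothetical):
# req = urllib.request.Request(f"{base_url}/nifi-api/processors/example-processor-id",
#                              data=json.dumps(payload).encode(), method="PUT",
#                              headers={"Content-Type": "application/json"})
```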
Now, if it were still a bottleneck, we would go ahead and give it three threads and see if that helped, or maybe four threads, slowly stepping that number up. But in this case, a 25 millisecond run duration was all we really needed. That's going to be a common pattern: whenever you start configuring processors that are running over a very large number of small flow files, you really want to use a run duration larger than zero milliseconds whenever it's available. Not all processors support it, for different reasons based on the underlying implementation. It's also worth noting that now that we have no backlog, we're not only increasing the throughput but also decreasing the latency significantly. When we look at the slider, where the left-hand side shows lower latency and the right-hand side shows higher throughput, that trade-off assumes the processor is able to keep up with the rate of data coming into it. If it's not able to keep up, then adding this batch duration will actually lower your latency as well, because it prevents the data from sitting in the queue for a long period of time waiting to be processed.
Now, if that processor weren't able to keep up with two concurrent tasks, of course I could set it to three or four, but it's very important that we don't go overboard with the number of concurrent tasks either. In fact, it's really rare that you want to go above, say, 12 concurrent tasks, because once we start using a lot of concurrent tasks, what we end up seeing is that all those different threads are basically vying for write access to the queue. If you have a lot of threads constantly trying to pull from the queue at the same time that this processor is using its threads to write to the queue, you get into a situation with a lot of lock contention. It's just like if you had a door and suddenly tried to squeeze 200 people through it: nobody is actually going to get through. The same thing happens if we come in here and schedule this processor with 200 concurrent tasks. But this is what I see happen all the time: I'll look at a user's flow and one processor has 100 concurrent tasks, the next processor has 200, the next has 200, and we get to the point where the processors perform really, really poorly, because they're just vying over a few nanoseconds of CPU time to actually run their tasks. To get the best performance, we really want to start with a small value, typically one concurrent task, and then increase that to two or three as needed. We'll go ahead and set this back down to two concurrent tasks, and again we'll see that it has absolutely no problem keeping up with two concurrent tasks.
Now, we've seen a pretty huge performance improvement without changing the size of the thread pool at all. But what if we were using all the threads in our thread pool? What if we kept seeing 10 active threads here, should we then increase the size of the thread pool? Well, it depends. If the CPU is already over-utilized, adding threads isn't going to help us; in fact, it will probably hurt. So we really need to check the CPU utilization, or the CPU load. There are different tools we can use for this: in Linux you could look at the top command, in Windows there's Task Manager, in macOS there's Activity Monitor. But the NiFi UI actually provides this information to us as well. From the hamburger menu, in a clustered environment, you can go to the Cluster dialog, which gives you information about each node in the cluster. In this case I'm not using a cluster, so I'll go to the Summary table, come down to System Diagnostics, and then over to the System tab.
Here we can see that we have 12 cores available to us, and on this side we see that the core load average is 6.13; that's the one-minute load average. Basically, this is telling us that over the last minute, on average, we asked the CPU to do six things at a time, and we could have asked it to do up to 12. So we can in fact increase the size of our thread pool, because we do actually have more CPU cycles available to us. But we want to be careful here: we only want to increase the size by a small amount, say 20 or 30 percent. We don't want to say, well, we've got a load average of 6 and we can go up to 12, so let's double it. We want to make sure we're increasing the size of the thread pool slowly, because if we give it more threads, the processors that use them may be much more CPU-intensive than the ones that happened to run predominantly over the last minute. So rather than immediately jumping to a huge increase in the size of that thread pool, we do it in a more controlled manner. It's also important to note that we typically don't want our one-minute load average reaching the number of available cores. If the one-minute load average is equal to the number of cores we have, that means we're basically asking the CPU to do exactly the amount that it's able to handle, which might sound like a good thing, but it means the CPU is doing absolutely as much as it possibly can. If we have a cluster of, say, 10 nodes and we lose one node, the other nine now have more data to process, which means they're probably going to use more CPU cycles. We want to avoid getting to the point where losing a node in the cluster would overwhelm all of the other nodes. To do that, we keep the load average at some value less than the number of available cores; typically I would say you want the one-minute load average somewhere around 70 percent of the number of available cores. Now, you might say that for your situation, 50 percent of the available cores is the most you're willing to use, or you might be willing to go to 80 or 90 percent, really depending on your tolerance for risk in that situation.
Situations can and do sometimes arise in which we've set the size of the thread pool effectively, our CPU is far from being fully utilized, and yet we have a processor that can't keep up with the data rate. We increase the number of concurrent tasks, we increase the run duration, and nothing really changes: it uses the same amount of CPU, and the throughput stays the same even though we've gone from six to maybe eight or ten concurrent tasks. When this happens, a lot of users are very quick to say, okay, 10 concurrent tasks is clearly not enough, let's use 100 or 200 concurrent tasks, and of course we have to make sure our thread pool can handle that, so let's set our thread pool to 2,000 threads. But please don't do this. We've already talked about some of the problems this can cause, and I can guarantee you're not actually going to get the results that you want. When we end up in a situation like this, what it's really indicating is that our bottleneck is not the CPU; oftentimes the bottleneck is actually the disk. If we have the content, provenance, and flow file repositories all writing to a single spinning disk, that disk can become the bottleneck pretty quickly. So if you're using spinning disks, I would certainly recommend trying to use a separate disk for each of the content, flow file, and provenance repositories; ideally use more than one disk for the content and provenance repositories, or better yet, use an SSD or even an NVMe drive if you have really high volumes of data. The NiFi admin guide can walk you through how to configure those different repositories to use separate disks and to use multiple disks.
The other thing we need to consider is the design of the flow itself. It's really important to think about the design as we're building out our flow. If we build a flow that takes a huge number of really small flow files, we're going to reach a point where a lot of garbage collection occurs. If back pressure is not configured perfectly, we can certainly get into a case with a lot of swapping, writing those flow files to disk and then serializing and deserializing them over and over again, and that can really cause performance problems. We can even get to the point where the provenance repository starts to apply back pressure because it can't keep up with too many provenance events. We could certainly add more hardware, more nodes to our cluster, but the design of the flow is actually far more important to performance than the hardware we run on. We saw in part one of this series that taking a flow that uses a really large number of flow files and turning it into a flow that uses record-oriented processors can often yield at least an order of magnitude better performance, or better throughput, and that's not at all uncommon. We'll very often see at least an order of magnitude better performance when we design our flow to use larger flow files with many records than when we use a lot of really small flow files. This is really key, because for most people it's really hard to increase the amount of hardware they have by an order of magnitude: users who have a five-node NiFi cluster typically don't have the resources to scale out to 50 nodes. If we're just careful when designing our flows, we can avoid a lot of the pitfalls we run into with garbage collection, back pressure from the provenance repository, swapping, lock contention, and a lot of the problems we will inherently see with really large numbers of small flow files.
As I said in the beginning, NiFi offers quite a few different knobs that you can turn. Today we discussed some of the most important ones, but please keep in mind that as you start to adjust these settings, start small and adjust them gradually. If making a small change helped but wasn't enough, try making another small change, maybe slightly bigger. If going from one concurrent task to two improved the performance but not quite enough, try three or four, or look at the run schedule. We want to avoid going from one or two concurrent tasks to ten or a hundred; it's rarely going to give us what we're looking for. But time and time again I've seen users running on VMs that have four or eight cores set the size of the thread pool to 2,500. If your system can do four or eight things at a time, I promise it's not going to behave the way you want when you ask it to do 2,500 things at a time. So just start small, make slow, small adjustments, and you'll get to where you need to be. Thanks a lot for watching, guys. Take care.