Tuning OTel Collector Performance Through Profiling - Braydon Kains, Google

CNCF [Cloud Native Computing Foundation]
29 Jun 2024 | 14:52

Summary

TL;DR: In this talk, Braydon Kains, a contributor to OpenTelemetry, discusses performance tuning of the OpenTelemetry Collector through profiling. He introduces profiling as measuring program activity at specific locations in the code, much as metrics measure a value at specific timestamps. Using pprof, the profiling tooling built into Go, he demonstrates how to analyze and improve the Collector's performance, covering a suspected memory leak in a Prometheus receiver pipeline and inefficient process metric collection on Windows. The talk showcases the utility of profiling tools for developers and users alike, emphasizing their accessibility and the potential for continuous profiling to deepen understanding and optimization of applications.

Takeaways

  • πŸ”§ The speaker, Braden K, discusses the use of profiling for performance tuning of the OpenTelemetry Collector and emphasizes the accessibility of profiling tools.
  • πŸ“ˆ Profiling is likened to taking measurements at different locations in a program over time, similar to how metrics are measured at different timestamps.
  • πŸ“Š Profiling formats such as PPR (which OpenTelemetry profiles are based on) support multiple types of signals and measurements, including CPU and memory profiling.
  • πŸ› οΈ The OpenTelemetry Collector is written in Go, which has built-in support for using PPR, simplifying the process of profiling.
  • πŸ”¬ Case studies are presented to demonstrate the use of profiling in identifying performance issues, such as a potential memory leak in the Prometheus receiver.
  • πŸ” The use of flame graphs in PPR provides a visual representation of memory allocation and can help pinpoint areas of a program consuming the most resources.
  • πŸ”„ The speaker identifies a potential cardinality leakage in the cumulative to Delta processor, suggesting a need for better cache eviction configuration.
  • βš™οΈ Process metrics collection on Windows is identified as inefficient due to the high cardinality and system call costs, leading to a quest for optimization.
  • πŸ’‘ The Windows Management Interface (WMI) is explored as a more efficient method for retrieving parent process IDs, reducing the workload of a single scrape.
  • πŸš€ The speaker shares ongoing work to improve the OpenTelemetry Collector's process scraping efficiency on Windows, with a PR pending merge.
  • 🌐 The talk concludes with an encouragement for OpenTelemetry developers and users to utilize profiling tools to understand and improve their collector's performance.

Q & A

  • What is the main topic of the talk given by Braden KES?

    -The main topic of the talk is tuning the OpenTelemetry Collector's performance through profiling, specifically using pprof, the profiling tooling built into Go, to analyze performance problems.
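
As background for readers who have not used pprof before, here is a minimal sketch (not taken from the talk; the port 6060 is just a common convention) of how any Go program exposes the same HTTP profiling endpoints that the Collector's pprof extension sets up for you:

    package main

    import (
        "log"
        "net/http"
        _ "net/http/pprof" // registers /debug/pprof/* handlers on the default mux
    )

    func main() {
        // Expose CPU, heap, goroutine, and other profiles over HTTP so they
        // can be pulled on demand with `go tool pprof`.
        log.Fatal(http.ListenAndServe("localhost:6060", nil))
    }

A profile can then be fetched and explored with, for example, `go tool pprof http://localhost:6060/debug/pprof/heap`, which is the kind of workflow the talk applies to the Collector.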

  • What is the purpose of profiling in the context of the talk?

    -Profiling is used to measure and analyze the performance of a program at different locations, similar to how metrics are used to measure something over time. It helps to identify performance issues such as memory leaks or CPU usage inefficiencies.

  • What is the pprof format and why is it significant in the talk?

    -Pprof is a profiling format that supports multiple types of signals and measurements. It is significant because it is the format that OpenTelemetry profiles are based on, and it is built into Go, the language used for the OpenTelemetry Collector.
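
As a small illustration of the memory side of pprof, Go's standard library can write a heap profile in the pprof format directly; a minimal sketch (the file name heap.pprof is arbitrary) looks like this:

    package main

    import (
        "log"
        "os"
        "runtime"
        "runtime/pprof"
    )

    func main() {
        f, err := os.Create("heap.pprof")
        if err != nil {
            log.Fatal(err)
        }
        defer f.Close()

        runtime.GC() // flush recent allocations into up-to-date statistics
        if err := pprof.WriteHeapProfile(f); err != nil {
            log.Fatal(err)
        }
        // The resulting file can be inspected with: go tool pprof heap.pprof
    }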

  • What is the difference between a metric and a profile in the context of performance monitoring?

    -A metric is a time series data point, a measurement of something at a specific time. A profile, on the other hand, is a measurement taken at a specific location in a program, and when aggregated, it provides insights into what the program was doing over the measured locations.

  • What is the role of Braden KES in the OpenTelemetry project?

    -Braydon Kains is a code owner on the host metrics receiver and a member of the System Metrics Semantic Conventions working group. He mainly focuses on system metrics and works on the Google Cloud Ops Agent.

  • What was the issue with the Prometheus receiver that Braden KES investigated?

    -The issue was a suspected memory leak in the Prometheus receiver, where the memory usage was constantly growing over time.

  • What is a flame graph and how is it used in the context of the talk?

    -A flame graph is a visualization tool used in profiling to show where different allocations are happening in the program. It helps identify areas of the program that are using more resources, which can be indicative of performance issues.

  • What was the conclusion from the memory profiling of the Prometheus receiver?

    -The conclusion was that the Prometheus receiver itself was not allocating a significant amount of memory. The growth in memory usage was attributed to the cumulative-to-delta processor, suggesting a potential cardinality leakage issue.

  • What is the significance of the cumulative to Delta processor in the context of the memory leak investigation?

    -The cumulative-to-delta processor was identified as the main consumer of memory, suggesting that it was storing state for an ever-growing number of metric identities without evicting old entries, leading to unbounded memory growth.
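
To make this concrete, here is a simplified sketch of the pattern involved; it is an illustration only, not the processor's actual code, showing why per-identity state grows with cardinality and how time-based eviction caps it:

    package deltasketch

    import (
        "sync"
        "time"
    )

    // identity stands in for a metric name plus its label values,
    // e.g. "http_requests_total{method=GET,path=/api}".
    type identity string

    type entry struct {
        lastValue float64
        lastSeen  time.Time
    }

    // deltaState keeps one entry per unique metric identity; without eviction
    // the map only ever grows as new identities appear.
    type deltaState struct {
        mu      sync.Mutex
        entries map[identity]entry
    }

    func newDeltaState() *deltaState {
        return &deltaState{entries: make(map[identity]entry)}
    }

    // delta returns the change since the previous observation for this
    // identity and stores the new value for the next calculation.
    func (s *deltaState) delta(id identity, value float64, now time.Time) float64 {
        s.mu.Lock()
        defer s.mu.Unlock()
        prev, seen := s.entries[id]
        s.entries[id] = entry{lastValue: value, lastSeen: now}
        if !seen {
            return 0 // first point for this identity: nothing to diff against
        }
        return value - prev.lastValue
    }

    // evict drops identities not seen within maxStaleness; configuring this
    // kind of expiry is the mitigation suggested in the talk.
    func (s *deltaState) evict(now time.Time, maxStaleness time.Duration) {
        s.mu.Lock()
        defer s.mu.Unlock()
        for id, e := range s.entries {
            if now.Sub(e.lastSeen) > maxStaleness {
                delete(s.entries, id)
            }
        }
    }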

  • What was the second case study presented in the talk?

    -The second case study focused on improving the efficiency of process metric collection, specifically addressing the high CPU usage when getting the parent process ID for every process on Windows using the host metrics receiver.

  • What was the solution proposed to reduce CPU usage in the process metric collection on Windows?

    -The solution proposed was to use Windows Management Instrumentation (WMI) to retrieve the parent process ID for every process in a single query, instead of the Win32 CreateToolhelp32Snapshot call, which was found to be inefficient.
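
A hedged sketch of that approach, using one common Go WMI binding (github.com/StackExchange/wmi here; the actual collector change may use a different library and query), might look like this:

    //go:build windows

    package main

    import (
        "fmt"
        "log"

        "github.com/StackExchange/wmi" // a common Go WMI binding; illustrative choice
    )

    // win32Process maps the WMI Win32_Process properties we care about;
    // the library matches struct field names to WMI property names.
    type win32Process struct {
        ProcessId       uint32
        ParentProcessId uint32
        Name            string
    }

    func main() {
        // One WQL query returns PID, parent PID, and name for every process,
        // rather than doing expensive per-process snapshot work.
        var procs []win32Process
        query := "SELECT ProcessId, ParentProcessId, Name FROM Win32_Process"
        if err := wmi.Query(query, &procs); err != nil {
            log.Fatal(err)
        }

        parentOf := make(map[uint32]uint32, len(procs))
        for _, p := range procs {
            parentOf[p.ProcessId] = p.ParentProcessId
        }
        fmt.Printf("fetched parent PIDs for %d processes in one query\n", len(parentOf))
    }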

  • What are the potential benefits of using the OpenTelemetry profile format across different languages?

    -The potential benefits include a standardized way to profile applications, allowing for consistent tooling and analysis methods across different programming languages, making it easier to identify and solve performance issues.

Outlines

00:00

πŸ” Introduction to Profiling for Performance Tuning

The speaker, Braydon Kains, introduces the topic of using profiling for performance tuning of the OpenTelemetry Collector. He emphasizes the importance of understanding the profiling signal and encourages the audience to explore profiling tools that are readily available. Braydon offers a basic analogy for profiling, comparing it to taking measurements at different locations in a program, rather than at different points in time, to build a picture of the program's behavior. He also surveys profiling formats, focusing on pprof and its extension, OpenTelemetry profiles, which are backwards compatible with the existing pprof format. The pprof tooling is built into Go, the language the OpenTelemetry Collector is written in.

05:00

πŸ”Ž Case Study: Investigating Memory Leaks in Prometheus Receiver

The speaker presents a case study involving a suspected memory leak in the Prometheus receiver of the OpenTelemetry Collector. He describes taking pprof heap profiles every hour to understand memory allocation over time. Initially, the Prometheus receiver's own allocations were not significantly large, but over a period of five hours total heap usage grew substantially. The speaker identifies the cumulative-to-delta processor as the dominant consumer of memory, and the one that continued to grow across profiles. This led to the hypothesis of cardinality leakage in one of the metric pipelines. The speaker suggests configuring the processor's cache eviction to cap memory growth and discusses the importance of understanding and addressing cardinality leakage.
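
The "budget continuous profiling" described here can be approximated with a small helper; below is a sketch that assumes the pprof extension is listening on its commonly used localhost:1777 endpoint (check your collector configuration for the actual address):

    package main

    import (
        "fmt"
        "io"
        "log"
        "net/http"
        "os"
        "time"
    )

    // saveHeapProfile downloads one heap profile from a running collector that
    // has the pprof extension enabled and writes it to path.
    func saveHeapProfile(url, path string) error {
        resp, err := http.Get(url)
        if err != nil {
            return err
        }
        defer resp.Body.Close()

        f, err := os.Create(path)
        if err != nil {
            return err
        }
        defer f.Close()

        _, err = io.Copy(f, resp.Body)
        return err
    }

    func main() {
        const endpoint = "http://localhost:1777/debug/pprof/heap"
        ticker := time.NewTicker(time.Hour)
        defer ticker.Stop()

        for t := range ticker.C {
            path := fmt.Sprintf("heap-%s.pprof", t.Format("2006-01-02T15-04"))
            if err := saveHeapProfile(endpoint, path); err != nil {
                log.Printf("profile capture failed: %v", err)
                continue
            }
            log.Printf("saved %s", path)
        }
    }

Comparing an early file against one taken hours later in `go tool pprof` is exactly the kind of growth comparison described above.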

10:03

πŸš€ Improving Efficiency in Process Metric Collection

Braydon Kains discusses his efforts to improve the efficiency of process metric collection, which is known for its high cardinality and the cost of system calls. He presents a CPU profile of the process scrape on Windows, revealing that 88% of the sampled time was spent retrieving the parent process ID for each process. The initial method, CreateToolhelp32Snapshot, was found to be inefficient because it captures extensive data, including heap and thread information, for every process. Braydon explores an alternative approach using Windows Management Instrumentation (WMI) to perform a single query for all process information, reducing the time spent on a single scrape from 740 milliseconds to 220 milliseconds. He also mentions the possibility of reducing scrape time further, to around 90 milliseconds, by disabling the parent PID resource attribute when it is not required.
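
For context on the Win32 API being discussed, the Toolhelp snapshot can be called from Go via golang.org/x/sys/windows. The sketch below is illustrative only (it is not the gopsutil or collector code): it takes a single snapshot of the process list and walks it once, whereas the slow path described in the talk repeated snapshot work for every individual process.

    //go:build windows

    package main

    import (
        "fmt"
        "log"
        "unsafe"

        "golang.org/x/sys/windows"
    )

    func main() {
        // One snapshot of the process list, walked once, building a
        // PID -> parent PID map from the entries it already contains.
        snap, err := windows.CreateToolhelp32Snapshot(windows.TH32CS_SNAPPROCESS, 0)
        if err != nil {
            log.Fatal(err)
        }
        defer windows.CloseHandle(snap)

        var entry windows.ProcessEntry32
        entry.Size = uint32(unsafe.Sizeof(entry))

        parentOf := make(map[uint32]uint32)
        for err := windows.Process32First(snap, &entry); err == nil; err = windows.Process32Next(snap, &entry) {
            parentOf[entry.ProcessID] = entry.ParentProcessID
        }
        fmt.Printf("walked %d processes from one snapshot\n", len(parentOf))
    }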

🌟 Conclusions and the Future of Profiling with OpenTelemetry

In conclusion, the speaker emphasizes that profiling is not magic but a skill accessible to everyone. He encourages developers and users to leverage profiling tools to understand and improve their applications. Braydon highlights the usefulness of pprof and OpenTelemetry profiles both for targeted issue investigation and for continuous profiling. He expresses excitement for the adoption of OpenTelemetry profiles across different tooling and languages, envisioning a future where one profiling standard can be used uniformly across programming environments.


Keywords

πŸ’‘otel Collector

The 'otel Collector' refers to the OpenTelemetry Collector, which is a tool designed to gather, process, and export telemetry data such as metrics, logs, and traces. In the video, the speaker discusses performance tuning for this collector, indicating its central role in the presentation's theme of optimizing data collection for observability.

πŸ’‘profiling

Profiling in the context of this video is the process of measuring and analyzing the performance of a program, specifically focusing on resource usage such as CPU and memory. The speaker aims to build excitement for profiling as a means to identify and solve performance issues within the OpenTelemetry Collector.

πŸ’‘pprof

pprof is the profiling format and tooling that comes built into Go, used for capturing and visualizing the performance of Go programs. The speaker uses pprof to analyze performance problems in the OpenTelemetry Collector, making it the talk's main method of performance analysis.

πŸ’‘otel profiles

OTel profiles refer to the profiling data format being defined by OpenTelemetry. The speaker notes that the current proposal for the data model is a straight, backwards-compatible extension of pprof, making OTel profiles integral to the profiling discussion in the video.

πŸ’‘flame graph

A flame graph is a visualization tool used to represent profiled data, showing where time or memory is spent in a program. The speaker plans to use flame graph visualization to analyze memory allocation in the OpenTelemetry Collector, demonstrating its utility in identifying performance bottlenecks.

πŸ’‘memory leak

A memory leak occurs when a program consistently consumes more memory over time without releasing it back to the system. In the script, the speaker investigates a potential memory leak in the Prometheus receiver of the OpenTelemetry Collector, using profiling to diagnose the issue.

πŸ’‘cardinality leakage

Cardinality leakage refers to a situation where the number of unique elements, such as metric identities, grows without bound, leading to increased memory usage. The speaker suspects cardinality leakage feeding the cumulative-to-delta processor of the OpenTelemetry Collector, as evidenced by growing memory allocation across profiles.

πŸ’‘process metric collection

Process metric collection is the act of gathering metrics related to system processes, which can be resource-intensive due to the high cardinality of process data. The speaker discusses efforts to make this process more efficient in the OpenTelemetry Collector, particularly on Windows platforms.

πŸ’‘CPU profile

A CPU profile is a snapshot of where a program's execution time is spent, which can help identify performance bottlenecks related to computation. The speaker presents a CPU profile of the process scrape operation in the OpenTelemetry Collector, revealing that a significant amount of time is spent retrieving parent process IDs.
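
For reference, a CPU profile of a specific code path can be captured from Go directly; a minimal sketch follows, where the doWork function and the cpu.pprof file name are placeholders standing in for whatever is being measured:

    package main

    import (
        "log"
        "os"
        "runtime/pprof"
        "time"
    )

    // doWork stands in for the code path being measured, e.g. one process scrape.
    func doWork() {
        time.Sleep(100 * time.Millisecond)
    }

    func main() {
        f, err := os.Create("cpu.pprof")
        if err != nil {
            log.Fatal(err)
        }
        defer f.Close()

        // Sample on-CPU time between Start and Stop.
        if err := pprof.StartCPUProfile(f); err != nil {
            log.Fatal(err)
        }
        defer pprof.StopCPUProfile()

        doWork()
        // Inspect afterwards with: go tool pprof cpu.pprof
    }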

πŸ’‘Windows Management Instrumentation (WMI)

WMI is a set of extensions to the Windows OS that allows for management and monitoring of system components. The speaker explores using WMI to improve the efficiency of process metric collection in the OpenTelemetry Collector on Windows, by querying process information in a more optimized manner.

πŸ’‘continuous profiling

Continuous profiling refers to the ongoing monitoring and analysis of a program's performance over time. The speaker mentions the availability of tools for continuous profiling, suggesting that this approach can provide deeper insights into performance issues within the OpenTelemetry Collector.

Highlights

Introduction to tuning the OpenTelemetry Collector performance through profiling.

Emphasis on the accessibility of profiling tools for performance analysis.

Speaker's introduction: Braydon Kains, a contributor to the OpenTelemetry Collector and system metrics.

Explanation of profiling as a measurement at a location in a program over time.

Overview of profiling formats, including pprof and OTel profiles.

pprof's support for multiple signal types, including CPU and memory profiling.

Built-in profiling tools in Go for OpenTelemetry Collector.

Demonstration of using the pprof extension with the OpenTelemetry Collector.

Case study on Prometheus receiver performance and potential memory leak.

Use of pprof to diagnose and analyze memory allocation issues.

Investigation of the cumulative-to-delta processor's memory usage over time.

Hypothesis of cardinality leakage in metric pipelines.

Efficiency improvements in process metric collection.

CPU profiling of process scraping on Windows and identification of inefficiencies.

Implementation of Windows Management Instrumentation for more efficient process metadata retrieval.

Further performance improvement by disabling the parent PID attribute in process scraping.

Conclusion emphasizing the power of profiling tools for developers and users.

Encouragement for using profiling to understand and report issues in collectors.

Anticipation for the proliferation of OpenTelemetry profiling in various tooling.

Transcripts

[00:00] All right, so I'm going to be giving a talk about tuning the OTel Collector performance through profiling. This is specifically about some of the work I've been doing using pprof, built into Go, to make some minor improvements or analyze some performance problems with the OpenTelemetry Collector. This is sort of a two-pronged talk: one prong is to build excitement for the profiling signal in general if you're not familiar with it, and the other is to show that these tools already exist to start getting into profiling if you're interested. It's all very accessible.

[00:32] So I'm Braydon Kains. You might know me through GitHub as braydonk, or, if you're a big meanie head, "Brayon". I work on the OpenTelemetry Collector and semantic conventions, mainly focused on system metrics, since we heavily use them in the Google Cloud Ops Agent, which is my team's tool. I'm a code owner on the host metrics receiver and a member of the System Metrics Semantic Conventions working group.

[01:00] I'm going to start by grossly oversimplifying what a profile is, to explain it if you've never heard of it. This is the analogy that cracked it for me. If you think of a metric, usually it is a time series data point: a measurement of something at a timestamp. If you map that data over a span of time, you can start to paint a nice picture of what was going on with that measurement over that span. The way I like to think of profiling is kind of like that, except instead of a measurement taken at a point in time, it's a measurement taken at a location in a program. When you paint that over all the locations available to the program, you can start to paint a picture of what your program was doing with the measurement you're taking, whether that be CPU or memory based.

[01:48] These are some of the profiling formats you might be familiar with. My first time using profiling was actually with Callgrind, a tool under Valgrind; I had no idea what any of it meant. pprof is the format I'm going to be talking about today, and OTel profiles are an extension of pprof: the current proposal for the data model is actually just a straight extension of pprof, backwards compatible with the old format. The reason I'm going to be talking about pprof today is, for one thing, that it has support for multiple types of signals and measurements. Callgrind, for example, is focused specifically on CPU profiling, and Linux kernel perf is another popular one that's very focused on the CPU side of things, but pprof does both, and we're going to be looking at both today. And, as I already mentioned, it's the format that OTel profiles are based off of. The tools to utilize it are built right into Go, which the OpenTelemetry Collector is written in, and which you might have applications written in as well; I'm going to be demonstrating a little bit of that today.

[03:00] To use pprof with the OTel Collector you can use the pprof extension. This automatically configures the things you would otherwise need to configure manually, which is not all that hard really, but it's even more convenient that it's built in. It will spin up a pprof server that you can query yourself with the pprof tool to get a specific type of profile at a specific time. You do that with this command; if you have Go installed, this is already there, you don't have to do any extra go install or anything like that, it's built into the installation. And then that's basically it; you have to install Graphviz to get the graphs and such, but for the most part this is all you need to do to get started.

[03:43] So I'm going to be looking at a couple of case studies, and because I know I'm playing us into the break and everybody's excited for coffee, I'm going to try to be a little bit brief, but hopefully get all the information we need out of it. This issue came to my attention when I was talking internally about Prometheus receiver performance work I was trying to do. The issue was opened by Henrik, who you might know from his YouTube channel "Is It Observable". He was doing some performance testing of different Prometheus scrapers, and he was having issues with the Prometheus receiver: it looked like it was leaking memory, because memory usage was constantly growing over time. So I opened up pprof to try to get some information.

[04:30] When you open up the pprof web UI, this is sort of what you get. The default view here is a graph, which is pretty good for memory, maybe not so much for CPU profiles, but it's a pretty good view for memory. You can sort of see (oh geez, this is going to be tough because I'm on a lower resolution) where different allocations are happening and where different spots are holding memory. For today we're actually going to be looking specifically at the flame graph visualization.

[05:05] What I had Henrik do was take profiles every hour over a period of time, since I couldn't really replicate his setup all that well, so it's sort of like budget continuous profiling. In this first profile we're working with half a gig of heap space. When you look at a pprof profile you're specifically looking at the heap; there is more memory that ends up getting used, so if you read the number from your system versus the profile you're going to get a different number, because the heap is only one part of the memory map. But it is the important part when we're talking about memory leaks, because the region of memory that's growing is usually going to be the heap.

[05:46] If we look at this for an issue called "Prometheus receiver memory leak", the spot where the Prometheus receiver is actually allocating memory is really not that big, and I was a bit surprised by that, but this is pretty early in the measurement. The biggest thing we see right now is actually the cumulative-to-delta processor. This works by storing an original point of the metric to do a delta calculation against, because that's how delta metrics work. So for the first profile, early in the run, we kind of expect this to be relatively large: if you're converting a lot of metrics, there needs to be an original point to calculate against for each metric identity. Maybe that's okay for the first profile, but what I would expect to see in the later profile (I have another one from about five hours later) is that this region of memory from the Prometheus receiver would grow.

[06:42] If we look at the profile from five hours later, it has grown quite a bit: we're up to two and a half gigs of heap space. But the shape is basically the same; the cumulative-to-delta processor is still taking up the most, and the numbers themselves have continued to grow. When I thought I had more time I was going to take us all on an adventure through how this all works in the cumulative-to-delta processor, but we don't have time. Basically, what this led me to believe is that there is some manner of cardinality leakage. This sync.Map Load/Store, this map, is a store of different metric identities (the name, the labels, and the label values), so that every unique time series has its own original data point stored. This region of memory continued to grow, and through every profile it was consistently the one that was growing, which led me to believe that one of the metric pipelines has some manner of cardinality leakage. The Prometheus endpoints he was scraping were popular community exporters, and I don't really know which one is the culprit; we haven't figured that out yet. But there is a feature in the cumulative-to-delta processor that will evict old entries if it hasn't seen them in a certain amount of time, so I'm having him configure that, hoping to see that region of memory not grow too much if we do have cardinality leakage in one of the pipelines. That's a good lesson about using cumulative-to-delta: make sure you're not leaking cardinality too much, because this can happen, or at least make sure you configure the cache eviction.

[08:15] The second case study we're going to look at: I'm a host metrics code owner, and one of the crusades I've been on is to make process metric collection more efficient, because process metrics are very high cardinality, and to get a lot of the information you need you have to make system calls, which is very expensive. We had these two issues, and we're only going to look at one of them today. If you want to talk about the first one, where we were looking at the host metrics receiver on Linux, the fix for that has actually landed, so come find me after. I'm going to look at the slightly more interesting one, which is the second issue, related to process scraping on Windows. Of course, the challenge of the host metrics receiver is that a lot of the process metrics are the same across platforms, but the implementation of how you get them is completely different.

[09:09] So I'm going to look at two profiles. This first one, and now we're into CPU profiles (the last one was a memory profile), is a CPU profile of one scrape. When you take a CPU profile you time it over a certain duration, and pprof samples events at a certain rate. The 740 milliseconds we see here is how much on-CPU work was sampled, and it's representative of one process scrape, which scrapes all the processes on the system and records all the metrics for them. That usually happens on an interval, in this case about a minute, and I sampled for 40 seconds, so that's what we're looking at. Beware of the jump scare.

[09:54] The width of each section is what proportion of the work was taken there, and because I'm a little squished by the zoom you can't see it, but 88% of the time was spent getting the parent process ID for every process in the scrape. When I saw this I nearly jumped out of my chair, because it really shouldn't be that ridiculous. It turns out that with the Win32 API, this call, CreateToolhelp32Snapshot, is, as far as I can see and as far as the gopsutil maintainers can see, the best way to get the parent process ID for a process. But this snapshot captures tons and tons of information, including heap and thread space for the entire process, and it's doing this for every process, so it is extremely expensive, and I was on a hunt for a better way to do it.

[10:50] There are a lot of Microsoft people here, so they're going to know this part. The best other way I could see to get the parent process ID was through Windows Management Instrumentation. It has a SQL-like query language you can use to query information about the system, and specifically about processes, and I'm already using it for an older metric I implemented, to get the handles belonging to a process; the only really good way I could see to get that through the Win32 API was an unsupported call (NtQuerySystemInformation with a process information query, I forget exactly what it's called), and we weren't really super excited about using an unsupported Win32 API. So I started using Windows Management Instrumentation to get the information for every process in one query, and then organize that information as we do the get-process-metadata work during the scrape.

[11:48] That led to this second profile of the new version I came up with, and we are down to only 220 milliseconds of work; we cut quite a bit of work out of a single scrape by doing this in a WMI query. The query is still the most expensive part of the process scrape, but this actually has a sort of back-door improvement too: if you already use the process.handles metric, if you've enabled that, then the information all comes in one query, and it's not more expensive to get that second metric. So this is a pretty good improvement. The PR for it is open; I haven't got it merged yet, but I'm hoping we can. The other thing I did was make it so that if you disable the parent PID, if you just don't really care about that resource attribute, you can delete it, and then you get down to 90 milliseconds. So if you don't want the parent PID when scraping processes on Windows, you can get much more efficient just by disabling it once the PR is merged; hopefully that will happen soon.

[13:00] I'm pretty close to being out of time, I'm sure, so I'm just going to go quickly through some of the conclusions. The main thing I want you to take away is that this isn't magic; anybody has the power to understand this, if I have the power to understand this, and the tools are readily available to everyone. So if you're an OTel Collector developer, or really if you're working in any language, there are probably similar solutions for this; we don't have to wait for OTel profiling to be ready to start solving some of the problems that profiling is good at. And if you run a collector, even if you're not a developer of the Collector, but you want to understand a little bit more about what your collector is doing, or you want to report a GitHub issue, then using the pprof extension and knowing how to show screenshots or send profiles along to maintainers is very useful.

[13:47] In this case I was looking at individual problems: I was targeting specific issues at specific times, manually taking profiles. But there are a lot of solutions out there for continuous profiling. I tried to focus this talk only on generic solutions, nothing vendor specific, but lots of vendors have great tools for even better flame graph views and for continuous profiling over time. You can use it a lot like tracing, but to get more granular information about your program rather than about your whole system.

[14:24] And I'm excited for OTel profiling; that's basically the end of it. I'm really excited for this to proliferate through other tooling, so that we can use this profiling standard and be able to tool it the same way across different languages. I think that's very exciting. That's everything I've got, and I don't know if there's time for questions if we're on break, but you can come find me afterwards. Thanks.

[Applause]
