Managing Observability Data at the Edge with the OpenTelemetry Collector and OTTL - Evan Bradley

CNCF [Cloud Native Computing Foundation]
29 Jun 2024 · 12:44

Summary

TL;DR: In this session, Evan, a maintainer of the OpenTelemetry Collector and OTTL, introduces the Collector's capabilities and OTTL's role in observability pipelines. He presents a case study of a hypothetical company, Global Telescopes, to demonstrate how to use OTTL for data processing, including filtering, parsing, and redacting sensitive data. The session also covers routing and sampling strategies, showcasing the flexibility of OTTL for telemetry data management. Evan wraps up with a look at recent and upcoming features and concludes with a Q&A, inviting participants to explore and test OTTL's capabilities.

Takeaways

  • πŸ“ˆ Evan has been contributing to the OpenTelemetry Collector and OTTL for about two years.
  • πŸ”„ The OTel Collector is a middleware in observability pipelines, processing and routing data through its internal pipeline model.
  • πŸ” OTTL is a language for reading from and writing to data as it flows through the OTel Collector, offering flexibility and a common configuration format.
  • 🏭 A hypothetical case study is presented involving a company called Global Telescopes, which has applications hosted worldwide and deals with data privacy and telemetry processing.
  • πŸ›  The Collector processes data through receivers, processors, and exporters, allowing for filtering, enriching, and routing of data.
  • πŸ” Data redaction and handling of PII are demonstrated, including the use of SHA-256 hashing to protect sensitive information.
  • πŸ“Š Routing and sampling of data in a centralized collector are covered, with a focus on keeping errors and payment-service data.
  • πŸ“„ Examples of OTTL statements and configurations are provided, illustrating the setup and processing within the OTel Collector.
  • πŸ†• New features include optional parameters, functions as parameters, and additional functions that enhance the capabilities of OTTL.
  • πŸ”§ Future improvements aim to handle list values and stabilize the transform processor, making the system more user-friendly.

Q & A

  • What is the main focus of Evan's presentation?

    - Evan's presentation focuses on the OTel Collector and OTTL, including an introduction to these tools, a hypothetical case study, and how to apply these components to various setups.

  • What are the OTel Collector and OTTL?

    - The OTel Collector is a middleware in observability pipelines that processes and routes data. OTTL is an easy-to-read language that allows reading from and writing to data as it flows through the Collector.

  • How does the internal pipeline model of the OTel Collector work?

    - The internal pipeline model of the OTel Collector consists of different components, such as receivers, processors, and exporters. Data enters through receivers, is processed by processors, and is sent out by exporters.

  • What is the purpose of connectors in the OTel Collector?

    - Connectors in the OTel Collector are used to connect pipelines and perform tasks such as routing data. They add flexibility to the pipeline model.

  • What is the main advantage of using OTTL in the OTel Collector?

    - The main advantage of using OTTL is its flexibility and common configuration format, which allows users to work with data in the Collector without worrying about input or output formats.

  • What is the hypothetical case study presented by Evan?

    - The case study involves a company called Global Telescopes, which needs to handle data from applications hosted worldwide, comply with local data privacy laws, and process telemetry data through sidecar collectors and a centralized collector.

  • How does the filter processor help in the case study?

    - The filter processor is used to filter out unnecessary data, such as noisy logs at debug level, to reduce the amount of data that needs to be processed further.

  • What is the purpose of redacting data in the case study?

    - Redacting data, such as purchase IDs considered PII, ensures that sensitive information is not exposed when data leaves the region. It also helps in handling data deletion requests by hashing the PII.

  • How does the routing connector function in the case study?

    - The routing connector directs data to the appropriate backend based on annotations. Data from newly acquired teams is routed to their old backend, while other data is routed to the company-wide backend.

  • What new features have been added to OTTL recently?

    - Recent additions to OTTL include optional parameters, functions as parameters, and 15 new functions. Future improvements are planned for handling list values and stabilizing OTTL in the transform processor.

Outlines

00:00

πŸ‘‹ Introduction and Overview

Evan introduces himself and outlines the session's agenda. He discusses the OTel Collector and OTTL, their components, and their functionality, and sets the stage for a hypothetical scenario to demonstrate the practical application of these tools.

05:00

πŸ”„ OTL and Data Processing

Evan explains OTTL's role in data processing within the OTel Collector. He provides a brief overview of OTTL's syntax and its use for setting attributes. He then moves into the hypothetical case study of Global Telescopes, detailing how data is processed, filtered, and routed through different collectors.

10:01

πŸ” Filtering and Parsing Data

Evan elaborates on the process of filtering and parsing logs using the OTel Collector. He describes the use of the filter processor to remove unnecessary data, like noisy logs, and the parsing of JSON logs into a structured format. He also introduces the concept of redacting sensitive information, such as purchase IDs, using SHA-256 hashing.

πŸ”„ Routing and Sampling

Evan discusses the routing and sampling of data in the OTel Collector. He explains how data is routed to different pipelines based on annotations and how tail sampling is used to reduce data volume. Specific conditions for sampling, such as errors and payment services, are highlighted.

πŸ†• New Features and Testing

Evan covers recent updates and new features added to OTTL, including optional parameters and functions as parameters. He also talks about upcoming improvements and provides links to documentation. Additionally, he addresses questions from the audience about persistence, retry logic, and the importance of hashing PII close to the source.

Keywords

πŸ’‘Observability

Observability refers to the degree to which the internal state of a system can be inferred from its external outputs. In the context of the video, observability is the overarching theme, with the OTel Collector serving as middleware that processes and routes data in observability pipelines. The script discusses how the Collector's flexibility allows data to be handled in ways that preserve visibility into the system's operations.

πŸ’‘OTel Collector

The OTel Collector is a middleware component in observability pipelines that processes and routes data. It is highlighted in the script as a tool that Evan, the speaker, helps maintain. The Collector's functionality is central to the video's narrative, illustrating how data flows through the system and is manipulated for purposes such as filtering and enriching telemetry data.

πŸ’‘OTTL (OpenTelemetry Transformation Language)

OTTL, the OpenTelemetry Transformation Language, is an easy-to-read language used for reading from and writing to data as it flows through the OTel Collector. The script emphasizes OTTL as a standard way to work with data in the Collector due to its flexibility and common configuration format. An example OTTL statement in the script sets an attribute where a name matches a regular expression.
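The slide with that statement isn't reproduced in this summary, so here is a hypothetical sketch of what such a statement can look like inside the transform processor; the attribute key, the span-name regex, and the value are made up for illustration:

```yaml
processors:
  transform:
    trace_statements:
      - context: span
        statements:
          # Set a (hypothetical) attribute on any span whose name matches a regex.
          - set(attributes["endpoint.group"], "checkout") where IsMatch(name, "^/api/checkout/.*")
```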

πŸ’‘Pipeline

In the script, a pipeline refers to the sequence of operations that data undergoes as it is processed by the OTel Collector. The internal pipeline model is composed of different components that can be connected to process data in a specific order. The concept of pipelines is integral to understanding how data is handled and transformed within the Collector.
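As a rough, minimal sketch of that model (not the talk's actual configuration), a single logs pipeline wiring an OTLP receiver, the batch processor, and an OTLP/HTTP exporter could look like this; the backend endpoint is a placeholder:

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

processors:
  batch: {}

exporters:
  otlphttp:
    endpoint: https://backend.example.com  # placeholder backend

service:
  pipelines:
    logs:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlphttp]
```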

πŸ’‘Receivers

Receivers are components of the OTel Collector that take in data and translate external formats into the Collector's internal pipeline data format. As mentioned in the script, data comes into the Collector through receivers, which are the starting point of the data's journey through the pipeline.

πŸ’‘Processors

Processors are components in the OTel Collector's pipeline that perform operations such as filtering, enriching, and modifying data. The script discusses the use of processors to handle tasks like filtering out noisy logs and parsing JSON data into a structured format, which is essential for further processing and analysis.

πŸ’‘Exporters

Exporters are the components responsible for translating the internal data format back into an external format and sending it to a designated destination. In the script, it is mentioned that data leaves the Collector at exporters, the final step in the pipeline before data is sent elsewhere, such as to an OTLP endpoint.

πŸ’‘Connectors

Connectors in the script are components that link different pipelines together within the OTel Collector. They enable complex routing and data manipulation tasks. The speaker focuses on using connectors for routing data, which is a key aspect of the hypothetical situation presented in the video.
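As a sketch of that routing idea (not the talk's actual config), the routing connector can act as the exporter of one pipeline and the receiver of others; the `business.unit` resource attribute and the exporter names are assumptions:

```yaml
connectors:
  routing:
    default_pipelines: [traces/companywide]
    table:
      # "business.unit" is a hypothetical resource attribute marking the branch of the company.
      - statement: route() where IsMatch(attributes["business.unit"], "^retail")
        pipelines: [traces/retail]

service:
  pipelines:
    traces/in:
      receivers: [otlp]
      exporters: [routing]          # connector used as an exporter here...
    traces/retail:
      receivers: [routing]          # ...and as a receiver in the target pipelines
      exporters: [otlphttp/retail]
    traces/companywide:
      receivers: [routing]
      processors: [tail_sampling]
      exporters: [otlphttp/companywide]
```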

πŸ’‘Case Study

The term 'case study' in the script refers to a hypothetical situation devised by the speaker to demonstrate the application of OTL and various components in solving a real-world problem. The case study of 'Global Telescopes' is used to illustrate how data is processed and managed across different regions while adhering to local data privacy laws.

πŸ’‘Filter Processor

A filter processor is a processor used in the OTel Collector's pipeline to selectively keep or drop data based on certain criteria. In the script, the filter processor is used to eliminate noisy logs at the debug level or lower, ensuring that only relevant information is passed through the pipeline.
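A minimal sketch of that filtering, using the filter processor's OTTL conditions; the span-event name is a made-up example of the "extra span events" mentioned elsewhere in the talk:

```yaml
processors:
  filter:
    error_mode: ignore
    logs:
      log_record:
        # Drop anything at DEBUG severity or lower; INFO and above pass through.
        - severity_number < SEVERITY_NUMBER_INFO
    traces:
      spanevent:
        # Hypothetical noisy span-event name to drop.
        - name == "verbose-debug-event"
```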

πŸ’‘Tail Sampling

Tail sampling is a technique mentioned in the script for reducing the volume of data sent to the backend by sampling only a portion of the data. It is used in the case study to cut down on costs by sampling traces based on certain conditions, such as error occurrences or specific service interactions.
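The exact policy from the talk isn't shown here, but a tail_sampling policy using OTTL conditions along the lines described (keep errors, keep the payment service, keep roughly 1 in 16 of the rest by trace-ID prefix) might look like the following sketch; the service name is an assumption:

```yaml
processors:
  tail_sampling:
    decision_wait: 10s
    policies:
      - name: keep-errors-payments-and-sample
        type: ottl_condition
        ottl_condition:
          error_mode: ignore
          span:
            # Keep every trace that contains an error.
            - status.code == STATUS_CODE_ERROR
            # Keep everything from the (hypothetical) payment service.
            - resource.attributes["service.name"] == "payment-service"
            # Trace IDs are random, so keeping hex IDs that start with "a"
            # samples roughly 1 in 16 of the remaining traces.
            - IsMatch(trace_id.string, "^a")
```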

Highlights

Introduction to Evan, the speaker, who has been contributing to the OpenTelemetry Collector and OTTL for about two years.

Overview of the agenda for the session, including an introduction to the OpenTelemetry Collector, OTTL, and a case study.

The OpenTelemetry Collector is a middleware for observability pipelines, capable of processing and routing data with an internal pipeline model.

Explanation of the components in the Collector's pipeline, including receivers, processors, and exporters.

The innovative use of connectors to link pipelines together for advanced routing and processing.

OTTL (OpenTelemetry Transformation Language) is introduced as an easy-to-read language for data manipulation within the Collector.

OTTL's flexibility and standardization across different components in the Collector.

Case study of Global Telescopes, a conglomerate dealing with data privacy laws and telemetry processing.

Use of sidecar collectors to filter and redact data before it leaves the region, complying with local data privacy laws.

Introduction of a hypothetical situation where Global Telescopes acquired a new company, requiring data routing changes.

Implementation of tail sampling to reduce costs in the centralized collector's data processing.

The use of filter processors to eliminate noisy logs and unnecessary span events.

OTTL's cache feature for temporary data storage during pipeline processing.

Parsing JSON logs into a structured format for further processing and redaction.
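A sketch of that parse-into-cache pattern with the transform processor; the `message` and `attrs` field names are assumptions, since the actual log shape isn't shown in the summary:

```yaml
processors:
  transform:
    error_mode: ignore
    log_statements:
      - context: log
        statements:
          # Parse the JSON body into the temporary cache map.
          - merge_maps(cache, ParseJSON(body), "upsert") where IsMatch(body, "^\\{")
          # Promote the (hypothetical) parsed fields onto the structured log record.
          - merge_maps(attributes, cache["attrs"], "upsert") where cache["attrs"] != nil
          - set(body, cache["message"]) where cache["message"] != nil
```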

Redaction of PII (Personally Identifiable Information) such as purchase IDs using hashing techniques before data transmission.
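A sketch of that redaction step, assuming a hypothetical `purchase.id` attribute; `replace_pattern` accepts the SHA256 converter as an optional function argument, and `$$` escapes the capture-group reference in collector configs:

```yaml
processors:
  transform:
    log_statements:
      - context: log
        statements:
          # Replace the raw purchase ID with its SHA-256 hash. Because the
          # converter is a regular OTTL function argument, swapping in another
          # hash (or omitting it) only changes this one line.
          - replace_pattern(attributes["purchase.id"], "^(.*)$", "$$1", SHA256)
```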

Handling data deletion requests by hashing identifiers to locate and remove data from the backend.

Routing data using the routing connector with OTTL support for efficient data management.

Tail sampling policy implementation to selectively process and reduce data volume in the companywide backend.

New features in OTTL, including optional parameters and an expanding set of functions to enhance flexibility.
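Two small sketches of those features (the attribute names are assumptions): `ParseKeyValue` exposes optional delimiter parameters that can be overridden, and converters such as `SHA256` can be passed as arguments to functions that accept them, as in the redaction example above:

```yaml
processors:
  transform:
    log_statements:
      - context: log
        statements:
          # Optional parameters: override ParseKeyValue's default delimiters.
          - set(cache["kv"], ParseKeyValue(attributes["raw.kv"], ":", ","))
          # Functions as parameters: pass the SHA256 converter into replace_pattern.
          - replace_pattern(attributes["user.email"], "^(.*)$", "$$1", SHA256)
```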

Future plans for OTTL to handle list values and to stabilize OTTL in the transform processor for improved functionality.

Debugging capabilities added to OTTL for testing and development, including debug logging and traces.
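Enabling debug-level collector self-telemetry is how that output becomes visible; a minimal sketch:

```yaml
service:
  telemetry:
    logs:
      level: debug  # surfaces the transform processor's before/after debug logs
```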

Transcripts

[00:00] All right, hello Seattle, how are we doing? All right, good — just making sure you're awake. Okay, so a little bit about me: I'm Evan, and I help maintain both the OTel Collector and OTTL, both of which I've been contributing to for roughly two years now. Before we get started I want to quickly go over what we're going to cover today. I'm going to give a quick intro to the OTel Collector — I know we've talked about it a little bit already, but we're going to go a little deeper, so I just want to make sure everyone's on the same page — and then I'm going to cover what OTTL is. After that I'm going to introduce a hypothetical situation that I've devised, which we're going to solve using OTTL and a handful of popular components, and I'm hoping that by the end of this you'll have an idea of how you could apply these components to your own setups.

[00:47] So first, for anyone who isn't familiar, the Collector is a middleware in your observability pipelines and can process and route data as it flows through. The Collector's flexibility comes from its internal pipeline model, which is composed of different components that you can string together. Data comes into the Collector through receivers — you'll see those on the left there — which translate an external format into the Collector's internal pipeline data format, and the data stays in this internal format until it leaves the Collector. After receivers, the data goes through processors, which can redact, filter, enrich, and so on, and then finally it leaves the Collector at exporters, which translate it back into an external format and send it somewhere else, for example to an OTLP endpoint. Something really cool, though, is that you can connect pipelines together with components called connectors. You can do all sorts of things with connectors, but the thing I'm going to focus on today is routing data.

[01:49] A quick intro to OTTL: OTTL is an easy-to-read language that allows reading from and writing to data in place as it flows through the Collector. It's steadily becoming the standard way to work with data in the Collector, since it's flexible and offers a common configuration format across a whole bunch of different components that use it. And since all Collector components work with this internal data format, you can use OTTL without having to worry about the input or output format of the data. At the bottom here you can see an example OTTL statement: it just sets an attribute where some name matches a regular expression — hopefully pretty straightforward and easy to read.

[02:31] Moving into the case study, let's consider a company, Global Telescopes, which is a telescope manufacturing and sales conglomerate that sells to organizations worldwide. To serve its customers where they are, it has applications hosted in regions around the world, but this comes with some complexity. To deal with local data privacy laws and to scale their telemetry processing, the applications send their data to sidecar collectors that filter out extra data and redact it before it leaves the region. After it leaves the region, the data is collected into a centralized collector, where they can take actions that need to be handled company-wide or that otherwise require a single collector instance. For this example, let's say the gateway collector needs to do two things. First, as conglomerates do, Global Telescopes just acquired another company that is being added to its consumer retail arm; the new teams haven't yet fully integrated with the rest of the company, so the data from their apps needs to be routed into their old backend. And then, the rest of the data, which is routed into the company-wide backend, needs to be sampled to cut down on costs — and they want to do tail sampling, which needs to happen in a single collector to work properly.

[03:44] If you look at this setup here, this is basically a pipeline diagram for a single region — what this would look like inside the collectors we've configured. Data comes in through OTLP, is processed in the sidecar by the filter and transform processors, and is then sent on to the second collector, which determines where the data needs to go using the routing connector and finally samples, with the tail sampling processor, the data that ends up in the company-wide backend.

[04:17] Diving into it, let's start with the sidecar. First, we want to use the filter processor to filter out a bunch of data. Sometimes developers leave their log level at a little higher setting than they should and these logs are pretty noisy, so we want to filter them out; or, as Jamie was talking about earlier, maybe there are some extra span events from their instrumentation that we want to get rid of. Regardless, the filter processor can do the job. For this example specifically, Collector log severity levels are stored as integers, and lower numbers mean noisier logs, so we want to cut out anything at debug level or lower. However, we still want to keep info and error level logs, so those are passed through. With that, we've cut out quite a bit of data, so we can move on to additional processing on the logs.

[05:02] We want to parse the logs now. They are sent to the collector as JSON, but we want them in a structured format, so we need to parse them here. Before we dive too deep into this, I do want to call out OTTL's cache feature, which basically serves as a way to store temporary data inside a map while you're working with something. The cache starts out empty before the first statement, and after the very last statement it's cleared again before the next payload comes in — so it's purely temporary and is only used to hold data while we're doing these computations. What we're going to do is parse the body and put it in the cache, then take the parsed attributes map and the parsed log body out and put them on the structured log, and the result of this is a structured log.

[05:46] With that, we can now redact some things out of it. Let's say we have a purchase ID and it's considered PII, so we want to redact it before sending it to our backend. To make things interesting, let's also say we need to handle data deletion requests, and in order to do that we need to be able to locate the data inside our backend given some customer input — say they give us the purchase ID and we want to find the equivalent data for it — so we could hash it, let's say with a SHA-256 hash. First, I'm not a lawyer, so don't take this as advice for how to handle PII; it's just illustrative. But what we could do here is take the purchase ID out of an attribute, match it with a regular expression, use the match group as input to a SHA-256 function, and then replace the attribute with the result. The cool thing about this is that the SHA-256 function is just an OTTL function — it's not hardcoded into the replace_pattern function at all, so you could replace it with whatever hashing algorithm you wanted. Additionally, that whole function argument is optional, so if you didn't pass it in, it would just use the capture group without hashing it at all — but if you want it, you can have it there too.

[07:00] With all the data redacted, it can now safely leave the region, and we can move on to the second collector, whose pipelines are highlighted in this diagram. We're going to route and then we're going to sample. Let's start with routing: after we route, if the data goes to the retail pipeline, no more processing is needed — that team will handle it — so we can cover that quickly. We can do it with the routing connector, which also supports OTTL, and we have it pretty easy here: all of our applications are already annotated with the branch of the company they apply to, so we can just check — anything that starts with "retail" goes to the retail backend, and we're done; anything else goes to the company-wide backend. With that we're done with the retail pipeline, so we can move on to our company-wide pipeline, which is mostly for industrial telescopes.

[07:51] Again, we need to do sampling there, so we're going to use this tail sampling policy, which does a handful of checks using OTTL before determining whether to sample a trace. If any of these conditions matches, the trace is sampled; if none of them match, it's dropped. First, we want to make sure that if there's an error we know about it, so we sample all errors. Then, we really like to be paid, so anything from the payment service is definitely sampled. For everything else we use a pretty rudimentary sampling algorithm: we get the hexadecimal representation of our random trace ID and check whether it starts with one of the 16 hexadecimal characters — in this case "a" — which gives us a one in 16 chance of the data being sampled. With this we've cut down the data we have, we're good to go, and the data is forwarded to the company-wide backend. This is the resulting config, and you can see that it's pretty simple: everything is configured using the same kind of configuration format — it's all OTTL. I do want to call out that this is not a production-ready config; there are a couple of recommended options that I've left out just to make it a bit shorter. But this should hopefully give you an idea that you can use a variety of different components, that they're all configured in the same sort of way, and that you can more or less tweak and query how you like.

[09:19] Finally, I want to cover some new features we've added recently — the first two I've touched on earlier. The first is optional parameters: for example, the ParseKeyValue function has a default delimiter, and users might want to override that, so they're given the option to set their own delimiter if they want. Similarly, with functions as parameters, you can pass a function in as an argument if the function accepts that and it matches the function signature — not common, but useful for complex use cases. And finally, we've added 15 new functions so far this year and we're continually adding more, so if there's functionality you felt was missing before, check back — it might be there now.

[10:06] Going forward, we're going to be looking at how to handle list values — that's a bit of a gap right now that we'd like to improve — and then we're going to be looking at stabilizing OTTL in the transform processor, with an eye toward hopefully consolidating that list of processors, and possibly a few more, into the transform processor, just to make it easier for users to determine which processor to use. And here are some doc links if you want to scan or type those in. I'll leave this up, but we're done — I think we're ready for questions now.

[10:44] [Audience question; not repeated on mic.] Okay, so I'm definitely not the expert on this, but there are extensions that allow you to persist data in the pipelines on disk or in a store like S3, so that could be one option. Second, you would also want some retry logic in your pipeline: if a collector crashes, it has some data in its pipeline, but you've persisted that, so it's safe — and since you're still going to be sending data to it, you'll want retries for when the collector comes back up.

[11:13] [Audience question; not repeated on mic.] Okay — that's a tough question and I'm definitely not qualified to answer it, but my understanding of how that would work is that I would probably recommend sharding: you check the trace ID and then use it to route to a particular collector. Does that sound like a good approach? Okay, cool.

[11:39] Thanks. So here it was about flexibility, but I would recommend doing it as close to the source as possible — and sorry, I forgot to repeat the question: the question is whether there is a reason you would want to hash PII in the collectors as opposed to as close to the application as possible. Again, I would definitely do it as close as possible; there's no reason to do it any further out. The reason I did it here is that you usually don't want PII to leave the region, right? So if you put a collector as close to your application as possible, you can be sure it won't leave the region unredacted.

[12:15] Good question — I'm happy you asked, actually, because I didn't take the time to call this out. The question is whether there's any good way to test OTTL, and the answer is yes: Tyler actually just added some debug logging, so if you turn on debug logging in the Collector, it will print debug logs that show the state of the data before and after a statement executes. We're also adding traces to the transform processor — that's being reviewed right now and I think it's pretty close to being merged.
