GopherCon 2020: Ted Young - The Fundamentals of OpenTelemetry
Summary
TL;DR: In this talk, Ted Young introduces OpenTelemetry, an observability platform for monitoring distributed systems. He explains the concept of telemetry, OpenTelemetry's extensible components, and its role in emitting signals like distributed tracing and metrics. Young demonstrates how to instrument code with OpenTelemetry, emphasizing context propagation for efficient tracing. He also discusses the importance of standardization in the telemetry ecosystem and provides practical examples and resources for getting started with OpenTelemetry in various programming languages.
Takeaways
- OpenTelemetry is an observability platform with extensible components for monitoring distributed systems.
- It unifies signals like distributed tracing, metrics, and system resources, providing the context needed to correlate them.
- OpenTelemetry includes a data processing facility for data format transformation, manipulation, and distribution to multiple consumers.
- The OpenTelemetry SDK is installed in every service of a deployment, implementing the OpenTelemetry API for instrumentation.
- The OpenTelemetry Collector is a data pipelining service that can translate between various formats, including OTLP, Zipkin, Jaeger, and Prometheus.
- OpenTelemetry focuses on standardizing how distributed systems in cloud environments are described, rather than standardizing data analysis tools.
- It is designed via a specification, a language-neutral document that allows consistent implementations to be built across different software ecosystems.
- OpenTelemetry can be installed and configured with minimal code or command-line arguments, with production-ready support for Java, JavaScript, Python, and Go.
- Context propagation is central to OpenTelemetry's architecture, carrying the flow of execution and metadata across the services in a transaction.
- Semantic conventions standardize how system components are described, enabling better data analysis and understanding.
- Baggage headers allow the propagation of arbitrary key-value pairs, useful for passing correlations without extra lookups on the receiving side.
Q & A
What is the definition of telemetry according to the Cambridge Dictionary?
-Telemetry is defined as the science or process of collecting information about objects that are far away and sending the information somewhere electronically.
What is OpenTelemetry and what does it aim to achieve?
-OpenTelemetry is an observability platform consisting of extensible components that can be used together or à la carte. It aims to standardize the language for describing how distributed systems operate in a cloud environment, enabling better observability and analysis tools without reinventing the telemetry ecosystem.
What are the three main types of signals emitted by OpenTelemetry?
-OpenTelemetry emits distributed tracing, metrics, and system resources as its main types of signals.
What is the role of the OpenTelemetry SDK in a service?
-The OpenTelemetry SDK, referred to as the client, implements the OpenTelemetry API. Applications, frameworks, and libraries use this instrumentation API to describe the work they are doing; the SDK then sends the data to a data pipelining service called the Collector.
What is the purpose of the Collector in OpenTelemetry?
-The Collector in OpenTelemetry is a data pipelining service that receives data from the SDK and can translate between various data formats, including OTLP, Zipkin, Jaeger, and Prometheus.
Why does OpenTelemetry not provide its own backend or analysis tool?
-OpenTelemetry does not provide its own backend or analysis tool because its primary focus is standardizing how distributed systems in cloud environments are described, not standardizing how that data is analyzed.
How does OpenTelemetry ensure consistency and interoperability across different implementations?
-OpenTelemetry is designed via a specification: a language-neutral document that describes everything needed to build an implementation of OpenTelemetry, ensuring consistency and interoperability.
What is context propagation in OpenTelemetry and why is it important?
-Context propagation is the core concept behind OpenTelemetry's architecture. The contents of a context object are sent as metadata on network requests, allowing the flow of execution and its key-value pairs to be tracked across services, which is essential for distributed tracing.
What are the primary HTTP headers used for trace context in OpenTelemetry?
-The primary HTTP headers used for trace context are traceparent and tracestate. traceparent carries the trace and span IDs plus a sampling flag, while tracestate carries any additional implementation-specific details.
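For illustration, a sampled request carries headers shaped like the following (the ID values here are the example values from the W3C Trace Context specification):

    traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
    tracestate: congo=t61rcWkgMzE

The traceparent fields are, in order: version, trace ID (32 hex digits), parent span ID (16 hex digits), and flags (01 = sampled).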
How can baggage headers be used in OpenTelemetry to improve observability?
-Baggage headers allow users to pass arbitrary key-value pairs for correlation purposes. They are propagated along with the context and can be used to index spans with additional metadata, such as project IDs, to identify usage patterns or troubleshoot issues.
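On the wire, baggage is simply a comma-separated list of key-value pairs; the keys below are hypothetical examples, not from the talk:

    baggage: project.id=abc123,user.tier=premium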
What are semantic conventions in OpenTelemetry and why are they important?
-Semantic conventions in OpenTelemetry are standard resource and trace attributes used to describe a system. They are important because analysis tools can only make sense of the data if everyone reports values such as hostname and operating system under the same standardized keys.
Outlines
Introduction to OpenTelemetry
Ted Young introduces OpenTelemetry, an observability platform designed to collect and process signals from distributed systems. He explains that OpenTelemetry is not just about data generation but also provides a data processing facility. The platform is extensible and works with various data formats and protocols, including its own OTLP. The goal of OpenTelemetry is to standardize the language for describing cloud-based distributed systems, allowing new analysis tools to be developed easily. The talk also covers installing the OpenTelemetry SDK and using the Collector for data pipelining, emphasizing the ease of setup and the support for multiple programming languages.
Observing and Analyzing System Transactions
This section delves into the observability of system transactions, using a classic LAMP-stack example. It discusses the importance of understanding latency, errors, and the sequence of operations within a transaction. The call graph is introduced as a way to visualize the time spent in each operation and the network calls between services. The section highlights the need for context propagation in distributed tracing to correlate logs and identify patterns in latency and error rates. It also touches on the challenges of scale and the importance of indexing logs with a single transaction ID for efficient debugging.
Context Propagation in OpenTelemetry
The core concept of context propagation is explained: it is essential for tracing and understanding the flow of transactions across services. Context is propagated through network requests by sending metadata in HTTP headers, following agreed-upon standards like the W3C trace context. Baggage headers are also introduced, which allow the transmission of arbitrary key-value pairs for user-defined correlations. The section concludes with a practical example of setting up an OpenTelemetry HTTP server, including the configuration of service names, access tokens, propagators, and resources.
Implementing OpenTelemetry in Go
This section demonstrates implementing OpenTelemetry in Go. It covers setting up an HTTP server with OpenTelemetry instrumentation, including creating a tracer, adding a simple handler, and using semantic conventions to standardize service indexing. It also shows how to add custom attributes to spans, and how to create child spans and events for more detailed tracing. The importance of ending spans to avoid leaks, and of recording errors and events, is emphasized, along with the use of debug logs for troubleshooting.
Context Propagation and Baggage in Client Requests
This section illustrates the use of OpenTelemetry in an HTTP client, showing how context propagation works across network requests. It explains wrapping the HTTP client's transport with OpenTelemetry instrumentation to enable tracing. It also introduces baggage, which allows the propagation of additional data, such as a project ID, from the client to the server to avoid unnecessary database calls. Code examples show creating an HTTP client, making requests, and extracting baggage values to enrich trace data.
Advanced Tracing Techniques and Rollout Strategy
The final section discusses advanced tracing techniques, such as creating a parent span to group multiple HTTP requests into a single trace. It also addresses practical considerations for rolling out OpenTelemetry within an organization, emphasizing the importance of choosing production-ready languages and gaining internal support. It suggests starting with a high-value transaction to demonstrate the benefits of OpenTelemetry and then expanding to other areas. It concludes with resources for further learning, including a documentation site and community engagement through GitHub and social media.
Keywords
OpenTelemetry
Telemetry
Distributed Tracing
Metrics
Observability
Context Propagation
Trace Context
Baggage
Instrumentation
Semantic Conventions
Collector
Highlights
OpenTelemetry is an observability platform for collecting signals from distributed systems.
It provides the context to correlate across distributed tracing, metrics, and system resources.
OpenTelemetry includes a data processing facility for changing data formats and manipulating data.
The OpenTelemetry SDK is used for instrumenting applications and frameworks.
OTLP is OpenTelemetry's own data protocol, but the Collector can also translate between other formats like Zipkin and Prometheus.
OpenTelemetry does not provide its own backend or analysis tool, focusing instead on standardization for cloud environments.
The project is designed via specification to ensure consistency and interoperability across implementations.
OpenTelemetry can be installed and configured with minimal code or command-line arguments.
Languages recommended as production-ready beta include Java, JavaScript, Python, and Go.
Observability concepts include understanding transactions in a distributed system, with examples like a mobile client uploading a photo.
Call graphs represent operations and network calls to provide insight into transaction latencies.
Errors in transactions can be identified and debugged using OpenTelemetry's tracing system.
Context propagation is central to OpenTelemetry's architecture, allowing for powerful indexing of transactions.
Tracing headers like trace context and baggage headers facilitate context propagation across services.
OpenTelemetry's API allows for creating spans, setting attributes, recording errors, and adding events.
Demonstration of setting up an OpenTelemetry HTTP server and client with context propagation.
Using baggage to propagate additional data like project IDs can enrich tracing without extra server-side calls.
Traces provide a more efficient way to investigate transactions than traditional logging.
Strategies for rolling out OpenTelemetry in an organization include choosing production-ready languages and getting organizational buy-in.
opentelemetry.lightstep.com offers resources, guides, and documentation for getting started with OpenTelemetry.
Transcripts
Oh, hey. My name's Ted Young, and my pandemic haircut is a hat. This is the Fundamentals of OpenTelemetry.

But what even is OpenTelemetry? Come to think of it, what's telemetry? The Cambridge Dictionary defines telemetry as "the science or process of collecting information about objects that are far away and sending the information somewhere electronically." OpenTelemetry is an observability platform: a set of extensible components that can be used together or à la carte. OpenTelemetry emits a variety of signals, with distributed tracing, metrics, and system resources being the most important.
Rather than keep these signals separate, OpenTelemetry braids them together and provides the context you need to correlate across them in your backend. In addition to data generation, OpenTelemetry provides a data processing facility. This allows you to change data formats, manipulate your data, scrub it, and tee it off to multiple consumers: everything you would need in a modern, robust telemetry pipeline for your distributed system.

In every service in your deployment, you install the OpenTelemetry client. We refer to the client as the OpenTelemetry SDK. The SDK in turn implements the OpenTelemetry API; your applications, frameworks, and libraries use this instrumentation API to describe the work that they are doing. The SDK then uses an exporter plugin to send the data to a data pipelining service called the Collector. OpenTelemetry comes with its own data protocol, called OTLP, but the Collector can translate between a variety of formats, including Zipkin, Jaeger, and Prometheus.

Notably, OpenTelemetry does not provide its own backend or analysis tool. This is because at the heart of OpenTelemetry is a standardization effort. The goal is to come up with a universal language for describing how distributed computers operate in a cloud environment. The goal is not to standardize how we analyze this data. Instead, OpenTelemetry hopes to push the field of observability forwards by allowing new analysis tools to be built quickly and easily, without the need to reinvent this entire telemetry ecosystem.
Speaking of software ecosystems, how does OpenTelemetry keep track of all of this code? To ensure that different implementations remain consistent with each other and continue to interoperate, OpenTelemetry is designed via specification. This specification is a language-neutral document which describes everything you would need to build your own implementation of OpenTelemetry.

Before we dive into the details, I do want to point out that it is easy to install OpenTelemetry. OpenTelemetry can be packaged up into distros that make the configuration and installation only a few lines of code, or in some cases just a command-line argument. At the time of this recording, I recommend four languages as production-ready beta: Java, JavaScript, Python, and, of course, Go. I've written some easy quick-start guides over at otel.lightstep.com; you can always check there to get the latest information about production-ready OpenTelemetry.
Okay, so in this next section we're going to do a quick overview of the basic concepts behind OpenTelemetry. We're going to start with what it is that we're actually trying to observe, then cover the fundamental concepts in how OpenTelemetry approaches observability, and how to set up and deploy OpenTelemetry in your production environment.

Let's look at an example application to get an understanding of the kind of transactions we're talking about here. Say you have a mobile client that wants to let you upload a photo with a caption. That client is going to connect to a server, but of course it's not going to be one server, it's going to be a bunch of servers. Let's say the first thing it hits is a reverse proxy. That reverse proxy wants to authenticate you, so it calls out to an authentication service. Once that comes back A-OK, it uploads your image to a scratch disk. Once the image is successfully uploaded, it calls out to an application with the location of that image. That application then uploads the image to cloud storage, and stores the location of that image and the caption in a SQL database via a data service, which then holds a cache of that information in Redis.

Why is this application built like this? Who knows. Someone said "build it this way," and someone else said "okay." But seriously, I feel like I've been looking at applications basically like this one for about 20 years. This is your classic LAMP-stack sort of setup, and honestly, they were just as annoying to observe back then as they are today.
This view represents the transaction as a service diagram. While this is a useful display for getting an overall sense of which services were involved in a transaction, it doesn't tell you a lot of details, such as how long the transaction took, where all that time went, and what order operations were called in. To get a deeper understanding of this transaction, let's represent it as a call graph.

In this diagram, each operation is represented by a line, and the length of the line represents the amount of time spent in that operation. The operations are then connected together via network calls, represented by the arrows. Here we can see the client span talking to the reverse proxy, talking to the auth server, and so on and so forth.

As an operator, there are a number of things we care about when we look at a diagram like this. First and foremost, we care about latency; we want to understand where the time was spent in the system. "Why is it slow?" can be such a cranky question to answer. For example, the client span in this case took the most time, but that doesn't tell you very much. Instead, you want to find out where the system was doing work and where the system was waiting on work to be done; the combination will tell you where you need to focus your efforts if you're actually going to improve the latency in this system. In this case, the most time was spent uploading the file to the scratch disk and then uploading it again to cloud storage. If you tried to optimize anything else, you really wouldn't be moving the needle very far, because the overall amount of time you would be affecting would be minimal.
Next, of course, we care about errors. When a transaction is failing, we want to be able to quickly identify which service actually had the error, and which services were simply propagating an error downstream or responding to that error. Once you've identified an error, you're going to want to debug it, and to do that you're going to want more fine-grained detail. In OpenTelemetry's tracing system we call these events, but you can think of them as logs, because that's basically what they are.

However, there is one major difference between tracing and logging. With tracing, every operation can be associated with a set of key-value pairs that allow you to identify patterns in your latency and error rates. Being able to quickly, or automatically, identify that certain errors are associated with certain routes or regions or hosts, or that certain latency patterns are associated with certain clients or certain project IDs, is OpenTelemetry's killer feature.

While the transaction we've been looking at is simple enough, the real issue here is scale. As your system grows and grows, the percentage of logs associated with any particular transaction shrinks and shrinks, and it becomes more and more difficult to find the logs that are associated with a particular transaction. For example, we know that the reverse proxy talks to an application server, but what if there are 50 application servers? How do you know where to look to find those logs? Obviously you're going to need to index these logs, but index them with what? The ideal index would be a single transaction ID, stapled to every log in the transaction, so that if you found one log you would quickly be able to search for all the other logs associated with it. And that, in a nutshell, is distributed tracing.
So how does OpenTelemetry provide all of this awesome indexing? The answer is context propagation. Context propagation is the core concept behind OpenTelemetry's architecture; if you can understand context propagation, then everything else about OpenTelemetry will fall into place.

Imagine we have two servers, connected together by a network request. All of OpenTelemetry's indices are stored in an object called the context. This context object follows the flow of execution throughout your program. When your transaction moves from one service to the next via a network call, all of these key-value pairs must come along for the ride. Sending along the contents of the context object as metadata on the request is called propagation. When using HTTP, the contents of the context object on the client side are injected into the HTTP request as HTTP headers. Then, on the server side, the same values are extracted from the headers and deserialized into a new context object, which continues to follow the flow of execution.

Now, obviously, this only works if both the client and the server agree on which HTTP headers are going to be used. To make this more effective, we're working through the W3C to add a set of tracing headers to the official HTTP spec. There are a number of tracing headers out there in the wild, but let's have a look at these, since they're going to be the standard going forwards.

The primary tracing headers are called trace context. Trace context consists of two headers: traceparent and tracestate. Traceparent contains two IDs: one represents the overall transaction, and that's called the trace ID; the other represents the parent operation, and is called the span ID. Traceparent also includes a sampling flag, to let you know whether tracing is enabled or not. The tracestate header contains any additional implementation-specific details that a particular tracing system might need to propagate. In addition to trace context, there are also baggage headers. Baggage headers are literally arbitrary key-value pairs that you can use as an end user to pass your own correlations down the line; we'll see how that's useful in a bit.
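To make injection and extraction concrete, here is a minimal sketch using the current OpenTelemetry Go API, which has evolved since this 2020 talk. Normally the instrumentation libraries call Inject and Extract for you; this is only to show the mechanics:

    package main

    import (
        "net/http"

        "go.opentelemetry.io/otel"
        "go.opentelemetry.io/otel/propagation"
    )

    // Register W3C trace context and baggage propagation globally;
    // instrumentation libraries pick this propagator up automatically.
    func setupPropagators() {
        otel.SetTextMapPropagator(propagation.NewCompositeTextMapPropagator(
            propagation.TraceContext{}, // traceparent / tracestate headers
            propagation.Baggage{},      // baggage header
        ))
    }

    // Client side: inject the active context into outgoing headers.
    func inject(req *http.Request) {
        otel.GetTextMapPropagator().Inject(req.Context(),
            propagation.HeaderCarrier(req.Header))
    }

    // Server side: extract the propagated values into a new context,
    // which then continues to follow the flow of execution.
    func extract(r *http.Request) {
        ctx := otel.GetTextMapPropagator().Extract(r.Context(),
            propagation.HeaderCarrier(r.Header))
        _ = ctx // hand this context to downstream work
    }

    func main() { setupPropagators() }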
Okay, so enough talk, let's write some code. For this example we're going to make a simple hello-world HTTP server. Let's get started by first installing OpenTelemetry.

Now, an important thing to understand about OpenTelemetry is that it's a framework, and most of the configuration you do is about connecting it to different backends. But once you've picked the backend you want to connect to, most of that configuration becomes boilerplate. To help with that, we've created a concept called OpenTelemetry distros. Since we're going to be connecting to Lightstep in this example, let's grab the Lightstep distro.

The first required piece of configuration is the service name; this lets you know where all the data is going to be originating from. Let's call it "hello-server." Then, in order to connect to Lightstep, you're going to need an access token, so that's just a little doodad we're going to grab from Lightstep and paste into here.

Then, to show some optional configuration: first, let's define which propagators we're going to use. We talked about trace context before, but for this example let's switch to B3, which are the Zipkin headers; you may encounter these if you're already using a tracing system. The final bit of configuration I want to point out is resources. Resources are what you use to index your services: the same way traces have indices and operations have indices, services can also have indices. In OpenTelemetry we have this concept called semantic conventions. Semantic conventions are standard resources and trace attributes that you can add to describe your system. These conventions are defined in the specification, and we have everything you might expect: operating systems, containers, processes. Let's grab hostname and add that. The reason why it's important to standardize these conventions is so that your analysis tools can actually understand the information. If we reported hostname in some cases as host.name, in other cases as host-name, or just host, it would be much harder to do something useful with that data.

Lastly, you need to add a call to shutdown at the end, to ensure that everything gets flushed when your program exits. And that's all the setup we need to do.
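The talk does this configuration through the Lightstep distro; as an approximation, here is roughly the same setup written against today's vanilla OpenTelemetry Go SDK with an OTLP exporter. The import paths and the semconv version below are assumptions, since these APIs have shifted since 2020:

    package main

    import (
        "context"
        "log"

        "go.opentelemetry.io/otel"
        "go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc"
        "go.opentelemetry.io/otel/sdk/resource"
        sdktrace "go.opentelemetry.io/otel/sdk/trace"
        semconv "go.opentelemetry.io/otel/semconv/v1.21.0"
    )

    func main() {
        ctx := context.Background()

        // Exporter: ship spans over OTLP. The endpoint and access token
        // would come from environment variables in a real deployment.
        exp, err := otlptracegrpc.New(ctx)
        if err != nil {
            log.Fatal(err)
        }

        // Resource: the indices describing this service, built from the
        // semantic conventions (service.name, host.name, and so on).
        res, err := resource.New(ctx,
            resource.WithAttributes(semconv.ServiceName("hello-server")),
            resource.WithHost(), // adds host.name
        )
        if err != nil {
            log.Fatal(err)
        }

        tp := sdktrace.NewTracerProvider(
            sdktrace.WithBatcher(exp),
            sdktrace.WithResource(res),
        )
        otel.SetTracerProvider(tp)

        // Flush everything when the program exits, as the talk stresses.
        defer func() { _ = tp.Shutdown(ctx) }()
    }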
From here on out, we'll only be interacting with the OpenTelemetry API. The first thing we'll want to do is create a tracer. Tracers should be created at the package level and named after the package that they're instrumenting; this allows you to attribute every span created to the package that created it.

Next up, we're going to create a simple hello-world handler. All this handler is going to do is sleep for 30 milliseconds, to pretend like it's doing work, and then write out "hello world." Super basic.

Next, we need to install instrumentation libraries. In most languages these libraries can be installed automatically, but in Go we don't really like any of that spooky automatic stuff; instead, we prefer to copy-paste. OpenTelemetry comes with a variety of instrumentation libraries, including those that cover all of the core HTTP and networking libraries in the standard library, as well as a number of common frameworks. In this case, we're taking our HTTP handler, wrapping it in an instrumented HTTP handler, and adding that to our service. And that's all it takes to add basic instrumentation and context propagation to an HTTP server in Go.
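A minimal sketch of that server, assuming the otelhttp contrib instrumentation package; the tracer's package name and the port are hypothetical:

    package main

    import (
        "io"
        "net/http"
        "time"

        "go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp"
        "go.opentelemetry.io/otel"
    )

    // The tracer lives at package level, named after the package it
    // instruments, so every span it creates is attributed back to it.
    var tracer = otel.Tracer("example.com/hello-server")

    // helloHandler pretends to do 30ms of work, then answers.
    func helloHandler(w http.ResponseWriter, r *http.Request) {
        time.Sleep(30 * time.Millisecond)
        io.WriteString(w, "Hello, world!\n")
    }

    func main() {
        // Wrap the handler in the otelhttp instrumentation: this starts a
        // server span per request and extracts any propagated context.
        wrapped := otelhttp.NewHandler(http.HandlerFunc(helloHandler), "hello")
        http.Handle("/hello", wrapped)
        http.ListenAndServe(":8080", nil)
    }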
So let's have a look at this; let's fire it up. We start our HTTP server, hop into our browser, and hit refresh a whole bunch of times on this endpoint to ensure that we generate some data. Then we go look in our backend to see if we're getting anything, and sure enough, we've got spans coming in. If you click into one of these, you'll see the world's simplest trace. It's a single span, but look at how rich that span is with indices and data: those are the semantic conventions I was referring to earlier.
Okay, now let's add some of our own data. The first thing we need to do is get a handle on the current span; to do that, we extract it from the context. This is the same span that was set up by our server instrumentation earlier. Let's add an attribute describing the route for this endpoint, using one of OpenTelemetry's semantic conventions. In general, you should prefer decorating these existing spans with attributes and events, rather than creating child spans.
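In today's Go API, that step looks roughly like this. This is a sketch: the route value comes from the example, while the semconv module version is an assumption (earlier versions expose the same key as semconv.HTTPRouteKey.String):

    package server

    import (
        "io"
        "net/http"

        semconv "go.opentelemetry.io/otel/semconv/v1.21.0"
        "go.opentelemetry.io/otel/trace"
    )

    func helloHandler(w http.ResponseWriter, r *http.Request) {
        // Get a handle on the span that the otelhttp wrapper already
        // started for this request.
        span := trace.SpanFromContext(r.Context())

        // Prefer decorating this existing span over creating child spans.
        // semconv.HTTPRoute emits the standard "http.route" attribute.
        span.SetAttributes(semconv.HTTPRoute("/hello"))

        io.WriteString(w, "Hello, world!\n")
    }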
Okay, let's run our server again and see how it looks. This time I'm going to enable the debug logs; these are useful for diagnosing a configuration error, so I just wanted to point them out. Let's generate some data again and go have a look at what we've got. We should see that the spans coming in now have an http.route key associated with them, and sure enough, there it is. This is really useful, because you can now query by the http.route key and make apples-to-apples comparisons across different latencies for the same route.
Okay, let's try creating our own child span. To do that, take the tracer that we created for our package and use it to call start on the current context, along with a name for our new operation. This returns a new context, with our new child span set as active within it. Remember to always end your spans, so that you correctly record the latency and avoid creating a leak.

If an error occurs, you can call record error to log it as an event. If this error indicates that the entire operation is an error, you also need to change the span status to error; otherwise, this error will just be recorded as an event. You can also add regular events. Events are effectively really awesome structured logging: each event has a timestamp, a message, and a set of key-value pairs, and of course it's contextualized by the span and the trace that it's occurring inside of. Make sure to pass in that context and we're good to go.
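Here are those operations sketched with the current Go API; the operation name, attribute keys, and the simulated failure are hypothetical stand-ins for the demo code:

    package server

    import (
        "context"
        "errors"
        "time"

        "go.opentelemetry.io/otel"
        "go.opentelemetry.io/otel/attribute"
        "go.opentelemetry.io/otel/codes"
        "go.opentelemetry.io/otel/trace"
    )

    var tracer = otel.Tracer("example.com/hello-server")

    func doWork(ctx context.Context) error {
        // Start a child span; the returned context carries it as active.
        ctx, span := tracer.Start(ctx, "doWork")
        // Always end the span, or latency is wrong and the span leaks.
        defer span.End()

        // Events are structured logs: a timestamp, a message, key-value
        // pairs, all contextualized by the enclosing span and trace.
        span.AddEvent("starting work",
            trace.WithAttributes(attribute.Int("attempt", 1)))

        if err := pretendWork(ctx); err != nil {
            // Record the error as an event on the span...
            span.RecordError(err)
            // ...and, since it fails the whole operation, set the span
            // status too; otherwise it is just another recorded event.
            span.SetStatus(codes.Error, err.Error())
            return err
        }
        return nil
    }

    func pretendWork(ctx context.Context) error {
        time.Sleep(30 * time.Millisecond)
        return errors.New("something went sideways") // simulated failure
    }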
So let's start our server up again and generate some more data. We should be seeing errors show up in the explorer now, and sure enough, there they are. If you click into one of these, you'll see that the trace is now a little more complicated: we've got two spans, the parent span generated by the automatic instrumentation, and the child span that we created ourselves. You can see that the child span has been marked as an error and contains the event that we added; you can see that listed over in the corner.

You may notice that the child span has the same duration as the parent span. This implies that all the work is being done in the child span, and if you look at our code, sure enough, that's where the sleep is. If we add another sleep, to imply we're doing work in the parent span, we'll get a more interesting trace. So, hitting refresh here and sending our data again, we see some new data coming in, and if you have a look you can now see that roughly half of the work is being done in the parent span and half in the child span.

So we've covered six basic commands: getting the current span from the context, setting an attribute, creating a child span, recording an error, setting a status, and adding an event. That's all you need to know about the OpenTelemetry API
as an end user. Okay, but this is supposed to be distributed tracing! Let's create an HTTP client and watch that context propagation flow through our system.

First, I'm going to copy-pasta over all of our OpenTelemetry setup code. I'm not going to set up resources this time, just for expediency, but you really should do that in a production scenario. This is going to be a very simple client: all it's going to do is spin up, make five HTTP requests in a row, and then shut back down. First we create an HTTP client, and then create an HTTP request, handling the error (we're just going to panic in this case, no big deal). Then we do the request on the client, again panicking on error, and close the body of the response, just to make sure we do this cleanly. Then, in our main, we make a for loop to make five requests in a row, and that's that. To instrument our client and enable context propagation, we wrap our HTTP client's transport in an OpenTelemetry instrumentation library.
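A sketch of that client, again assuming the otelhttp contrib package and a local server address:

    package main

    import (
        "context"
        "net/http"

        "go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp"
    )

    func main() {
        // Wrap the client's transport so every request gets a client span
        // and the active context is injected into the outgoing headers.
        client := http.Client{
            Transport: otelhttp.NewTransport(http.DefaultTransport),
        }

        ctx := context.Background()
        for i := 0; i < 5; i++ {
            req, err := http.NewRequestWithContext(ctx, http.MethodGet,
                "http://localhost:8080/hello", nil)
            if err != nil {
                panic(err)
            }
            res, err := client.Do(req)
            if err != nil {
                panic(err)
            }
            res.Body.Close() // always close the response body cleanly
        }
    }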
So let's get this client running; we're just going to run it a number of times, then look in our explorer again and see if those spans are showing up. Sure enough, we now see a more complicated trace. In addition to those server-side spans we saw before, we can see they now have a client span set up as their parent, and again, notice that this client span is decorated with a bunch of HTTP information. In addition to that, there's also an attribute called instrumentation name, and this points to the origins of this particular span; in this case, we can see it came from the otelhttp library. That's useful information if you're trying to debug your tracing. For example, I can click through on instrumentation name, do a query, and find all the spans that are being generated by this particular instrumentation package across my system.
Now that we have context propagation flowing, let's add some baggage. Baggage is a really cool feature. Imagine that we have a project ID that's available on our client, but would be expensive to grab on the server; perhaps it would take an extra database call that we don't want to make. Rather than doing that, we can flow our project ID from the client to the server using baggage.

To do that, I grab our context object, attach the baggage to it, and flow that context into our request. Then, on the server side, I grab our context and pull the baggage off of it: take the context object, ask it for the baggage value, and extract the project ID. We can then take that project ID and use it as a correlation, by setting it as a span attribute. It's also very useful to use these correlations with metrics, though we're not getting into metrics in this talk.
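The baggage API in today's Go SDK differs from what the 2020 talk shows on screen; here is a sketch in the current API, keeping the project.id key from the example (the handler shape is hypothetical, and errors are ignored for brevity):

    package server

    import (
        "context"
        "net/http"

        "go.opentelemetry.io/otel/attribute"
        "go.opentelemetry.io/otel/baggage"
        "go.opentelemetry.io/otel/trace"
    )

    // On the client: attach the project ID to the context as baggage.
    // It will ride along in the baggage header on instrumented requests.
    func withProjectID(ctx context.Context, projectID string) context.Context {
        member, _ := baggage.NewMember("project.id", projectID)
        bag, _ := baggage.New(member)
        return baggage.ContextWithBaggage(ctx, bag)
    }

    // On the server: pull the value back off and index the span with it.
    func handler(w http.ResponseWriter, r *http.Request) {
        ctx := r.Context()
        projectID := baggage.FromContext(ctx).Member("project.id").Value()

        span := trace.SpanFromContext(ctx)
        span.SetAttributes(attribute.String("project.id", projectID))
    }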
Okay, so let's reboot our server and have a look at this new data. Oh, and run the client again; you'll get used to that. Looking in the explorer and clicking on one of our internal spans, sure enough, there's the project ID. And if we were to add more services downstream of this one, that baggage would continue to propagate. Indexing your spans with concepts such as project ID can be very useful, because they identify potential usage patterns. For example, it would be an important insight to understand that the errors and latency issues you were seeing in your system were actually correlated with just a handful of accounts.

One cautionary note: baggage values aren't free. Since they have to be propagated downstream, each value you add increases the size of every HTTP request, so you should use them sparingly.
Okay, so let's make this trace a little more interesting. We're going to go back into our HTTP client and connect all of these requests together into a single trace. The first thing we're going to do is flow our context all the way through make-request, and then we're going to create a master span. In order to create a span, we need a tracer handle, so we do that at the package level and name the tracer after the package, so that we'll know where this span originated from. Once we've done that, we start the span, name it whatever, and make sure we end the span; that's always a gotcha, you don't want to leak.
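Putting it together, a sketch of the client with a parent span grouping the five requests into one trace (the span and package names are hypothetical):

    package main

    import (
        "context"
        "net/http"

        "go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp"
        "go.opentelemetry.io/otel"
    )

    // Tracer named after the package, so we know where the span came from.
    var tracer = otel.Tracer("example.com/hello-client")

    // makeRequest issues one instrumented request, flowing the context in
    // so the client span becomes a child of the caller's span.
    func makeRequest(ctx context.Context, client *http.Client) error {
        req, err := http.NewRequestWithContext(ctx, http.MethodGet,
            "http://localhost:8080/hello", nil)
        if err != nil {
            return err
        }
        res, err := client.Do(req)
        if err != nil {
            return err
        }
        return res.Body.Close()
    }

    func main() {
        client := &http.Client{
            Transport: otelhttp.NewTransport(http.DefaultTransport),
        }

        // One parent span groups all five requests into a single trace.
        ctx, span := tracer.Start(context.Background(), "make-requests")
        defer span.End() // always end spans, or you leak them

        for i := 0; i < 5; i++ {
            _ = makeRequest(ctx, client)
        }
    }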
Then let's have a look: we should now see a single trace with five HTTP requests attached to it, and sure enough,
there it is. I hope this shows, if anything, what a time-saver tracing can be over regular logging. You'll notice that at no point during this demonstration did I go in and start filtering by request ID, then find some other ID and add that to the filter, building up a query just to get a view of the transaction. I started with a view of the transaction, and from there pivoted into investigating the various ways this transaction relates to other transactions. Being able to move naturally into that workflow, without having to slow yourself down and build up these queries all the time, has proven invaluable, especially at the beginning of an incident when I'm trying to cast a wide net and diagnose the problem, or when there are no active problems but I want to proactively investigate latency issues and improve the overall user experience of my application.
Okay, we're getting close to the end, and before we go I'd like to do a quick review of rolling out OpenTelemetry. If you're thinking about rolling out OpenTelemetry in an organization, the first thing to double-check is that the languages you're using are ready for production. As I mentioned before, at this time I recommend Java, JavaScript, Python, and Go; Erlang is also getting ready, and there are a number of others on the way. The best way to find out the current state of any OpenTelemetry project is to ask the special interest group that's working on it. You can find them by checking out the GitHub repo for each one of these projects, or by going to opentelemetry.io, because that links out to all of that, including calendars for meetings, Gitter rooms, everything you need in order to get involved in the community. Since we're in beta, I really recommend doing that.

The next thing you need to do is get buy-in within your organization. I don't recommend trying to boil the ocean, especially if you have a number of service teams that you're going to have to go to and ask to do the work of setting up distributed tracing. If installing OpenTelemetry everywhere is looking like it might be a lot of work, the best thing to do is to pick a particular pain point: find one high-value transaction that you'd like to understand, say its latency or error rates, and instrument all of the services necessary to understand that transaction. Once you've got that one transaction implemented from start to finish, you'll be able to really see OpenTelemetry work, with everything installed properly and all the data being correct, rather than a sort of scattershot approach where maybe one team installs it, and then another team installs it, but it's not part of an organized effort. Once you've instrumented that particular transaction, you can expand out from there; look for outliers and other low-hanging fruit.

If you're looking for more information about getting started, I'm putting a documentation site together at opentelemetry.lightstep.com. This is where you'll find getting-started guides and guides to our own distros, the OpenTelemetry launchers, and, coming soon, you'll be seeing cookbooks, deep dives, et cetera. I really want to make it an excellent resource for anyone trying to use OpenTelemetry. If you'd like to keep track of all of this, I post regular updates on Twitter at @tedsuo, so you can follow me there or send me a DM. Thanks for watching, and I hope you get involved in OpenTelemetry!