
Distributed Context Propagation: How you can use it to Improve Observability, Test in Production, and More…

7 min read · Oct 16, 2022

Introduction

In a complex distributed system, one logical operation or request is handled by many participants. In an earlier post (Distributed Tracing: The Why, What, and How?), I shared why Observability is a challenge in such a system and how Distributed Tracing can help.

However, there’s one more challenge in such a system: How can all the participants know the context of a request? In this post, let’s build, bottom-up, the building blocks for solving this problem, and then extend that solution to be open and interoperable.

Ready to go on this journey? Let’s first look at a scenario where we have the need for this shared context.

The need for distributed request context

Let’s say you have a distributed system. A request, say a customer placing an order, flows through many participating services. Things are working great!

However, one day you notice that a service in the lower layer of your system can get into an overloaded state under certain conditions.

Situation 1: Do traffic prioritization

To address this, you want that service to be able to prioritize requests. For example, a request to place an order is more important than a request to update a profile picture. If that lower-layer service knows the request priority, then when it is overloaded it can drop lower-priority requests and keep serving the higher-priority ones.

Need for request prioritization / QoS

To do this, we need to have a property called “request priority”. When the request starts, this needs to be set at the originating service and it needs to be propagated to downstream services.
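Here is a minimal sketch of what the lower-layer service could do once it knows the priority. It assumes a convention where a larger number means a more important request; the `Request` type, the `should_serve` helper, and the `min_priority` threshold are all hypothetical names made up for this illustration.

```python
from dataclasses import dataclass


@dataclass
class Request:
    name: str
    priority: int  # assumption: larger number = more important


def should_serve(request: Request, overloaded: bool, min_priority: int = 2) -> bool:
    """Serve everything when healthy; when overloaded, serve only important requests."""
    if not overloaded:
        return True
    return request.priority >= min_priority


order_request = Request("place_order", priority=3)
avatar_request = Request("update_profile_picture", priority=1)
```

With these values, the order request is still served under overload while the profile picture update is dropped. The service can only make that call if the priority actually reaches it.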

Now, how do we propagate this information? One possible way: What if we change the API signatures?

Take 1: Changing APIs to include new information

To achieve the above goal, you change the API signatures for each service to include a request priority parameter. You then change each service to pass this value to downstream services. This means making changes and coordinating across different service owners. It is challenging, but you finish it successfully. Now, the service in the lower layer is able to understand the priority of the request and make prioritization decisions.

For a lower-layer service to get this, you would need to negotiate with the owners of all the previous layers to pass the request priority parameter…
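To make the cost concrete, here is a rough sketch of three layers after this change. The service and function names are hypothetical. Notice that the middle layer never uses the priority, yet its signature still has to change so it can forward the value.

```python
def storage_write(order_id: str, request_priority: int) -> str:
    # Lowest layer: finally able to see the priority.
    return f"stored {order_id} at priority {request_priority}"


def inventory_reserve(order_id: str, request_priority: int) -> str:
    # Middle layer: does not use the priority, but must accept and forward it.
    return storage_write(order_id, request_priority)


def place_order(order_id: str) -> str:
    # Originating service: decides the priority for this kind of request.
    return inventory_reserve(order_id, request_priority=3)
```

Every new piece of context would add another parameter to every function in this chain.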

Hooray! Things are working fine — you have addressed this requirement.

A few months go by. One fine day, you get this new requirement to handle:

Situation 2: Attribute infrastructure spend to Line of Business

Your company has multiple lines of business, and you have shared services (e.g., Storage) in a lower layer. You have been asked to attribute infrastructure spend to each line of business.

Oh no! Changing the API signatures again across those multiple layers of services to include this information is not practical. In a few months you could get yet another such requirement, and we can’t keep changing API signatures forever…

What if there’s a better alternative?

Take 2: Introducing a custom header

What if we can propagate this information through a header (e.g., an HTTP header)? That way, you do the work of creating and propagating this header once, but you can add any new parameters easily. Wouldn’t that make life a lot easier? So, you decide to introduce a new HTTP header and define the key-value pairs in it.

Using a custom header and changing the services to propagate this header

Now, each participant can look at this header and understand the shared context. You also need to change each participant to propagate this header to other participants. It is a non-trivial amount of work, but now you have successfully met both of the above requirements. Even if you receive a new requirement in the future, it will be a breeze: you can just add the new key and value to this header. Isn’t that great?
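As a sketch of this approach, suppose the header is called `X-Request-Context` and carries `key=value` pairs separated by semicolons. Both the header name and the format are invented for this example; they are exactly the kind of private convention each team would come up with on its own.

```python
CONTEXT_HEADER = "X-Request-Context"  # hypothetical name, not a standard header


def encode_context(context: dict[str, str]) -> str:
    """Serialize the shared context as 'key=value;key=value'."""
    return ";".join(f"{key}={value}" for key, value in context.items())


def decode_context(header_value: str) -> dict[str, str]:
    """Parse the header value back into a dict, skipping malformed items."""
    pairs = (item.split("=", 1) for item in header_value.split(";") if "=" in item)
    return {key.strip(): value.strip() for key, value in pairs}


def outgoing_headers(incoming_headers: dict[str, str]) -> dict[str, str]:
    """Copy the context header from the incoming request onto a downstream call."""
    headers = {"Content-Type": "application/json"}
    if CONTEXT_HEADER in incoming_headers:
        headers[CONTEXT_HEADER] = incoming_headers[CONTEXT_HEADER]
    return headers


incoming = {CONTEXT_HEADER: encode_context({"priority": "3", "lob": "business1"})}
downstream = outgoing_headers(incoming)
```

Adding a new property is now a one-line change at the origin, but every participant still needs the equivalent of `outgoing_headers`, and it has to know this particular header name.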

Well, a few months later, you start integrating with an external service from a different vendor as well as with a few open-source frameworks. You soon realize that a system external to you doesn’t understand this header. Worse, it is dropping this header and not propagating it further!

An external system or OSS component doesn’t understand your custom header and it doesn’t propagate it either.

What’s happening here? So far, we tried two approaches to propagate user-defined properties of a distributed request. But without interoperability, even the header-based approach stopped working as soon as other participants didn’t understand our header.

What if we had a standard and interoperable way to represent and propagate this distributed request context?

Take 3: Using W3C Baggage header

Why did the distributed context propagation stop? Our operation was crossing boundaries to services provided by other vendors. But we hadn’t agreed upon a common mechanism for the above header. Hence, even if they had support for such distributed request propagation using a different format, we still wouldn’t have been able to speak a common language.

The one additional thing we need is interoperability. We need a mechanism that everybody can agree to use. In other words, we need a standard. Thanks to the efforts of many contributors across the industry, we are moving towards just that goal: W3C Baggage. As of Oct 2022, it is in a state of First Public Working Draft, and it is expected to get to Recommendation stage in a few months.

What does the W3C Baggage define?

So, what is Baggage? The spec defines a standard HTTP header to propagate the application-defined properties associated with a distributed request. For example:

baggage: lob=business1, priority=3, isProduction=false

A framework can automatically propagate this data to downstream services. The cross-cutting nature of this context propagation means that it doesn’t require any changes in the participating services — no need to change any API signatures. Each participant can retrieve the properties as well as add new properties (key-value pairs) to the baggage to share context with downstream consumers.

Using the Baggage header, which frameworks such as OpenTelemetry can propagate for you.
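To get a feel for the header format, here is a simplified sketch that reads and writes a baggage header value using only the Python standard library. It is not a complete implementation of the spec: it ignores the optional properties that may follow a `;` in a list member and does no validation of keys or size limits. The `region` entry is a hypothetical extra property.

```python
from urllib.parse import quote, unquote


def parse_baggage(header_value: str) -> dict[str, str]:
    """Parse a baggage header value into a dict, ignoring optional ';' properties."""
    entries = {}
    for member in header_value.split(","):
        key_value = member.split(";", 1)[0]  # drop any properties after ';'
        if "=" not in key_value:
            continue  # skip malformed or empty members
        key, value = key_value.split("=", 1)
        entries[key.strip()] = unquote(value.strip())
    return entries


def format_baggage(entries: dict[str, str]) -> str:
    """Serialize a dict back into a baggage header value with percent-encoded values."""
    return ",".join(f"{key}={quote(value, safe='')}" for key, value in entries.items())


baggage = parse_baggage("lob=business1, priority=3, isProduction=false")
baggage["region"] = "eu west"  # hypothetical property added by this service
```

In practice you would let a framework do this parsing and propagation for you; the sketch only shows that the wire format is a plain list of key-value pairs.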

A few more use cases of Baggage

We saw two use cases above. Generalizing it a bit more, you can think of two main categories of use cases of baggage:

  1. Use cases that enable better observability of the system.
  2. Use cases that enable better control of the system.

Below are a few more use cases in these categories. Credit for all of them goes to Yuri Shkuro, who wrote a great article about this: Embracing context propagation.

Labeling synthetic traffic
Let’s say you want to configure different alert thresholds for production vs. synthetics data. To do this, you want to partition your system’s error rate metrics to have separate time series for production and synthetics. You can use Baggage to tag synthetic requests: e.g., isProduction=false so that participants can distinguish production vs. synthetics.

Using Baggage to build different alert thresholds for synthetic vs production traffic.
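A rough sketch of that partitioning, assuming the baggage has already been parsed into a dict and that requests without the tag count as production traffic (that default is my assumption, not something the spec defines):

```python
from collections import Counter

error_counts = Counter()  # one count per traffic type, i.e. separate time series


def traffic_type(baggage: dict[str, str]) -> str:
    # Assumption: requests without the tag are treated as production traffic.
    return "synthetic" if baggage.get("isProduction", "true") == "false" else "production"


def record_error(baggage: dict[str, str]) -> None:
    """Increment the error count for the traffic type this request belongs to."""
    error_counts[traffic_type(baggage)] += 1


record_error({"isProduction": "false"})
record_error({"lob": "business1"})
record_error({"isProduction": "false"})
```

With the counts kept apart like this, each series can get its own alert threshold.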

Carrying fault injection instructions for chaos engineering
Baggage can be used to deliver fault injection instructions for chaos engineering. For example, an instruction could be to increase the latency of a specific service call or to fail a call.
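As an illustration only, suppose the instruction travels in a baggage entry named `fault` with the form `action:service[:millis]`. That key name and format are hypothetical; real chaos engineering tools define their own conventions. A participant could then act on it like this:

```python
import time


class InjectedFault(Exception):
    """Raised when the baggage instructs this service to fail the call."""


def apply_fault(baggage: dict[str, str], service: str, sleep=time.sleep) -> None:
    """Apply a fault instruction of the hypothetical form 'action:service[:millis]'."""
    parts = baggage.get("fault", "").split(":")
    if len(parts) < 2:
        return  # no instruction, or not in the expected form
    action, target, *rest = parts
    if target != service:
        return  # the instruction is aimed at a different service
    if action == "delay" and rest:
        sleep(int(rest[0]) / 1000)  # add latency before handling the call
    elif action == "fail":
        raise InjectedFault(f"injected failure in {service}")
```

For example, `apply_fault({"fault": "delay:inventory:200"}, "inventory")` would pause for 200 ms inside the inventory service, while every other service ignores the instruction and simply passes the baggage along.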

OK, how do I try this out?

OpenTelemetry includes Baggage as a signal. It provides an implementation of the W3C Baggage draft specification. You can use it to add properties to baggage, and OpenTelemetry takes care of propagating this distributed context. For example, here is OpenTelemetry .NET’s API for Baggage.

How is Baggage different from Trace Context?

Baggage is independent of Trace Context. Here’s why:

Baggage attempts to standardize representing and propagating application-defined properties. Trace Context, on the other hand, standardizes representing and propagating the metadata needed to enable Distributed Tracing. If you want to learn more about Distributed Tracing, you can check out my earlier post here.
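One way to see the difference is to put the two headers next to each other on the same request. In this sketch the `traceparent` value is the example from the W3C Trace Context specification, and the helper name is my own:

```python
headers = {
    # Trace Context: fixed fields that identify the trace and the parent span.
    "traceparent": "00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01",
    # Baggage: free-form, application-defined key-value pairs.
    "baggage": "lob=business1,priority=3,isProduction=false",
}


def parse_traceparent(value: str) -> dict[str, str]:
    """Split a traceparent value into its four dash-separated fields."""
    version, trace_id, parent_id, flags = value.split("-")
    return {"version": version, "trace_id": trace_id, "parent_id": parent_id, "flags": flags}


trace = parse_traceparent(headers["traceparent"])
app_context = dict(item.split("=", 1) for item in headers["baggage"].split(","))
```

The shape of `traceparent` is fixed by the tracing standard, while the keys in `baggage` are entirely up to your application. Either header can be present without the other.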

Wrapping Up

Baggage gives you an open and interoperable way to do distributed context propagation without having to make changes in each participating service. It can be used for a variety of use cases that include improving Observability or enabling better control.

Thanks for reading! Hope you found this useful. Thoughts, questions, or feedback? Please let me know in the comments.

To receive future such posts from me, please follow me here on Medium and/or on Twitter. To check out some of my other posts, check out this page.

References

  1. Propagation format for distributed context: Baggage (w3.org)
  2. Baggage | OpenTelemetry
  3. Embracing context propagation, by Yuri Shkuro (JaegerTracing, Medium)
  4. Distributed Tracing Working Group (w3.org): This group works on the W3C Baggage and W3C TraceContext specifications.

Written by J. Kalyana Sundaram

Software Architect in Azure @ Microsoft.
