3 years ago

Automatically filtering out healthchecks on ECS and Kubernetes

Health-checks are peculiar things

Healthchecks are a monitoring technique with a special flavor: they are fired off at regular, frequent intervals (sometimes every 10 seconds, sometimes every minute) by orchestration platforms and monitoring tools. Most healthchecks are HTTP-based, and the returned HTTP response is checked based on the status code and (sometimes) the content. But really, the only healthchecks a person needs to know about are those that fail, which usually lead to containers being torn down and other disruptive infrastructure changes.

Issues with health-checks in Lumigo

Given that Lumigo's pricing model is based on the number of requests we process, the large number of successful healthchecks that every container workload undergoes leads to undesirable consumption of quota, for data that is effectively not useful. Moreover, successful healthchecks add noise to the Explore and Transactions views, degrading the overall experience.

Cutting the Gordian knot

Luckily, one can often recognize HTTP requests that are healthchecks pretty easily! Both AWS ELB health-checks and Kubernetes ones (including EKS) come in with specific User-Agent headers. The data processing pipeline of the Lumigo platform now automatically drops all the spans that:

Carry the User-Agent HTTP header with values that are known to be health-checks, specifically ELB-HealthChecker/* (AWS ELB, often used with Amazon ECS) and kube-probe/* (Kubernetes, including Amazon EKS)
Return an HTTP status code that denotes a successful response (i.e., `2xx`, like `200 OK`, `201 Created`, etc.). This is because if a health-check fails (e.g. returning HTTP status code 500), usually something bad is about to happen to your containers :-)
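
The two criteria above can be sketched as a simple predicate. This is an illustration only, not Lumigo's actual pipeline code: the function name `is_droppable_healthcheck` and the prefix tuple are assumptions made for this example.

```python
from typing import Optional

# Hypothetical sketch of the rule described above; not Lumigo's implementation.
# User-Agent prefixes named in this post (ELB-HealthChecker/* and kube-probe/*).
HEALTHCHECK_USER_AGENT_PREFIXES = ("ELB-HealthChecker/", "kube-probe/")


def is_droppable_healthcheck(user_agent: Optional[str], status_code: int) -> bool:
    """Return True for a successful request issued by a known health-checker."""
    if not user_agent:
        return False
    from_health_checker = user_agent.startswith(HEALTHCHECK_USER_AGENT_PREFIXES)
    succeeded = 200 <= status_code <= 299
    # Failed health-checks are kept: they usually precede disruptive changes.
    return from_health_checker and succeeded
```

Note that the path of the request plays no role in the decision; only the User-Agent value and the status code are inspected.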

What do you need to do on your end?

Nothing. It just works with every version of the tracers we have released so far for containers, and with all the HTTP OpenTelemetry instrumentations we have ever seen. Enjoy :-)

P.S.: Matching health-checks by path (e.g., /health) sounds like a good solution on paper, but in practice it leads to very annoying false positives (i.e., dropping HTTP calls that are NOT related to health-checks). Moreover, healthcheck paths are configurable, and practitioners do make use of that configurability, which would lead to false negatives (health checks we let through). User-Agent headers, on the other hand, are far less often changed in healthcheck systems, so matching on them is usually rather reliable for this use-case.

3 years ago

Execution tags are now supported also in Containers!

We hear a lot from users that execution tags are one of Lumigo's most powerful features. Through execution tags, you can mark, filter and alert on invocations based on data such as which of your customers an invocation ran on behalf of. Developer Steve gave an excellent overview of execution tags in this Quick Bytes video.

Execution tags and OpenTelemetry

Until now, execution tags were limited to the Node.js and Python Lumigo Lambda tracers. Today, any OpenTelemetry tracer, including but not limited to the Lumigo OpenTelemetry distributions, can send execution tags to Lumigo using OpenTelemetry's Span.setAttribute(key, value) API.

This is how you can set the execution tag foo to the value bar via the Lumigo OpenTelemetry Distributions for JS and Python:

// Typescript
import { trace } from '@opentelemetry/api';

// Note: 'trace.getActiveSpan()' is available from version 1.8.0 of the Lumigo OpenTelemetry Distro for JS

trace.getActiveSpan()?.setAttribute('lumigo.execution_tags.foo','bar');

// Javascript
const { trace } = require('@opentelemetry/api');

// Note: 'trace.getActiveSpan()' is available from version 1.8.0 of the Lumigo OpenTelemetry Distro for JS

// 'trace.getActiveSpan()' returns undefined when no span is active, hence the optional chaining
trace.getActiveSpan()?.setAttribute('lumigo.execution_tags.foo','bar');

# Python
from opentelemetry.trace import get_current_span

get_current_span().set_attribute('lumigo.execution_tags.foo', 'bar')

Setting multiple values to the execution tag is also supported (this time, both bar and baz):

// Typescript
import { trace } from '@opentelemetry/api';

trace.getActiveSpan()?.setAttribute('lumigo.execution_tags.foo',['bar','baz']);

// Javascript
const { trace } = require('@opentelemetry/api');

trace.getActiveSpan()?.setAttribute('lumigo.execution_tags.foo',['bar','baz']);

# Python
from opentelemetry.trace import get_current_span

get_current_span().set_attribute('lumigo.execution_tags.foo', ('bar', 'baz'))  # Both lists and tuples work

The APIs of OpenTelemetry SDKs for other programming languages differ slightly, but the gist of the matter is: if an OpenTelemetry tracer can send span attributes (and they all can :D), they can send execution tags to Lumigo!

End-to-end filtering

Lambda and container tracers use different APIs to set execution tags, but you can filter by execution tags in Explore irrespective of their source:

Lumigo's Explore view, showing Lambda and ECS invocations filtered by the same 'tenant' tag.

Requirements

Execution tags rely on the Span.setAttribute OpenTelemetry API, which is supported by virtually all OpenTelemetry SDKs. As far as looking up the current span is concerned, trace.getActiveSpan() is available with the Lumigo OpenTelemetry Distro for JS version 1.8.0 and above. All versions of the Lumigo OpenTelemetry Distro for Python offer the opentelemetry.trace.get_current_span API.

Lambda Node tracer: Support for tracing MongoDB 4.x

When we announced the support for the mongodb package in our OpenTelemetry Distro for JS, we heard a lot of feedback that can perhaps be best described as "Very cool, but what about Lambda Node.js?!?"

Fast forward to today: we have updated our Node.js Lambda tracer to support version 4.x of the mongodb package, which, among others, is compatible with MongoDB Atlas and AWS DocumentDB.

To trace mongodb v4.x, you need to:

Use the Lumigo tracer @lumigo/tracer version 1.76.0 or above, or the Lumigo layer for Node.js in these versions or above (the actual version is dependent on the region).
If you use a bundler like WebPack or ESBuild, you must keep mongodb as an external package; refer to the MongoDB and Bundlers section of the Lumigo AWS Lambda Node.js documentation.

Happy tracing :-)

Update: In the original post we pointed to the wrong version (1.75.1 instead of 1.76.0) of the @lumigo/tracer package.

3 years ago

Create AWS CloudWatch metric alerts with Lumigo

You can now use Lumigo to easily create AWS CloudWatch alarms and get notified in real time in Slack, PagerDuty, and other tools when services like DynamoDB and SQS experience issues.

With Lumigo's new CloudWatch metric alerts you no longer have to create alerts one by one in CloudWatch; instead, you can configure multiple alerts at the same time, in just a few clicks. You can define dimensions and thresholds directly from the alert page, and set up notifications to your workflow tools without any need to integrate CloudWatch manually.

To create a CloudWatch metric alert, go to the Alerts page and:

Select CloudWatch Metric as the Alert Type
Select the AWS Region and namespace
Select the AWS metric and dimensions
Define the alert criteria: stat, threshold, and evaluation period
Define the notification preferences: channels and frequency

Once you submit the alert, Lumigo creates the corresponding alarms in CloudWatch.
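
For orientation, the alert criteria above correspond roughly to the parameters of the CloudWatch `PutMetricAlarm` API. The sketch below only builds such a parameter set. The helper name `build_alarm_params`, the alarm name, the example queue and the choice of comparison operator are assumptions for illustration; how Lumigo actually names and configures the alarms it creates is not described here.

```python
from typing import Any, Dict


def build_alarm_params(
    alarm_name: str,
    namespace: str,
    metric_name: str,
    dimensions: Dict[str, str],
    statistic: str,
    threshold: float,
    period_seconds: int,
    evaluation_periods: int,
) -> Dict[str, Any]:
    """Map alert criteria onto PutMetricAlarm request parameters."""
    return {
        "AlarmName": alarm_name,
        "Namespace": namespace,
        "MetricName": metric_name,
        # CloudWatch expects dimensions as a list of Name/Value pairs.
        "Dimensions": [
            {"Name": name, "Value": value}
            for name, value in sorted(dimensions.items())
        ],
        "Statistic": statistic,
        "Threshold": threshold,
        "Period": period_seconds,
        "EvaluationPeriods": evaluation_periods,
        # Assumption: the alarm fires when the metric rises above the threshold.
        "ComparisonOperator": "GreaterThanThreshold",
    }


params = build_alarm_params(
    alarm_name="example-sqs-queue-depth",  # hypothetical alarm name
    namespace="AWS/SQS",
    metric_name="ApproximateNumberOfMessagesVisible",
    dimensions={"QueueName": "example-queue"},  # hypothetical queue
    statistic="Average",
    threshold=100.0,
    period_seconds=300,
    evaluation_periods=1,
)
```

With boto3, a dictionary like this could be passed as `boto3.client("cloudwatch").put_metric_alarm(**params)`, a call that requires the `cloudwatch:PutMetricAlarm` permission.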

The documentation about creating CloudWatch metric alerts with Lumigo is available here.

P.S. Ensure your IAM role includes the following permissions, or go to Settings > AWS and update to the latest IAM role.

{
   "Effect": "Allow",
   "Action": [
      "cloudwatch:PutMetricAlarm",
      "cloudwatch:DeleteAlarms"
   ],
   "Resource": "*"
}

3 years ago

Tracing HTTP from Lambda to containers

Lumigo just got better at tracing your applications end-to-end! Since the launch of the support for Amazon Elastic Container Service, Lumigo has known how to trace HTTP requests issued by containers instrumented with OpenTelemetry and served by Lambda functions with Lumigo tracers (we call it the "Container -- HTTP -> Lambda" flow). Today, we launch support for "the other direction": tracing HTTP requests issued by Lambda functions and served by containers using OpenTelemetry ("Lambda -- HTTP -> Container" flow).

Lambda functions interacting over HTTP with containers is a pattern we see, for example, in projects that started out as entirely serverless, and then introduced containers for specialized workloads that require specific hardware capabilities like GPUs for computation-intensive tasks. It is also rather common in lift-and-shift scenarios, where existing, on-premises workloads get containerized, and new capabilities surrounding them are developed serverless.

How does it work?

This new capability is based on the [W3C TraceContext standard](https://www.w3.org/TR/trace-context/), which is implemented by all OpenTelemetry SDKs and the [Lumigo OpenTelemetry distributions](https://docs.lumigo.io/docs/containerized-applications#lumigo-opentelemetry-distributions), and now is also implemented for outgoing HTTP requests in the Lumigo Lambda tracers for Python and Node.js. (Details on the precise versions of the tracers and Lambda layers are found below.)
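
As a rough illustration of what gets propagated: W3C TraceContext carries the trace identity in a `traceparent` HTTP header of the form `version-traceid-parentid-flags`. The sketch below builds and parses such a header using only the Python standard library. The helper names are assumptions for this example, it handles only version `00`, and in practice the tracers do all of this for you.

```python
import re
import secrets
from typing import NamedTuple, Optional

# version "00", 32 hex chars of trace-id, 16 hex chars of parent-id, 2 hex chars of flags
_TRACEPARENT_RE = re.compile(r"00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


class TraceParent(NamedTuple):
    trace_id: str
    parent_id: str
    sampled: bool


def build_traceparent(trace_id: str, parent_id: str, sampled: bool = True) -> str:
    """Serialize the ids of the calling span into a traceparent header value."""
    flags = "01" if sampled else "00"
    return f"00-{trace_id}-{parent_id}-{flags}"


def parse_traceparent(header: str) -> Optional[TraceParent]:
    """Parse a version-00 traceparent value; return None if it is malformed."""
    match = _TRACEPARENT_RE.fullmatch(header.strip())
    if match is None:
        return None
    trace_id, parent_id, flags = match.groups()
    # All-zero trace or parent ids are invalid according to the specification.
    if set(trace_id) == {"0"} or set(parent_id) == {"0"}:
        return None
    return TraceParent(trace_id, parent_id, bool(int(flags, 16) & 0x01))


# Caller side (for example, the Lambda function issuing the HTTP request).
header = build_traceparent(secrets.token_hex(16), secrets.token_hex(8))

# Callee side (for example, the container serving the request).
incoming = parse_traceparent(header)
```

The callee continues the trace by reusing `trace_id` and treating `parent_id` as the parent of its own server span, which is what links the two sides into a single transaction.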

Opt-in support for W3C TraceContext

Note that the support for W3C TraceContext in the Lumigo Lambda tracers is opt-in, activated via the `LUMIGO_PROPAGATE_W3C=true` environment variable to be set on the Lambda function. No additional work is needed on the container / OpenTelemetry side. There, W3C TraceContext support is built-in and enabled by default by the Lumigo OpenTelemetry distributions and virtually all upstream OpenTelemetry SDKs.

Supported Lambda tracer and layer versions

Node.js:

@lumigo/tracer v1.75.0 and above
Minimum layer versions (applicable to all supported Node.js runtimes)

Python:

lumigo-tracer v1.1.206 and above
Minimum layer versions (applicable to all supported Python runtimes)

How to activate W3C TraceContext support for your Lambda functions

Ensure you are using a supported version of the tracer or the layer providing the tracer to your application (see previous section)
Set on your Lambda function the `LUMIGO_PROPAGATE_W3C=true` environment variable

Lambda .NET 6 Runtime Is Supported

We launched support for the `dotnet6` runtime of AWS Lambda. Similarly to Lumigo's support for previous versions of .NET in Lambda, the support is provided via the Lumigo.DotNET NuGet package. Your Lambda functions running on previous versions of .NET on Lambda should require no code changes to upgrade: just update the version of the Lumigo.DotNET package, and enjoy the latest and greatest (and, frankly, pretty awesome) .NET version on Lambda.

The documentation about instrumenting .NET Lambda functions with Lumigo is available in the AWS Lambda .NET tracing documentation.

3 years ago

Complete queries faster with Explore autocomplete

Explore autocomplete suggests completions for the query you are typing, so that you can query events across your application more quickly and uncover deeper insights.

For more information read the documentation here.

3 years ago

Support for MongoDB package added to Lumigo OpenTelemetry Distro for JS

We introduced support for the mongodb package in the Lumigo OpenTelemetry Distro for JS, with support for versions 3.6.6 to 3.7.3 and 4.0.0 to 4.9.1 (the current latest & greatest).

The only step necessary to add MongoDB tracing to containers that are already traced is to update the @lumigo/opentelemetry dependency to 1.3.0.

P.S. This update does not apply to the Lambda Node.js tracer, but stay tuned :-)

3 years ago

Improved batch containerized workload support

Since we launched Amazon ECS support earlier this summer, we have come across many user workloads that behave like batch jobs (which, incidentally, we often see scheduled via AWS Batch and, occasionally, via AWS Step Functions). Rather than relying on long-running processes that receive requests over HTTP, these workloads execute jobs pulled from Amazon SQS or sometimes from the process environment, perform computation involving databases, other services and messaging queues, and then terminate.

The most intuitive representation for such transactions consists of a "root" span representing the "main" method, with the outgoing requests to databases, messaging queues and other services nested directly under the "main" span. And this is how Lumigo will now represent these workloads, provided that you use the OpenTelemetry API to create the "root" span.
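
A minimal sketch of such a "root" span, using the OpenTelemetry API for Python, might look like the following. The function, span and attribute names (`run_batch_job`, `main`, `batch.jobs_processed`) are made up for this example, and it assumes that the `opentelemetry-api` package is installed and that a tracer, such as the Lumigo OpenTelemetry Distro for Python, is already configured.

```python
from typing import Any, Callable, Iterable, Optional


def run_batch_job(
    jobs: Iterable[Any],
    handle: Callable[[Any], None],
    tracer: Optional[Any] = None,
) -> int:
    """Process all jobs under one "root" span and return how many were handled."""
    if tracer is None:
        # Imported lazily so the function can also be exercised with a stub tracer.
        from opentelemetry import trace

        tracer = trace.get_tracer(__name__)

    processed = 0
    # Spans created by instrumented calls inside this block (databases, queues,
    # HTTP clients) become children of the "main" span.
    with tracer.start_as_current_span("main") as root_span:
        for job in jobs:
            handle(job)
            processed += 1
        root_span.set_attribute("batch.jobs_processed", processed)
    return processed
```

In a real batch process, `run_batch_job` would be called once from the entry point, with `handle` performing the actual work for each job pulled from the queue or the environment.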

Lumigo now also supports the case where the distributed trace starts with an outgoing request. However, given that there is no common parent span, multiple such outgoing requests will each result in a separate transaction.

Enjoy this improved support for your containerized workloads and let us know what you think about it!

P.S. If you want a hand using the OpenTelemetry API to create root spans, we are happy to help! Let us know through the support channel, and we'll gladly arrange a call to help you out. It's usually just 5 minutes coding, and then pushing it to your environment to validate :-)

P.P.S. Lumigo now also shows Elastic Load Balancers that serve HTTP requests issued by containerized workloads.

3 years ago

In-Platform Demo Experience

Get the full Lumigo experience, even if you aren’t using every feature, with a new in-platform demo. In one click, fill your project with demo data and see how Lumigo can debug even the most complex of environments.
