Since launching in 2014, AWS Lambda has dramatically grown in popularity. Lambda allows you to run code without provisioning or managing servers, so there's far less infrastructure for you to worry about. That doesn't mean you don't need to keep an eye on anything, though. So what metrics should you monitor for your Lambda functions? AWS CloudWatch exposes several metrics related to Lambda performance, but let's narrow it down to what's most important. In this post, we'll cover the seven most helpful metrics to monitor to make sure your Lambda functions are running smoothly.
Blue Matador monitors all seven of these Lambda metrics out of the box, with no manual setup required. Learn more about how we monitor serverless architectures >
Lambda errors are caused by errors in your code. This metric only counts the uncaught exceptions and runtime errors your code didn't anticipate: an API timeout, a bad type cast, a divide by zero, and so on. It's essentially a catchall for any reason your Lambda invocation didn't successfully execute.
CloudWatch exposes the number of invocations that result in a function error. To address these errors, you’ll need to look at the Lambda logs to diagnose the issue.
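If you want something in place while you dig into the logs, a simple CloudWatch alarm on the Errors metric works as a first pass. Here's a minimal boto3 sketch; the function name, period, and threshold are placeholders, not recommendations:

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

# Alert whenever any invocation of the function results in an error
# within a 5-minute window.
cloudwatch.put_metric_alarm(
    AlarmName="my-function-errors",
    Namespace="AWS/Lambda",
    MetricName="Errors",
    Dimensions=[{"Name": "FunctionName", "Value": "my-function"}],
    Statistic="Sum",
    Period=300,
    EvaluationPeriods=1,
    Threshold=1,
    ComparisonOperator="GreaterThanOrEqualToThreshold",
    TreatMissingData="notBreaching",  # no data just means no invocations
)
```

Using the Sum statistic means a single failed invocation in the window trips the alarm; raise the threshold if your function tolerates occasional errors.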
When a service like SNS or S3 invokes your function asynchronously, any event that still fails after Lambda's automatic retries is sent to a configured dead-letter queue. However, if there are errors sending the event payload to the dead-letter queue, CloudWatch registers a dead-letter error.
If you fail to monitor these dead-letter errors, you won't be aware of the associated data loss. Dead-letter errors can occur due to permissions errors, misconfigured resources, or size limits.
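As a rough illustration of both halves of this, the boto3 sketch below points a function's asynchronous failures at an SQS dead-letter queue and then alarms on the DeadLetterErrors metric. The function name and queue ARN are made up, and the function's execution role would also need permission to send messages to the queue:

```python
import boto3

lambda_client = boto3.client("lambda")
cloudwatch = boto3.client("cloudwatch")

# Route failed asynchronous invocations to an SQS dead-letter queue.
lambda_client.update_function_configuration(
    FunctionName="my-function",
    DeadLetterConfig={
        "TargetArn": "arn:aws:sqs:us-east-1:123456789012:my-function-dlq"  # placeholder ARN
    },
)

# Alert when Lambda can't deliver a failed event to the dead-letter queue,
# which is exactly the silent data loss described above.
cloudwatch.put_metric_alarm(
    AlarmName="my-function-dead-letter-errors",
    Namespace="AWS/Lambda",
    MetricName="DeadLetterErrors",
    Dimensions=[{"Name": "FunctionName", "Value": "my-function"}],
    Statistic="Sum",
    Period=300,
    EvaluationPeriods=1,
    Threshold=1,
    ComparisonOperator="GreaterThanOrEqualToThreshold",
    TreatMissingData="notBreaching",
)
```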
CloudWatch measures the time it takes to run each Lambda function. This is an important metric to monitor because duration factors directly into your bill, and a slow upward creep can push your function toward its configured timeout.
An increase in function duration could be caused by either a change to your function's code or slower responses from a downstream dependency, such as an external API or database. Again, look through the Lambda logs to diagnose the issue.
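A quick way to see whether duration is creeping up is to pull the metric yourself. Here's a small boto3 sketch that fetches hourly average and worst-case Duration for a hypothetical function over the past day:

```python
from datetime import datetime, timedelta, timezone

import boto3

cloudwatch = boto3.client("cloudwatch")
now = datetime.now(timezone.utc)

# Pull hourly average and maximum duration for the last 24 hours.
response = cloudwatch.get_metric_statistics(
    Namespace="AWS/Lambda",
    MetricName="Duration",
    Dimensions=[{"Name": "FunctionName", "Value": "my-function"}],
    StartTime=now - timedelta(days=1),
    EndTime=now,
    Period=3600,
    Statistics=["Average", "Maximum"],
    Unit="Milliseconds",
)

for point in sorted(response["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], f"avg={point['Average']:.0f}ms", f"max={point['Maximum']:.0f}ms")
```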
Timeouts can be very detrimental to your app's performance. If a function's duration hits its configured timeout, the invocation is terminated mid-execution.
Unfortunately, CloudWatch doesn’t report this metric, so you’ll need to manually set up an alert with the timeout hardcoded (or monitor this automatically with Blue Matador).
If your duration is approaching a timeout, you should either optimize the slow parts of your function (and the dependencies it calls) or raise the function's configured timeout.
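One way to avoid hardcoding the timeout by hand is to read each function's configured timeout and derive the alarm threshold from it. The boto3 sketch below assumes a hypothetical function name and an arbitrary 80% cutoff:

```python
import boto3

lambda_client = boto3.client("lambda")
cloudwatch = boto3.client("cloudwatch")

# The configured timeout is reported in seconds; Duration is in milliseconds.
timeout_ms = lambda_client.get_function_configuration(FunctionName="my-function")["Timeout"] * 1000

# Alarm when the slowest invocation in a 5-minute window exceeds 80% of the timeout.
cloudwatch.put_metric_alarm(
    AlarmName="my-function-near-timeout",
    Namespace="AWS/Lambda",
    MetricName="Duration",
    Dimensions=[{"Name": "FunctionName", "Value": "my-function"}],
    Statistic="Maximum",
    Period=300,
    EvaluationPeriods=1,
    Threshold=timeout_ms * 0.8,
    ComparisonOperator="GreaterThanThreshold",
    TreatMissingData="notBreaching",
)
```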
Any time your function code is invoked, CloudWatch records it in the Invocations metric. (This does not include throttled attempts to run the function.) Because invocations factor directly into billing, it's important to monitor this metric so you can identify any major changes that could lead to a jump in costs.
If you're seeing a higher-than-expected rate of invocations, there are two common culprits: the code that writes to your function's event source may be malfunctioning and writing more to DynamoDB, Kinesis, S3, SQS, etc., or your function may be failing with errors that trigger retries and inflate your invocation count.
Anomalous increases and decreases in invocations are good leading indicators and can help with root cause analysis.
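If a static threshold feels too blunt for invocation counts, CloudWatch anomaly detection alarms can learn a band from the metric's history instead. The boto3 sketch below is one way to set that up; the function name, period, and two-standard-deviation band are assumptions you'd tune for your own traffic:

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

# Alarm when invocation volume climbs above a band learned from history,
# instead of relying on a hand-picked static threshold.
cloudwatch.put_metric_alarm(
    AlarmName="my-function-invocation-anomaly",
    ComparisonOperator="GreaterThanUpperThreshold",
    EvaluationPeriods=2,
    ThresholdMetricId="band",
    Metrics=[
        {
            "Id": "invocations",
            "ReturnData": True,
            "MetricStat": {
                "Metric": {
                    "Namespace": "AWS/Lambda",
                    "MetricName": "Invocations",
                    "Dimensions": [{"Name": "FunctionName", "Value": "my-function"}],
                },
                "Period": 300,
                "Stat": "Sum",
            },
        },
        {
            "Id": "band",
            "Expression": "ANOMALY_DETECTION_BAND(invocations, 2)",  # band width of 2 std devs
        },
    ],
)
```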
AWS limits the number of invocations that can be running at any one time. Each function can reserve a chunk of the account's concurrency limit, thereby guaranteeing the function exactly that much concurrency. Otherwise, a function will run off the unreserved concurrency limit (if any is available).
When the concurrency limit is hit, Lambda will not invoke a function and will throttle it instead. You’ll want to monitor your functions for throttles so you know when you need to adjust your concurrency.
If you are running into Lambda throttling, we've dedicated an entire post to troubleshooting it and to strategies for avoiding it going forward.
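Both reserving concurrency and catching throttles can be scripted. Here's a minimal boto3 sketch; the function name and the 50 reserved executions are placeholder values:

```python
import boto3

lambda_client = boto3.client("lambda")
cloudwatch = boto3.client("cloudwatch")

# Carve out a guaranteed slice of the account's concurrency for this function.
lambda_client.put_function_concurrency(
    FunctionName="my-function",
    ReservedConcurrentExecutions=50,  # placeholder value
)

# Alert on any throttled invocation in a 5-minute window.
cloudwatch.put_metric_alarm(
    AlarmName="my-function-throttles",
    Namespace="AWS/Lambda",
    MetricName="Throttles",
    Dimensions=[{"Name": "FunctionName", "Value": "my-function"}],
    Statistic="Sum",
    Period=300,
    EvaluationPeriods=1,
    Threshold=1,
    ComparisonOperator="GreaterThanOrEqualToThreshold",
    TreatMissingData="notBreaching",
)
```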
Lambda functions that have DynamoDB or Kinesis Streams as their event source read from those streams with an iterator. If DynamoDB is changed or records are put to Kinesis too rapidly, your function may not be able to read from the iterator quickly enough. This is reflected in the iterator age, which is the amount of time between when a record is written to the stream and when Lambda reads it.
Your iterator age could get behind due to a high volume of data in your stream or poor performance of your Lambda function.
If your iterator age gets too far behind, you could lose data. When this occurs, check whether the code triggering changes in your stream is erroneously doing so at an increased rate. If so, fix the code that's causing the issues and wait for Lambda to catch up. Otherwise, you can increase the concurrency on your function so it's invoked more often and the iterator age comes back down.
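To catch a growing backlog before records expire, you can alarm on IteratorAge directly. A small boto3 sketch, assuming a hypothetical function and an arbitrary five-minute threshold:

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

# Alert when records wait more than 5 minutes in the stream before Lambda reads them.
# IteratorAge is reported in milliseconds.
cloudwatch.put_metric_alarm(
    AlarmName="my-function-iterator-age",
    Namespace="AWS/Lambda",
    MetricName="IteratorAge",
    Dimensions=[{"Name": "FunctionName", "Value": "my-function"}],
    Statistic="Maximum",
    Period=300,
    EvaluationPeriods=1,
    Threshold=300000,
    ComparisonOperator="GreaterThanThreshold",
    TreatMissingData="notBreaching",
)
```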
Monitoring these seven key metrics for Lambda will ensure that your Lambda functions perform well. While it is possible to set up alerts for these metrics with a traditional monitoring tool, the process will be laborious and the result could be suboptimal. For instance, manually setting up an alert on the duration metric with the timeout hardcoded for each of your Lambda functions would be tedious and error-prone, and setting up an alert with a static threshold for spikes in your invocations could produce noise and false positives.
That’s why we recommend giving Blue Matador a try. Out of the box, Blue Matador’s pre-configured alerts and statistical algorithms will automatically monitor these key metrics for your Lambda functions, in addition to hundreds of other events in your AWS environment. Start your 14-day trial and see how easy it is to monitor your Lambda functions with Blue Matador.