Last updated at Fri, 03 Nov 2017 13:51:05 GMT
“Silence is golden”
This is not always true, especially when something you were expecting to happen *doesn’t* happen.
It is even less true when you have a system or a service you are trying to maintain and things stop happening or go quiet.
We recently developed a new service, Inactivity Alerting, to help you with this common challenge. As you might expect, it fires alerts when there is inactivity around a specific log or event.
You can set up an inactivity alert to fire if a log stops sending events entirely, or if specific events are no longer being sent. You can also use comparisons on key-value pairs (KVPs), such as >, <, =, >= and !=, and regular expressions to detect when logs containing an expected pattern stop occurring.
I have included the top four must-have alerts below so you know when log events stop occurring or significant system behavior changes.
1. Alert when a log stops logging
You may want to be alerted if a system or application stops sending events altogether. A common example is a piece of hardware like a firewall, which is almost always sending events. Because it sends a broad mix of events, you would not be surprised to go a long period without seeing a specific one. But if it stops sending events altogether, this could mean the firewall is down.
To create this alert, simply set up an inactivity alert without specifying a pattern to match, and specify the allowed time of inactivity, which you can tweak later. The result is an alert that fires if no events are delivered from that log over the period of inactivity you specified.
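To make the idea concrete, here is a minimal Python sketch of the check such an alert performs. It is not how the Inactivity Alerting service is implemented; the `InactivityMonitor` class, its method names, and the 15-minute period are all assumptions chosen for illustration.

```python
from datetime import datetime, timedelta


class InactivityMonitor:
    """Tracks the most recent event on one log and reports prolonged silence."""

    def __init__(self, allowed_inactivity: timedelta, started_at: datetime):
        self.allowed_inactivity = allowed_inactivity
        # Until the first event arrives, measure silence from the start time.
        self.last_event_at = started_at

    def record_event(self, timestamp: datetime) -> None:
        # Any event counts, because no pattern is specified for this alert.
        if timestamp > self.last_event_at:
            self.last_event_at = timestamp

    def should_alert(self, now: datetime) -> bool:
        # Fire once the silence is longer than the allowed period.
        return now - self.last_event_at > self.allowed_inactivity


start = datetime(2017, 11, 3, 12, 0, 0)
firewall = InactivityMonitor(timedelta(minutes=15), started_at=start)
firewall.record_event(start + timedelta(minutes=5))

quiet_for_10 = firewall.should_alert(start + timedelta(minutes=15))  # 10 minutes of silence
quiet_for_20 = firewall.should_alert(start + timedelta(minutes=25))  # 20 minutes of silence
print(quiet_for_10, quiet_for_20)  # False True
```

Tweaking the allowed time of inactivity later corresponds to changing the single `allowed_inactivity` value here.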
2. Alert on a heartbeat
With systems and applications becoming more and more complex and having more dependencies, it is good to know that your application still has access to all the resources it needs.
An easy way to monitor this is to have your application log a heartbeat at regular intervals. Taking this a step further, you can have your application log a health check heartbeat for each specific resource it needs. An example might be a DB connection.
Checking that an application has the resources to interact with a DB (and logging this at regular intervals) makes it easy to ensure that your application responds as expected.
Example Log Heartbeat event:
12:14:26 - Server01 - Heartbeat_DB_connection=OK
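Then create an Inactivity Alert that matches the heartbeat pattern (for example, Heartbeat_DB_connection=OK) and fires when that pattern has not appeared within the allowed period of inactivity.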
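On the application side, emitting such a heartbeat could look roughly like the sketch below. The `heartbeat_line` and `db_is_reachable` names are hypothetical, the `FAIL` status is an assumption (the example event only shows `OK`), and the placeholder check should be replaced with a real, cheap test of the resource.

```python
from datetime import datetime
from typing import Callable


def heartbeat_line(server: str, resource: str, check: Callable[[], bool],
                   now: datetime) -> str:
    """Build one heartbeat event for a resource the application depends on."""
    try:
        status = "OK" if check() else "FAIL"
    except Exception:
        # A check that raises is treated as a failed health check.
        status = "FAIL"
    return f"{now:%H:%M:%S} - {server} - Heartbeat_{resource}={status}"


def db_is_reachable() -> bool:
    # Hypothetical placeholder: swap in a real check, such as a trivial query.
    return True


line = heartbeat_line("Server01", "DB_connection", db_is_reachable,
                      datetime(2017, 11, 3, 12, 14, 26))
print(line)  # 12:14:26 - Server01 - Heartbeat_DB_connection=OK
```

Calling this on a timer and writing the result to your log means an inactivity alert on the `=OK` pattern fires both when the application stops logging and when the check starts failing.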
3. Alert when specific events stop happening (Sales events)
Modern applications are usually quite complicated in their dependencies and intricacies. Unfortunately, this means an issue in one section could result in an issue in another area that is mission critical.
A simple example might be an issue with a configuration that results in stock levels of certain items in a catalog being marked as not available.
This might not be picked up, because the site or application might still function; users just won’t have any items that they can purchase.
To identify when critical events stop occurring, such as specific sales events, a business can use Inactivity Alerting to be notified when this happens.
For example, a company might expect to make an online sale every minute, with occasional 5-minute lapses. By setting up an inactivity alert that fires when there have been no sales for 10 minutes, the business can take its monitoring to the next level and alert based on expected user behavior. It can now pick up, within a short period, any issue that results in users not being able to purchase.
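The 10-minute rule can be sketched in Python as a scan for gaps between matching events. The log lines, the `event=sale_completed` pattern, and the `silent_periods` function are invented for illustration and are not a real log format or API.

```python
import re
from datetime import datetime, timedelta

# Hypothetical event format; only lines matching this pattern count as sales.
SALE_PATTERN = re.compile(r"event=sale_completed")


def silent_periods(lines, pattern, allowed, window_start, window_end):
    """Return (start, end) pairs in which no matching event occurred for longer than `allowed`."""
    times = sorted(
        datetime.strptime(line[:19], "%Y-%m-%d %H:%M:%S")
        for line in lines
        if pattern.search(line)
    )
    # Include the window edges so silence at the start or end is also caught.
    points = [window_start] + times + [window_end]
    return [(a, b) for a, b in zip(points, points[1:]) if b - a > allowed]


log = [
    "2017-11-03 12:00:10 event=sale_completed order=1001",
    "2017-11-03 12:01:05 event=page_view path=/catalog",
    "2017-11-03 12:05:40 event=sale_completed order=1002",
    "2017-11-03 12:09:00 event=page_view path=/catalog",
    "2017-11-03 12:17:30 event=sale_completed order=1003",
]

gaps = silent_periods(
    log,
    SALE_PATTERN,
    allowed=timedelta(minutes=10),
    window_start=datetime(2017, 11, 3, 12, 0, 0),
    window_end=datetime(2017, 11, 3, 12, 20, 0),
)
for start, end in gaps:
    print(f"No sales between {start:%H:%M:%S} and {end:%H:%M:%S}")
```

Note that the page views at 12:01 and 12:09 do not reset the clock: the site was still serving traffic, but only sales events count toward this alert.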
4. Alert when event values drop below a specific level
Using KVPs in a log gives users great precision in their inactivity alerting.
Maybe you need to know if your logs have a response time that rises above a certain value, or if response times stop being logged altogether.
Example Log:
2014-10-19 09:18:40 at=info method=GET path=/images/sample.gif host=aaa.herokuapp.com fwd="128.249.38.195/NX" dyno=web.1 connect=1ms service=125ms status=status bytes=6168 userID=user9144121
In the example log above there is a service time for the specific event. There may be a large percentage of events you would expect to respond quickly (under 500ms), but there may be a few where a slow response time is expected. This means that you cannot simply alert whenever the response time is greater than 500ms, since you expect slow response times occasionally.
By setting up an Inactivity Alert to monitor that faster events are still occurring, you can be notified if your application slows down, while the alert still allows for the slower events that you expect to happen.
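As a rough illustration of the comparison involved, the Python sketch below pulls the KVPs out of each line and checks whether any event in a window had a service time under 500ms. The regular expression, function names, and the shortened sample lines are assumptions made for this example; the real service is configured through alert settings rather than code.

```python
import re

# key=value pairs, where the value is either quoted or runs to the next space.
KVP = re.compile(r'(\w+)=("[^"]*"|\S+)')


def parse_kvps(line):
    """Extract key=value pairs, stripping quotes around quoted values."""
    return {key: value.strip('"') for key, value in KVP.findall(line)}


def is_fast(line, limit_ms=500):
    """True when the event carries a service time below `limit_ms`."""
    service = parse_kvps(line).get("service", "")
    match = re.fullmatch(r"(\d+)ms", service)
    return match is not None and int(match.group(1)) < limit_ms


def should_alert(window_lines, limit_ms=500):
    """Fire when no fast event occurred in the window (inactivity of fast events)."""
    return not any(is_fast(line, limit_ms) for line in window_lines)


fast = "2014-10-19 09:18:40 at=info method=GET path=/images/sample.gif connect=1ms service=125ms bytes=6168"
slow = "2014-10-19 09:18:41 at=info method=GET path=/reports/export connect=1ms service=2300ms bytes=90412"

healthy_window = [fast, slow, fast]   # slow events are tolerated
degraded_window = [slow, slow]        # fast events have stopped

print(should_alert(healthy_window), should_alert(degraded_window))  # False True
```

A window with no events at all also triggers the alert, which covers the case where response times stop being logged altogether.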
Inactivity Alerting on logs, strings, patterns, or comparisons is simple and powerful, and should be used anywhere you need to know if your application is still up and behaving as expected.