Creating custom app-level or pod-level Grafana alerts/notifications

Overview

You would like to create a custom Grafana alert or notification for a specific app, or for a specific app pod, to monitor memory usage. For example, you would like an alert when memory usage of your 'web' pod exceeds 80%.

Adding an alert from the edit page of any of the default (vanilla) dashboards is not possible. Instead, the following message is shown:

Template variables are not supported in alert queries

Solution

The "Template variables are not supported in alert queries" message appears because the dashboard queries use the template variable $hephy_app. This variable is required because Grafana is set up when the cluster is created, and at that point the application names are not yet known. The variable also lets the same graphs on the dashboard be reused whenever a different application is selected.

However, Grafana does not allow template variables in alert queries, so alerts cannot be created on graphs whose queries use them.

To work around this limitation, create a new dashboard (for example, Hephy / Custom Alerting), copy the required Memory and Unavailable Replicas graphs onto it, and then rework their queries to name the specific applications explicitly. Alerts can then be added to these custom graphs.

The detailed steps below show how to create one custom app-level alert that monitors memory utilization and another that monitors unavailable replicas.

Note: Refer to Monitoring Applications on Kontainers for instructions on how to launch Grafana from the EYK Web Console.

For an app-specific Memory graph:

  1. On the Hephy / Engine Yard Kontainers / Hephy Apps dashboard, hover over the Memory graph title, click the down arrow that appears, then select More -> Copy.
  2. Under Dashboards -> Manage, click New Dashboard.
  3. On the new dashboard, click Paste copied panel in the empty panel placeholder.
  4. Hover over the pasted Memory graph title, click the down arrow that appears, then select Edit.
  5. In the right-hand pane, change the Panel title.
  6. In the main pane, update all three Metrics queries to use the specific application name instead of $hephy_app.
  7. Click Apply in the top-right corner.
  8. If you need graphs for further apps, hover over the modified app-specific Memory graph title, click the down arrow, then select More -> Duplicate.
  9. Repeat steps 4 to 7 to update the duplicated graph for the next app.
  10. Once all graphs have been added, click the disk icon in the top bar to save the new dashboard, giving it a suitable name.
  11. Alerts can then be created in the usual way via the Alert tab when editing the graphs.
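
Steps 6 and 8 to 9 amount to a text substitution in each query, repeated once per app. As a minimal sketch, assuming the queries are plain strings containing the template variable as $hephy_app (or Grafana's braced form ${hephy_app}), the following Python expands one templated query into one hard-coded query per app. The query text and the app names are hypothetical placeholders, not the actual queries used by the Hephy Apps dashboard.

```python
# Hypothetical templated query; the real Memory panel queries will differ.
TEMPLATE_QUERY = 'sum(container_memory_usage_bytes{namespace="$hephy_app"})'


def hardcode_app(query: str, app: str, name: str = "hephy_app") -> str:
    """Replace the template variable with a concrete app name.

    Grafana accepts both $name and ${name}, so the braced form is
    replaced first to avoid leaving stray braces behind.
    """
    return query.replace("${" + name + "}", app).replace("$" + name, app)


def queries_per_app(query: str, apps: list[str]) -> dict[str, str]:
    """Build one hard-coded query per app, mirroring one duplicated graph per app."""
    return {app: hardcode_app(query, app) for app in apps}


if __name__ == "__main__":
    # Hypothetical app names
    for app, q in queries_per_app(TEMPLATE_QUERY, ["web", "api"]).items():
        print(f"{app}: {q}")
```

Each entry in the resulting mapping corresponds to one graph on the custom dashboard, which is also why every new app needs its own graph.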

For an app-specific Unavailable Replicas graph:

  1. Repeat the process above, skipping step 2 if the custom dashboard already exists and navigating to it instead.
  2. This time, copy the Unavailable Replicas graph.
  3. Again, replace the $hephy_app references with the application name. The -.+ part of the deployment matcher is a wildcard and can be left as-is so that all of the app's processes are matched.
  4. The $1 is part of the label_replace function and can also be left as-is.
  5. The remaining steps for creating custom alerts are the same as above.
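
To see why the -.+ wildcard and the $1 placeholder can stay unchanged, the sketch below mimics both in Python. It assumes, for illustration only, that deployments are named <app>-<process type> (for example web-cmd) and that the panel's label_replace uses a regex with one capture group; the regex "(.+)-.+" and the deployment names are hypothetical, not taken from the actual dashboard. Prometheus anchors both =~ label matchers and label_replace regexes at both ends, which is why re.fullmatch is used.

```python
import re


def matches_deployment(pattern: str, deployment: str) -> bool:
    """Emulate a Prometheus =~ matcher, which must match the whole label value."""
    return re.fullmatch(pattern, deployment) is not None


def label_replace_value(src_value: str, regex: str, replacement: str):
    """Emulate label_replace: expand $N references from the regex's capture groups.

    Returns None when the regex does not match; in that case Prometheus
    returns the series unchanged.
    """
    m = re.fullmatch(regex, src_value)
    if m is None:
        return None
    # Prometheus writes group references as $1; Python's expand() expects \1
    return m.expand(re.sub(r"\$(\d+)", r"\\\1", replacement))


if __name__ == "__main__":
    pattern = "web-.+"  # was "$hephy_app-.+" before hard-coding the app name
    for dep in ["web-cmd", "web-worker", "webshop-cmd", "api-web"]:
        print(dep, matches_deployment(pattern, dep))
    # Hypothetical label_replace regex: capture everything before the last "-"
    print(label_replace_value("web-worker", "(.+)-.+", "$1"))
```

With the app name hard-coded, web-cmd and web-worker still match while webshop-cmd and api-web do not, so the wildcard keeps picking up every process of that one app, and $1 still refers to whatever the capture group extracts.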

Caveats:

  1. Because the app names are now hard-coded, a separate graph must be created for each existing app and for every app deployed in the future.
  2. Because the dashboard is custom, it is not created as part of the cluster creation process and is therefore not updated by the cluster upgrade process. Keep an eye out for official communication from us about upgrades, and update your custom dashboards accordingly after an upgrade.

Testing

In place of the "Template variables are not supported in alert queries" message, the Create Alert button should now be active on the Alert tab when editing the graphs.

Alerts can now be successfully created by adding appropriate rules.
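
For the 80% memory example from the overview, the rule's threshold depends on what the graph plots. As a minimal sketch, assuming the panel plots memory usage in bytes and the pod has a hypothetical 512 MiB memory limit, the following Python derives the 80% threshold and shows how a condition of the form WHEN avg() OF query(A, 5m, now) IS ABOVE <threshold> would evaluate. This is a simplified model of Grafana's legacy graph-panel alert conditions, not Grafana's own code, and the sample values are made up.

```python
from statistics import mean

MIB = 1024 * 1024


def threshold_bytes(limit_bytes: int, percent: float) -> float:
    """Convert a percentage of the memory limit into an absolute byte threshold."""
    return limit_bytes * percent / 100


def is_above(samples: list[float], threshold: float) -> bool:
    """Simplified 'WHEN avg() OF query(...) IS ABOVE threshold' check over one window."""
    if not samples:
        # A legacy Grafana rule would apply its 'No Data' handling here instead
        return False
    return mean(samples) > threshold


if __name__ == "__main__":
    limit = 512 * MIB  # hypothetical pod memory limit
    threshold = threshold_bytes(limit, 80)
    print(f"80% of 512 MiB = {threshold:.0f} bytes")
    window = [400 * MIB, 420 * MIB, 450 * MIB]  # made-up samples from the last 5m
    print("alert" if is_above(window, threshold) else "ok")
```

If the graph already plots a percentage, the threshold is simply 80 and no conversion is needed.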
