
About PostgreSQL Alerts

Diana Lam
posted this on June 20, 2013 03:37 PM

Updated: June 20th, 2013

This document describes the PostgreSQL alerts.

About PostgreSQL Alerts

Engine Yard monitors the health of your PostgreSQL database using a combination of our own custom checks and Bucardo’s check_postgres scripts. Collectd or Nagios (depending on your stack and features) consumes the results of these checks and presents them on the Engine Yard dashboard as alerts. The alerts we show you follow this format:

[SEVERITY] [environment-name] [originating-process]: [check-name] [severity] [additional information]

  • severity -- The exit code of the check. Possible values include OK, WARNING, FAILURE, CRITICAL, and UNKNOWN.
  • environment-name -- The name of the environment that originated the alert.
  • originating-process -- The process that generated the alert.
  • check-name -- The name of the check that was run.
  • additional information -- Extra information reported by the failing check.

Sample alert:

Alert(CRITICAL) MyappProduction process-postgresql: POSTGRES_CHECKPOINT CRITICAL: Last checkpoint was 16204 seconds ago

This sample alert means that in the MyappProduction environment, the POSTGRES_CHECKPOINT check raised an alert on the PostgreSQL process. The check reported a severity of CRITICAL, and the associated message tells us that the database has not had a checkpoint for 16204 seconds (about 4.5 hours).
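
If you forward these alerts to your own monitoring or chat tooling, the fields can be pulled back out with a small parser. The sketch below only illustrates the format shown above; the regular expression and field names are our own, not part of any Engine Yard API.

    import re

    # Rough pattern for the alert format described above (a sketch, not an official spec).
    ALERT_RE = re.compile(
        r"Alert\((?P<severity>\w+)\)\s+"
        r"(?P<environment>\S+)\s+"
        r"(?P<process>\S+):\s+"
        r"(?P<check>\S+)\s+"
        r"(?P<check_severity>\w+):?\s*"
        r"(?P<details>.*)"
    )

    sample = ("Alert(CRITICAL) MyappProduction process-postgresql: "
              "POSTGRES_CHECKPOINT CRITICAL: Last checkpoint was 16204 seconds ago")

    match = ALERT_RE.match(sample)
    if match:
        print(match.group("environment"))   # MyappProduction
        print(match.group("check"))         # POSTGRES_CHECKPOINT
        print(match.group("details"))       # Last checkpoint was 16204 seconds ago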

We specify the severity of the monitoring checks based on the thresholds defined when your database was created. This section discusses the most important checks for PostgreSQL and their meanings.

The Connections check

The connections check verifies that the database process is functioning and connections can be established to it.

(Screenshot: a connections check alert)

When do we warn you? We test the connection to your database every 60 seconds and warn you when a connection to the database fails.

What to do if you see this check? Contact Engine Yard Support immediately because your site may be down.
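
If you want to reproduce a similar connectivity test yourself (for example, from an application instance), a minimal sketch using the psycopg2 driver might look like the following. The connection string is a placeholder, not the values Engine Yard uses internally.

    import psycopg2

    def database_is_reachable(dsn):
        """Return True if we can connect and run a trivial query."""
        try:
            # Keep the timeout short so a hung database is reported quickly.
            conn = psycopg2.connect(dsn, connect_timeout=5)
            try:
                cur = conn.cursor()
                cur.execute("SELECT 1")
                cur.fetchone()
                return True
            finally:
                conn.close()
        except psycopg2.Error:
            return False

    # Placeholder connection string -- substitute your own host, database, and credentials.
    if not database_is_reachable("host=localhost dbname=myapp_production user=deploy"):
        print("CRITICAL: could not connect to PostgreSQL")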

The Checkpoint check

This check determines how long it has been since the last checkpoint was run. A checkpoint is a point in the transaction log sequence at which all data files have been updated to reflect the information in the log and flushed to disk. If your system crashes, recovery starts from the last known checkpoint. The checkpoint check helps us confirm two things:

  1. Your database consistently moves forward the point from which recovery would start.
  2. In the case of replicas, your standby is keeping up with its master (because the activity the replica sees is what the master has sent it).

(Screenshot: a checkpoint check alert)

When do we warn you? We issue a WARNING severity when checkpoint delays range from 20 to 30 minutes. For delays that exceed 30 minutes, the severity of the alert is CRITICAL.

What to do if you see this check? Contact Engine Yard Support if you see a severity of CRITICAL or FAILURE.
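
If you would like to inspect the checkpoint age yourself before contacting support, newer PostgreSQL versions (9.6 and later) expose the time of the last checkpoint through the pg_control_checkpoint() function. The sketch below only illustrates that kind of query using the thresholds described above; it is not the exact check that Engine Yard's monitoring runs, and the connection string is a placeholder.

    import psycopg2

    # Placeholder connection string -- substitute your own host, database, and credentials.
    conn = psycopg2.connect("host=localhost dbname=myapp_production user=deploy")
    cur = conn.cursor()

    # Seconds since the last checkpoint (pg_control_checkpoint() requires PostgreSQL 9.6+).
    cur.execute("SELECT extract(epoch FROM now() - checkpoint_time) "
                "FROM pg_control_checkpoint()")
    checkpoint_age = float(cur.fetchone()[0])
    conn.close()

    # Thresholds mirroring the severities described above:
    # WARNING from 20 minutes, CRITICAL past 30 minutes.
    if checkpoint_age > 1800:
        print("CRITICAL: last checkpoint was %.0f seconds ago" % checkpoint_age)
    elif checkpoint_age > 1200:
        print("WARNING: last checkpoint was %.0f seconds ago" % checkpoint_age)
    else:
        print("OK: last checkpoint was %.0f seconds ago" % checkpoint_age)

On a replica, SELECT now() - pg_last_xact_replay_timestamp(); also gives a rough idea of how far replay lags behind the master (it returns NULL when run on a master).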

The Snaplock check

The snaplock check alerts us to potentially inconsistent snapshots. Before taking a database snapshot, we attempt to lock the database to prevent writes and ensure a consistent snapshot.

(Screenshot: a snaplock check alert)

When do we warn you? We will warn you when we have failed to obtain a lock before a database snapshot.

What to do if you see this check? Contact Engine Yard Support if the source of your snapshots shows this alert. For example, if you have moved your snapshots to the replica and we cannot lock it before a snapshot, it may mean you have no snapshots that are consistent and usable for recovery.

...

Tip: You can subscribe to this article to keep up to date on changes. You might also want to subscribe to the Release Notes forum.


If you have feedback or questions about this page, add a comment below. If you need help, submit a ticket with Engine Yard Support.

 

Comments

Tim Heighes
Skillable

Please add a section about disk space usage alerts. We recently got one on a Postgresql instance that cleared itself a few days later. Would this have been the autovacuum daemon kicking in? Is there any documentation on how EY configures Postgresql for different environments?

February 14, 2014 01:04 AM