Figuring out why a query that performs well in development and testing blows up in production can sometimes be a challenge. Read on to learn more about a few features that can provide insights into how your queries fare in production.
Currently Running Queries
When a client connects a PostgreSQL server, the main Postgres server process (historically called the postmaster) spawns a new process (called the backend) to service the client’s queries. Each backend, therefore, is either waiting for it’s client to send in a query, or trying to execute one.
The system view pg_stat_activity shows information about every backend that is currently running. In particular, it shows the query that the backend is currently executing if active, or the last query it executed if it is waiting for the client to send in another query.
Here are two backends serving clients connected to the database testdb
, with both of them actively executing their queries:
Sometimes the query might be waiting for a lock, and this too shows up in pg_stat_activity. You can see an INSERT waiting for a relation lock here:
For more information on pg_stat_activity, see the docs.
While this view is helpful to understand what Postgres is doing currently, it does not provide information about query execution statistics, or information about queries that have finished execution.
All Queries Run In The Past
For that, the extension pg_stat_statements is invaluable. This extension is included in the core PostgreSQL distribution and also available on managed services like AWS RDS and GCP SQL.
pg_stat_statements (PSS) is an “extension” in PostgreSQL terms, and needs to be installed first:
- Consult your Linux distro documentation to see if the extension is pre-installed or if it requires the installation of another package. For example, on Centos 7 you’ll need to
sudo yum install postgresql-contrib
. - Edit the main configuration file postgresql.conf (typically under
/etc
, like/etc/postgresql/10/main/postgresql.conf
on Debian) and change the value ofshared_preload_libraries
to “pg_stat_statements”. This is a comma-separated list of values, so if there is something there already, append a comma and then “pg_stat_statements”. - For AWS RDS, you’ll need to modify your active parameter group and set the value.
- After editing “shared_preload_libraries”, you’ll need to restart the PostgreSQL daemon. Unfortunately there is no way around this. On AWS RDS, you’ll need to restart the RDS instance.
- After a restart, the PostgreSQL server would have loaded the shared library, and we can install the extension by running
CREATE EXTENSION pg_stat_statements
. You’ll need to be a superuser to run this command. - You can actually install the extension in any database, and yet see the queries across all databases.
Once the extension is installed, you can query the view called pg_stat_statements
to get information about every single query executed since the extension was installed.
The numbers, like time taken to execute the query, are accumulated as a sum. For the query execution time alone, some statistics (average, min, max, standard deviation) are presented. These values can be cleared using the function pg_stat_statements_reset
.
Here is how a row from pg_stat_statements
looks like:
Apart from the identifying parameters (user, database, query), you can figure out many interesting things about your query:
- How long it takes to execute typically (
mean_time
) - How many rows it returns on the average (
rows
/calls
) - The amount of data read from the shared buffer cache, and the amount of data read from the disk (the
shared_blks_read
shows the total amount of data that the query read, of whichshared_blks_hit
came from the cache) - The amount of data that had to be written to the disk synchronously because of cache pressure (
shared_blks_written
) - The amount of data written, as the number of blocks touched (
shared_blks_dirtied
) - The amount of time spent in disk reads and writes (
blk_{read,write}_time
) - Temporary files written to and read from (
temp_blks_{read,written}
) - Temporary tables written to and read from (
local_*
)
The disk read and write times are available only if the configuration parameter track_io_timing
is turned on. By default, it is not. On most modern Linux systems it should be ok to turn this parameter on. Read more.
It is worthwhile to snapshot the pg_stat_statements
data continuously at regular intervals to see how these parameters are trending on a per-query basis. The open source tool pgmetrics can extract and expose the pg_stat_statements
data as JSON for easier automation.
Queries Run During a Time Range
Once you have such a system in place it becomes easy to track the queries executed in a given time frame. This makes it easy to debug issues like why a nightly batch job took longer than expected.
By subtracting the counters between two given timestamps, you can find out most of the numbers as before, except for min, max and standard deviation. This is sufficient to identify the queries that were executed within the time range and the resources they consumed.
Logging Slow Queries
Another way to quickly identify queries that take more time than expected is to turn on the logging of statements. You can specify a threshold duration, and if the query takes longer than this to finish, it is logged. (In the regular PostgreSQL log file, there is no separate one for slow queries.)
To turn on this feature, edit the configuration as below:
and reload Postgres. You can also use ALTER SYSTEM
:
With this, any statement (including non-DML ones) that takes more than a second to finish gets logged:
The actual time taken by the query, as well as the full SQL text, is logged.
If you have a log monitoring system and can track the number of slow queries per hour / per day, it can serve as a good indicator of application performance.
Query Execution Plans
Once you do have located a query which you feel should run faster, the next step is to have a look at it’s query plan. Typically, you need the actual query plan from production servers to work with. If you’re able to run EXPLAIN on production servers that great, else you need to rely on auto_explain
.
auto_explain
is another core PostgreSQL extension, either already installed or available as a “contrib” package for your distro. It is also available on AWS RDS. auto_explain
is a bit simpler to install than pg_stat_statements
:
- Edit the postgres configuration (or the RDS parameter group)
shared_preload_libraries
to includeauto_explain
. - You don’t have to restart Postgres though, you can instead just run:
LOAD 'auto_explain';
. - You’ll want to configure it’s settings, at least this one:
auto_explain.log_min_duration = 1000 # seconds
Essentially, whenever a query takes longer than auto_explain.log_min_duration
number of seconds to complete, auto_explain logs the query and it’s query execution plan in the log file, like this:
It can log the plan in JSON format too, if you have scripts that can process it:
In Postgres, there is no way other than auto_explain to look at the execution plan of a query that has already executed, which makes auto_explain an important tool in your toolbox.
About pgDash
pgDash is a modern, in-depth monitoring solution designed specifically for PostgreSQL deployments. pgDash shows you information and metrics about every aspect of your PostgreSQL database server, collected using the open-source tool pgmetrics.
No comments:
Post a Comment
It’s all about friendly conversation here at small review :) I’d love to be hear your thoughts!
Be sure to check back again because I do make every effort to reply to your comments here.