Notes on
PostgreSQL B-Tree Indexes
Working with B-Tree
indexes in Postgres
PostgreSQL comes with no less than 6 different types of indexes, with the B-Tree index being the most commonly used. Read on to find out more about B-Tree indexes in PostgreSQL.
Types of Indexes
An index in PostgreSQL, like those created for PRIMARY KEYs and UNIQUEs in a CREATE TABLE statement or created explicitly with a CREATE INDEX statement, are of a particular “type” (although technically we should be calling them “index access methods”).
PostgreSQL comes with these built-in index types:
- B-Tree
- Hash
- GIN – Generalized Inverted Index
- BRIN – Block Range Index (only in v9.5 and above)
- GiST – Generalized Inverted Search Tree
- SP-GiST – Space Partitioned GiST
B-Tree is the default and the most commonly used index type. Specifying a primary key or a unique within a CREATE TABLE statement causes PostgreSQL to create B-Tree indexes. CREATE INDEX statements without the USING clause will also create B-Tree indexes:
Ordering
B-Tree indexes are inherently ordered. PostgreSQL can make use of this order rather than sorting on the indexed expression. For example, getting the titles of all 80s movies sorted by title would require a sort:
But if you’re sorting them by the indexed column (year), additional sort is not required.
Fill Factor
For tables that aren’t going to get updated, you can increase the “fill factor” from the default of 90, which should give you slightly smaller and faster indexes. Conversely, if there are frequent updates of the table involving the indexed parameter, you can reduce the fill factor to a smaller number – this will allow for faster inserts and updates, at the cost of slightly larger indexes.
Indexing on Text
B-Tree indexes can help in prefix matching of text. Let’s take a query to list all the movies starting with the letter ‘T’:
This plan calls for a full sequential scan of the table. What happens if we add a B-Tree index on movies.title?
Well, that didn’t help at all. However, there is a form of magic pixie dust that we can sprinkle to get Postgres to do what we want:
The plan now uses an index, and the cost has reduced. The magic here is “text_pattern_ops” which allows the B-Tree index over a “text” expression to be used for pattern operators (LIKE and regular expressions). The “text_pattern_ops” is called an Operator Class.
Note that this will work only for patterns with a fixed text prefix, so “%Angry%” or “%Men” will not work. Use PostgreSQL’s full text search for advanced text queries.
Covering Indexes
Covering indexes were added to PostgreSQL in v11. Covering indexes let you include the value of one or more expressions along with the indexed expression inside the index.
Let’s try querying for all movie titles, ordered by year of release:
This involves a full sequential scan of the table, followed by a sort of the projected columns. Let’s first add a regular index on movies.year:
Now Postgres decides to use the index to pull out the entries out of the table directly in the desired order. The table needs to be looked up because the index contains only the value of ‘year’ and the reference to the tuple in the table.
If we include the value of ‘title’ also inside the index, the table lookup can be avoided totally. Let’s use the new syntax to create such an index:
Postgres is now using an Index Only Scan, which means the table lookup is totally avoided. Note that we had to drop the old index, because Postgres didn’t choose ix_year_cov over ix_year for this query.
Clustering
PostgreSQL infamously does not support automatic physical ordering of rows in a table, unlike “clustered indexes” in other RDBMS. If most of your queries are going to pull out most of the rows of a mostly static table in a fixed order, it would be a good idea to layout the physical table storage in that order and use sequential scans. To reorder a table physically in the order dictated by an index, use:
You’d typically use a B-Tree index to recluster a table, as it provides a complete order for all the rows in the table.
Index Stats
How much disk space does your index take up? The pg_relation_size function can answer that:
This returns the disk space used by the index, in bytes.
More information about the index can be gathered using the standard extension pgstattuple. Before you use the functions below, you need to do a CREATE EXTENSION pgstattuple;
in the relevant database as a superuser. Usage of these functions also needs superuser privileges.
The pgstattuple
function returns, among other things, the unused (free_space
) and reusable (dead_tuple_len
) disk space within the index. This can be very helpful in deciding whether to run a REINDEX
to reduce index bloat.
The pgstattuple
function returns B-Tree specific information, including the level of the tree:
This can be used to decide whether to adjust the fill factor of the index.
Examining B-Tree Index Contents
Even the contents of the B-Tree can be examined directly, using the extension pageinspect. The usage of this extension needs superuser privileges.
Here are the properties of a single page (here, the 13th page) of the index:
And here are the actual contents of each item (limited to 5 here) in the page:
And if you’re thinking of writing a query to aggregate something over each page, you’ll also need the total number of pages in the relation, which can be got via pg_relpages
from the pgstattuple
extension:
Other Index Types
B-Tree indexes are versatile tools for optimizing queries. With a bit of experimentation and planning, it can be used to vastly improve the response times of applications and report jobs.
The other index types of PostgreSQL are also useful and can be more efficient and performant than B-Tree in specific cases. This article gives a quick overview of all the types.
Got a tip on indexes that you’d like to share? Do leave them as a comment below!
No comments:
Post a Comment
It’s all about friendly conversation here at small review :) I’d love to be hear your thoughts!
Be sure to check back again because I do make every effort to reply to your comments here.