EnjoY | Database Research And Development: PostgreSQL : Notes on PostgreSQL B-Tree Indexes

Friday, September 18, 2020

PostgreSQL : Notes on PostgreSQL B-Tree Indexes

This article is half-done without your Comment! *** Please share your thoughts via Comment ***

Notes on PostgreSQL B-Tree Indexes

Working with B-Tree indexes in Postgres


PostgreSQL comes with no less than 6 different types of indexes, with the B-Tree index being the most commonly used. Read on to find out more about B-Tree indexes in PostgreSQL.

Types of Indexes

An index in PostgreSQL, like those created for PRIMARY KEYs and UNIQUEs in a CREATE TABLE statement or created explicitly with a CREATE INDEX statement, are of a particular “type” (although technically we should be calling them “index access methods”).

PostgreSQL comes with these built-in index types:

  • B-Tree
  • Hash
  • GIN – Generalized Inverted Index
  • BRIN – Block Range Index (only in v9.5 and above)
  • GiST – Generalized Inverted Search Tree
  • SP-GiST – Space Partitioned GiST

B-Tree is the default and the most commonly used index type. Specifying a primary key or a unique within a CREATE TABLE statement causes PostgreSQL to create B-Tree indexes. CREATE INDEX statements without the USING clause will also create B-Tree indexes:

-- the default index type is btree
CREATE INDEX ix_year ON movies (year);

-- equivalent, explicitly lists the index type
CREATE INDEX ix_year ON movies USING btree (year);

Ordering

B-Tree indexes are inherently ordered. PostgreSQL can make use of this order rather than sorting on the indexed expression. For example, getting the titles of all 80s movies sorted by title would require a sort:

idxdemo=# explain select title from movies where year between 1980 and 1989 order by title asc;
                                    QUERY PLAN
----------------------------------------------------------------------------------
 Sort  (cost=240.79..245.93 rows=2056 width=17)
   Sort Key: title
   ->  Index Scan using ix_year on movies  (cost=0.29..127.65 rows=2056 width=17)
         Index Cond: ((year >= 1980) AND (year <= 1989))
(4 rows)

But if you’re sorting them by the indexed column (year), additional sort is not required.

idxdemo=# explain select title from movies where year between 1980 and 1989 order by year asc;
                                 QUERY PLAN
----------------------------------------------------------------------------
 Index Scan using ix_year on movies  (cost=0.29..127.65 rows=2056 width=21)
   Index Cond: ((year >= 1980) AND (year <= 1989))
(2 rows)

Fill Factor

For tables that aren’t going to get updated, you can increase the “fill factor” from the default of 90, which should give you slightly smaller and faster indexes. Conversely, if there are frequent updates of the table involving the indexed parameter, you can reduce the fill factor to a smaller number – this will allow for faster inserts and updates, at the cost of slightly larger indexes.

CREATE INDEX ix_smd ON silent_movies (director) WITH (fillfactor = 100);

Indexing on Text

B-Tree indexes can help in prefix matching of text. Let’s take a query to list all the movies starting with the letter ‘T’:

idxdemo=> explain select title from movies where title like 'T%';
                         QUERY PLAN
-------------------------------------------------------------
 Seq Scan on movies  (cost=0.00..1106.94 rows=8405 width=17)
   Filter: (title ~~ 'T%'::text)
(2 rows)

This plan calls for a full sequential scan of the table. What happens if we add a B-Tree index on movies.title?

idxdemo=> create index ix_title on movies (title);
CREATE INDEX

idxdemo=> explain select title from movies where title like 'T%';
                         QUERY PLAN
-------------------------------------------------------------
 Seq Scan on movies  (cost=0.00..1106.94 rows=8405 width=17)
   Filter: (title ~~ 'T%'::text)
(2 rows)

Well, that didn’t help at all. However, there is a form of magic pixie dust that we can sprinkle to get Postgres to do what we want:

idxdemo=> create index ix_title2 on movies (title text_pattern_ops);
CREATE INDEX

idxdemo=> explain select title from movies where title like 'T%';
                                 QUERY PLAN
-----------------------------------------------------------------------------
 Bitmap Heap Scan on movies  (cost=236.08..1085.19 rows=8405 width=17)
   Filter: (title ~~ 'T%'::text)
   ->  Bitmap Index Scan on ix_title2  (cost=0.00..233.98 rows=8169 width=0)
         Index Cond: ((title ~>=~ 'T'::text) AND (title ~<~ 'U'::text))
(4 rows)

The plan now uses an index, and the cost has reduced. The magic here is “text_pattern_ops” which allows the B-Tree index over a “text” expression to be used for pattern operators (LIKE and regular expressions). The “text_pattern_ops” is called an Operator Class.

Note that this will work only for patterns with a fixed text prefix, so “%Angry%” or “%Men” will not work. Use PostgreSQL’s full text search for advanced text queries.

Covering Indexes

Covering indexes were added to PostgreSQL in v11. Covering indexes let you include the value of one or more expressions along with the indexed expression inside the index.

Let’s try querying for all movie titles, ordered by year of release:

idxdemo=# explain select title from movies order by year asc;
                             QUERY PLAN
--------------------------------------------------------------------
 Sort  (cost=3167.73..3239.72 rows=28795 width=21)
   Sort Key: year
   ->  Seq Scan on movies  (cost=0.00..1034.95 rows=28795 width=21)
(3 rows)

This involves a full sequential scan of the table, followed by a sort of the projected columns. Let’s first add a regular index on movies.year:

idxdemo=# create index ix_year on movies (year);
CREATE INDEX

idxdemo=# explain select title from movies order by year asc;
                                  QUERY PLAN
------------------------------------------------------------------------------
 Index Scan using ix_year on movies  (cost=0.29..1510.22 rows=28795 width=21)
(1 row)

Now Postgres decides to use the index to pull out the entries out of the table directly in the desired order. The table needs to be looked up because the index contains only the value of ‘year’ and the reference to the tuple in the table.

If we include the value of ‘title’ also inside the index, the table lookup can be avoided totally. Let’s use the new syntax to create such an index:

idxdemo=# create index ix_year_cov on movies (year) include (title);
CREATE INDEX
Time: 92.618 ms

idxdemo=# drop index ix_year;
DROP INDEX

idxdemo=# explain select title from movies order by year asc;
                                      QUERY PLAN
---------------------------------------------------------------------------------------
 Index Only Scan using ix_year_cov on movies  (cost=0.29..2751.59 rows=28795 width=21)
(1 row)

Postgres is now using an Index Only Scan, which means the table lookup is totally avoided. Note that we had to drop the old index, because Postgres didn’t choose ix_year_cov over ix_year for this query.

Clustering

PostgreSQL infamously does not support automatic physical ordering of rows in a table, unlike “clustered indexes” in other RDBMS. If most of your queries are going to pull out most of the rows of a mostly static table in a fixed order, it would be a good idea to layout the physical table storage in that order and use sequential scans. To reorder a table physically in the order dictated by an index, use:

CLUSTER VERBOSE movies USING ix_year;

You’d typically use a B-Tree index to recluster a table, as it provides a complete order for all the rows in the table.

Index Stats

How much disk space does your index take up? The pg_relation_size function can answer that:

idxdemo=# select * from pg_relation_size('ix_year');
 pg_relation_size
------------------
           663552
(1 row)

This returns the disk space used by the index, in bytes.

More information about the index can be gathered using the standard extension pgstattuple. Before you use the functions below, you need to do a CREATE EXTENSION pgstattuple; in the relevant database as a superuser. Usage of these functions also needs superuser privileges.

The pgstattuple function returns, among other things, the unused (free_space) and reusable (dead_tuple_len) disk space within the index. This can be very helpful in deciding whether to run a REINDEX to reduce index bloat.

idxdemo=# select * from pgstattuple('ix_year'::regclass);
-[ RECORD 1 ]------+-------
table_len          | 663552
tuple_count        | 28795
tuple_len          | 460720
tuple_percent      | 69.43
dead_tuple_count   | 0
dead_tuple_len     | 0
dead_tuple_percent | 0
free_space         | 66232
free_percent       | 9.98

The pgstattuple function returns B-Tree specific information, including the level of the tree:

idxdemo=# select * from pgstatindex('ix_year'::regclass);
-[ RECORD 1 ]------+-------
version            | 2
tree_level         | 1
index_size         | 663552
root_block_no      | 3
internal_pages     | 1
leaf_pages         | 79
empty_pages        | 0
deleted_pages      | 0
avg_leaf_density   | 89.72
leaf_fragmentation | 0

This can be used to decide whether to adjust the fill factor of the index.

Examining B-Tree Index Contents

Even the contents of the B-Tree can be examined directly, using the extension pageinspect. The usage of this extension needs superuser privileges.

Here are the properties of a single page (here, the 13th page) of the index:

idxdemo=# select * from bt_page_stats('ix_year', 13);
-[ RECORD 1 ]-+-----
blkno         | 13
type          | l
live_items    | 367
dead_items    | 0
avg_item_size | 16
page_size     | 8192
free_size     | 808
btpo_prev     | 12
btpo_next     | 14
btpo          | 0
btpo_flags    | 1

And here are the actual contents of each item (limited to 5 here) in the page:

idxdemo=# select * from bt_page_items('ix_year', 13) limit 5;
 itemoffset |   ctid   | itemlen | nulls | vars |          data
------------+----------+---------+-------+------+-------------------------
          1 | (104,40) |      16 | f     | f    | 86 07 00 00 00 00 00 00
          2 | (95,38)  |      16 | f     | f    | 86 07 00 00 00 00 00 00
          3 | (95,39)  |      16 | f     | f    | 86 07 00 00 00 00 00 00
          4 | (95,40)  |      16 | f     | f    | 86 07 00 00 00 00 00 00
          5 | (96,1)   |      16 | f     | f    | 86 07 00 00 00 00 00 00
(5 rows)

And if you’re thinking of writing a query to aggregate something over each page, you’ll also need the total number of pages in the relation, which can be got via pg_relpages from the pgstattuple extension:

idxdemo=# select pg_relpages('ix_year');
 pg_relpages
-------------
          81
(1 row)

Other Index Types

B-Tree indexes are versatile tools for optimizing queries. With a bit of experimentation and planning, it can be used to vastly improve the response times of applications and report jobs.

The other index types of PostgreSQL are also useful and can be more efficient and performant than B-Tree in specific cases. This article gives a quick overview of all the types.

Got a tip on indexes that you’d like to share? Do leave them as a comment below!

No comments:

Post a Comment

It’s all about friendly conversation here at small review :) I’d love to be hear your thoughts!

Be sure to check back again because I do make every effort to reply to your comments here.

Featured Post

SQL Server : SELECT all columns to be good or bad in database system

This article is half-done without your Comment! *** Please share your thoughts via Comment *** In this post, I am going to write about one o...

Popular Posts