PostgreSQL : A Quick Review of Scan Types and Joins

This article is half-done without your Comment! *** Please share your thoughts via Comment ***

A Quick Review of Scan Types and Joins

It’s important to know that every join type and scan type have their time and place. Some people look for the word “Sequential” scan and immediately jump back in fear, not considering whether it would be worthwhile to access data another. Take, for example, a table with 2 rows -- it would not make sense to the query planner to scan the index, then go back and retrieve data from the disk when it could just quickly scan the table and pull data out without touching the index. In this case, and in the case of most other small-ish tables, it would be more efficient to do a sequential scan. To quickly review the join and scan types that PostgreSQL works with:

Scan Types
- Sequential Scan
  - Basically a brute-force retrieval from disk
  - Scans the whole table
  - Fast for small tables
- Index Scan
  - Scan all/some rows in an index; look up rows in heap
  - Causes random seek, which can be costly for old-school spindle-based disks
  - Faster than a Sequential Scan when extracting a small number of rows for large tables
- Index Only Scan
  - Scan all/some rows in index
  - No need to lookup rows in the table because the values we want are already stored in the index itself
- Bitmap Heap Scan
  - Scan index, building a bitmap of pages to visit
  - Then, look up only relevant pages in the table for desired rows
Join Types
- Nested Loops
  - For each row in the outer table, scan for matching rows in the inner table
  - Fast to start, best for small tables
- Merge Join
  - Zipper-operation on _sorted_ data sets
  - Good for large tables
  - High startup cost if an additional sort is required
- Hash Join
  - Build hash of inner table values, scan outer table for matches
  - Only usable for equality conditions
  - High startup cost, but fast execution

As we can see, every scan type and join type has its place. What’s most important is that the query planner has good statistics to work with, as mentioned earlier.

We’ve only talked about one instance where EXPLAIN helped identify a problem and give an idea of how to solve it. At EDB Support, we’ve seen many situations where EXPLAIN could help identify things like:

Inaccurate statistics leading to poor join/scan choices
Maintenance activity (VACUUM and ANALYZE) not aggressive enough
Corrupted indexes requiring a REINDEX
Index definition v. query mismatch
work_mem being set too low, preventing in-memory sorts and joins
Poor performance due to join order listing when writing a query
Improper ORM configuration

EXPLAIN is certainly one of the most invaluable tools for anyone working with PostgreSQL, and using it well will save you lots of time!

Wednesday, November 4, 2020

PostgreSQL : A Quick Review of Scan Types and Joins

A Quick Review of Scan Types and Joins

No comments:

Post a Comment

Featured Post

SQL Server : SELECT all columns to be good or bad in database system

Popular Posts