Choosing the Best Indexes for MySQL Query Optimization
Read our detailed tutorial
below and learn more about indexing best practices.
This tutorial won't detail all
the internals of the algorithm, but rather try to lay out the basic and
important aspects of indexing, in simple terms.
Also, and most importantly, we'll present practical examples for properly
indexing your tables and queries by relying on a set of rules.
Our focus in this tutorial is on MySQL / Maria DB, though this information may
be relevant for other relational databases as well with similar indexing
structures.
Which indexes should I create to optimize
my SQL query?
Before we begin, please note
that it's recommended to strive to create indexes which "match" as
many of the queries in your workload as possible, as it will reduce the overall
amount of indexes, and the load their existence can create on the database when
inserting or updating data (which requires updating the indexes as well). Also,
keep in mind that the order of the columns in the index matters.
As a general rule of thumb,
MySQL can only use one index for each table in the query (excluding index
merges which we won't cover in this post), so you should start with listing all
the tables in your query and see if and which index you should create for it.
When creating an index, you
should usually start with the equality conditions in the WHERE and JOIN
conditions. In most cases, conditions such as name = 'John' will allow the database to filter
many of the rows from the table and go through a small amount of rows to return
the required results. Therefore, we should start indexing by adding these
columns to the index. Keep in mind you should probably only add columns which
are a part of selective conditions, as otherwise they do not contribute a lot
to optimizing the query.
Then, you should look into the
range conditions, but you should only add one of them for each table - the most
selective condition, as MySQL can only handle one 'ranged column' in each
index. In some cases when there are no range conditions, it makes sense to add
the GROUP BY / ORDER BY columns, assuming the ordering is done in only one
direction (ASC / DESC).
In some cases, it also makes
sense to create a separate index that contains the ORDER BY clause's columns, as
MySQL sometimes chooses to use it. Please note though that for this to happen,
the index should contain all columns from the ORDER BY clause and they should
all be specified with the same order (ASC / DESC). This doesn't guarantee that
the database's optimizer will pick this index rather than the other compound
indexes. It will only happen when MySQL knows that going through the sorting
index will be more efficient to quickly track the result, when compared to
working with other indexes (or scanning the full table).
Also, in some cases, it makes
sense to also add the columns from the SELECT clause to the index, to create
covering index. This is only relevant if the index isn't already 'too large'.
What's too large? Well, no official rule of thumb here, but I usually go with
up to 5 columns in an index. Creating a covering index allows the database to
not only filter using the index, but to also fetch the information required by
the SELECT clause directly from the index, which saves precious I/O operations for
fetching the data of the filtered rows from the table's data.
Let's look at an example to
clarify:
1 2 3 4 5 6 7 8 9 |
|
For this query, we'll start
with adding the columns first_name and last_name, which are compared with an equality
operator. Then, we'll add the age column which is compared with a range
condition. No need to have the ORDER BY clause indexed here, as the age column
is already in the index. Last but not least, we'll add id from
the SELECT clause to the index to have a covering index.
So to index this query
properly, you should create the index:
employees (first_name,
last_name, age, id).
The above is a very simplified
pseudo-algorithm that will allow you to build simple indexes for rather simple
SQL queries.
When determining which columns
from the JOIN ON clauses to index, you should keep in mind that MySQL has an
algorithm to detect which table you should start with in the INNER JOIN, which
also determines which column to index first. More information and a
pseudo-algorithm can be found on MySQL's documentation.
What not to do when indexing (or writing
SQL queries)?
We gathered some of the most
common mistakes we see programmers and database administrators do when writing
queries and indexing their tables.
Indexing each column in the
table separately
In most cases, MySQL won't be
able to use more than one index for each table in the query (excluding very
specific cases of index merges).
Therefore, when creating a
separate index for each column in the table, the database is bound to perform only
one of the search operations using an index, and the rest of them will be
significantly slower, as the database can't use an index to execute them.
We recommend using compound
(multi-column) indexes wherever appropriate, rather than single-column indexes
for each column individually.
The OR operator in filtering
conditions
Consider this query:
1 2 3 4 5 6 |
|
In many cases, MySQL won't be
able to use an index to apply an OR condition, and as a result, this query is
not index-able.
Therefore, we recommend to
avoid such OR conditions and consider splitting the query to two parts,
combined with a UNION DISTINCT (or even better, UNION ALL, in case you know
there won't be any duplicate results)
The order of columns in an
index is important
Let's say I hand you my
contacts phone book which is ordered by the contact's first name and ask you to
count how many people are there named "John" in the book. You'll grab
the book in both hands and say "no problem". You will navigate to the
page that holds all names starting with John, and start counting from there.
Now, let's say I change the
assignment and hand you a phone book that is ordered by the contact's last
name, but ask you to still count all contacts with the first name
"John". How would you approach that? Well, the database scratches his
head in this situation as well.
Now let’s look at an SQL query
to demonstrate the same behaviour with the MySQL optimizer:
1 2 3 4 5 6 |
|
Having the index contacts (first_name, last_name) is ideal here, because the index
starts with our filtering condition and ends with another column in the SELECT
clause.
But, having the reverse index contacts (last_name, first_name) is rather useless, as the database
can't use the index for filtering, as the column we need is second in the index
and not first.
The conclusion from this
example is that the order of columns in an index is rather important.
Redundant indexes
Indexes are magnificent when
trying to optimize your SQL queries and they can improve performance
significantly.
But, they come with a downside
as well. Each index you're creating should be kept updated and in sync when
changes occur in your databases. So for each INSERT / UPDATE / DELETE in your
databases, all relevant indexes should be updated. This update can take some
time, especially with large tables / indexes.
Therefore, do not create
indexes unless you know you'll need them.
No comments:
Post a Comment
It’s all about friendly conversation here at small review :) I’d love to be hear your thoughts!
Be sure to check back again because I do make every effort to reply to your comments here.