MySQL : Choosing the Best Indexes for MySQL Query Optimization

This article is half-done without your Comment! *** Please share your thoughts via Comment ***

Choosing the Best Indexes for MySQL Query Optimization

Read our detailed tutorial below and learn more about indexing best practices.

This tutorial won't detail all the internals of the algorithm, but rather try to lay out the basic and important aspects of indexing, in simple terms.
Also, and most importantly, we'll present practical examples for properly indexing your tables and queries by relying on a set of rules.
Our focus in this tutorial is on MySQL / Maria DB, though this information may be relevant for other relational databases as well with similar indexing structures.

Which indexes should I create to optimize my SQL query?

Before we begin, please note that it's recommended to strive to create indexes which "match" as many of the queries in your workload as possible, as it will reduce the overall amount of indexes, and the load their existence can create on the database when inserting or updating data (which requires updating the indexes as well). Also, keep in mind that the order of the columns in the index matters.

As a general rule of thumb, MySQL can only use one index for each table in the query (excluding index merges which we won't cover in this post), so you should start with listing all the tables in your query and see if and which index you should create for it.

When creating an index, you should usually start with the equality conditions in the WHERE and JOIN conditions. In most cases, conditions such as name = 'John' will allow the database to filter many of the rows from the table and go through a small amount of rows to return the required results. Therefore, we should start indexing by adding these columns to the index. Keep in mind you should probably only add columns which are a part of selective conditions, as otherwise they do not contribute a lot to optimizing the query.

Then, you should look into the range conditions, but you should only add one of them for each table - the most selective condition, as MySQL can only handle one 'ranged column' in each index. In some cases when there are no range conditions, it makes sense to add the GROUP BY / ORDER BY columns, assuming the ordering is done in only one direction (ASC / DESC).

In some cases, it also makes sense to create a separate index that contains the ORDER BY clause's columns, as MySQL sometimes chooses to use it. Please note though that for this to happen, the index should contain all columns from the ORDER BY clause and they should all be specified with the same order (ASC / DESC). This doesn't guarantee that the database's optimizer will pick this index rather than the other compound indexes. It will only happen when MySQL knows that going through the sorting index will be more efficient to quickly track the result, when compared to working with other indexes (or scanning the full table).

Also, in some cases, it makes sense to also add the columns from the SELECT clause to the index, to create covering index. This is only relevant if the index isn't already 'too large'. What's too large? Well, no official rule of thumb here, but I usually go with up to 5 columns in an index. Creating a covering index allows the database to not only filter using the index, but to also fetch the information required by the SELECT clause directly from the index, which saves precious I/O operations for fetching the data of the filtered rows from the table's data.

Let's look at an example to clarify:

SELECT

id, first_name, last_name, age

FROM

employees

WHERE

first_name = 'John'

AND last_name = 'Brack'

AND age > 25

ORDER BY age ASC;

For this query, we'll start with adding the columns first_name and last_name, which are compared with an equality operator. Then, we'll add the age column which is compared with a range condition. No need to have the ORDER BY clause indexed here, as the age column is already in the index. Last but not least, we'll add id from the SELECT clause to the index to have a covering index.

So to index this query properly, you should create the index:
employees (first_name, last_name, age, id).

The above is a very simplified pseudo-algorithm that will allow you to build simple indexes for rather simple SQL queries.

When determining which columns from the JOIN ON clauses to index, you should keep in mind that MySQL has an algorithm to detect which table you should start with in the INNER JOIN, which also determines which column to index first. More information and a pseudo-algorithm can be found on MySQL's documentation.

What not to do when indexing (or writing SQL queries)?

We gathered some of the most common mistakes we see programmers and database administrators do when writing queries and indexing their tables.

Indexing each column in the table separately

In most cases, MySQL won't be able to use more than one index for each table in the query (excluding very specific cases of index merges).

Therefore, when creating a separate index for each column in the table, the database is bound to perform only one of the search operations using an index, and the rest of them will be significantly slower, as the database can't use an index to execute them.

We recommend using compound (multi-column) indexes wherever appropriate, rather than single-column indexes for each column individually.

The OR operator in filtering conditions

Consider this query:

SELECT

a, b

FROM

tbl

WHERE

a = 3 OR b = 8;

In many cases, MySQL won't be able to use an index to apply an OR condition, and as a result, this query is not index-able.

Therefore, we recommend to avoid such OR conditions and consider splitting the query to two parts, combined with a UNION DISTINCT (or even better, UNION ALL, in case you know there won't be any duplicate results)

The order of columns in an index is important

Let's say I hand you my contacts phone book which is ordered by the contact's first name and ask you to count how many people are there named "John" in the book. You'll grab the book in both hands and say "no problem". You will navigate to the page that holds all names starting with John, and start counting from there.

Now, let's say I change the assignment and hand you a phone book that is ordered by the contact's last name, but ask you to still count all contacts with the first name "John". How would you approach that? Well, the database scratches his head in this situation as well.

Now let’s look at an SQL query to demonstrate the same behaviour with the MySQL optimizer:

SELECT

first_name, last_name

FROM

contacts

WHERE

first_name = 'John';

Having the index contacts (first_name, last_name) is ideal here, because the index starts with our filtering condition and ends with another column in the SELECT clause.

But, having the reverse index contacts (last_name, first_name) is rather useless, as the database can't use the index for filtering, as the column we need is second in the index and not first.

The conclusion from this example is that the order of columns in an index is rather important.

Redundant indexes

Indexes are magnificent when trying to optimize your SQL queries and they can improve performance significantly.

But, they come with a downside as well. Each index you're creating should be kept updated and in sync when changes occur in your databases. So for each INSERT / UPDATE / DELETE in your databases, all relevant indexes should be updated. This update can take some time, especially with large tables / indexes.

Therefore, do not create indexes unless you know you'll need them.

Tuesday, November 24, 2020