Monday, April 12, 2021

SQL Server : SELECT all columns to be good or bad in database system

This article is half-done without your Comment! *** Please share your thoughts via Comment ***

In this post, I am going to write about one of the most common mistakes of Database Developers.

“SELECT * ” (SELECT all columns) to be good or bad in a Database System.

I will discuss the pros and cons of “SELECT * ” and what should be our best practice with SELECT all columns.

Don’t go with ” SELECT * “:

When you are using ” SELECT * “, you are selecting more columns from the database, and your application might not use some of this column. This will create extra cost and load on the database system, and unnecessary data travels across the network.
When you are using JOIN between two or more tables, and in that query, you use “SELECT * “, guess about two columns with the same name. It will return more columns than you needed and will create a binding problem.
If you are using “SELECT * “, you may get all columns in arbitrary sequence, and if specified the column name, it will return in the specified order, and you can also refer to this column by numerical index.
If such a code exists in an application which requires all columns in a predefined order, you have to specify all the columns name in the correct order.
When you are using “SELECT *”, you will face performance issue because this is not advisable to apply an index on all the columns of the table.
When you are using “SELECT *”, you require to create documentation for those columns which are returning and using by an application otherwise, this will create confusion.
When you are using “SELECT * ” and in the future, someone adds one new column with TEXT data type, you can imagine performance and other errors.
If you need to select all columns, also I would suggest providing a full list of columns because putting “SELECT *” needs fetching the name of the columns from stored metadata or system information which will impact the query performance.
Now Imagine that you are using “SELECT *” in all your stored procedure and now you require to find one column name from the text of the stored procedure which is not possible with “SELECT *”.

Why you go with “SELECT *” ?

The Lazy developers can only go with “SELECT *”.
If you have special requirements and create a dynamic environment when add or delete column automatically handle by application code. In this particular case, you don’t require to change application and database code, and this will automatically be done in a production environment. In this case, you can use “SELECT *”.
I didn’t find any other reason to use “SELECT *”.

SQL Server: Basic performance test between SELECT INTO and INSERT INTO SELECT

This article is half-done without your Comment! *** Please share your thoughts via Comment ***

In this post, I am doing a basic performance test between SELECT INTO and INSERT INTO SELECT option of SQL Server.

Database Developers are using these options very frequently for copying data between the tables. But SELECT INTO is little faster than INSERT INTO SELECT option.

Note: I am testing using a very small number of sample records. If you are testing with a huge number of records, you may find a big difference in the performance.

Please check the below examples:

Set Statistics:

1
2
SET STATISTICS IO ON
SET STATISTICS TIME ON

Create sample tables:

1
2
3
4
5
CREATE TABLE tbl_MyData (ID INT, Name VARCHAR(10))
CREATE TABLE tbl_Mytable (ID INT, Name VARCHAR(10))
 
INSERT INTO tbl_MyData VALUES (1,'dbrnd')
GO 1000000

Perform INSERT INTO SELECT:

1
2
3
INSERT INTO  tbl_Mytable 
SELECT ID, Name 
FROM tbl_MyData

Execution time:

1
2
SQL Server Execution Times:
CPU time = 16 ms,  elapsed time = 37 ms.

Perform SELECT INTO:

1
2
3
SELECT ID, Name 
INTO tbl_MyTable2
FROM tbl_MyData

Execution time:

1
2
SQL Server Execution Times:
CPU time = 16 ms,  elapsed time = 21 ms.

Thursday, February 11, 2021

SQL Server : SQL Server Index Terms

This article is half-done without your Comment! *** Please share your thoughts via Comment ***

SQL Server Index Terms

Let’s have a chat about your indexes in SQL Server. Indexes can be confusing. They are the easiest way to boost performance of queries, but there are so many options and terms around indexes that it can be hard to keep track of them. Let’s chat about the differences between clustered and Nonclustered indexes, key columns and nonkey columns, and covering and non-covering indexes.

Clustered vs. Nonclustered Indexes

Clustered index: a SQL Server index that sorts and stores data rows in a table, based on key values.

The basic syntax to create a clustered index is

1	CREATE CLUSTERED INDEX Index_Name ON Schema.TableName(Column);

Nonclustered index: a SQL Server index which contains a key value and a pointer to the data in the heap or clustered index.

The basic syntax for a Nonclustered index is

1	CREATE INDEX Index_Name ON Schema.TableName(Column);

The difference between clustered and Nonclustered SQL Server indexes is that a clustered index controls the physical order of the data pages. The data pages of a clustered index will always include all the columns in the table, even if you only create the index on one column. The column(s) you specify as key columns affect how the pages are stored in the B-tree index structure. A Nonclustered index does not affect the ordering and storing of the data.

A B-tree structure has at least two levels: the root and the leaves. If there are enough records, intermediate levels may be added as well. Clustered index leaf-level pages contain the data in the table. Nonclustered index leaf-level pages contain the key value and a pointer to the data row in the clustered index or heap.

hand-drawn representation of a b-tree.
fine art, folks.

There are a few limits to indexes.

There can be only one clustered index per table.
SQL Server supports up to 999 nonclustered indexes per table.
An index – clustered or nonclustered – can be a maximum of 16 columns and 900 bytes.

These are limits, not goals. Every index you create will take up space in your database. The index will also need to be modified when inserts, updates, and deletes are performed. This will lead to CPU and disk overhead, so craft indexes carefully and test them thoroughly.

Primary Key as a Clustered Index

Primary key: a constraint to enforce uniqueness in a table. The primary key columns cannot hold NULL values.

In SQL Server, when you create a primary key on a table, if a clustered index is not defined and a nonclustered index is not specified, a unique clustered index is created to enforce the constraint. However, there is no guarantee that this is the best choice for a clustered index for that table. Make sure you are carefully considering this in your indexing strategy.

Key vs. Nonkey Columns

Key columns: the columns specified to create a clustered or nonclustered index.

Nonkey columns: columns added to the INCLUDE clause of a nonclustered index.

The basic syntax to create a nonclustered index with nonkey columns is:

1	CREATE INDEX Index_Name ON Schema.TableName(Column) INCLUDE (ColumnA, ColumnB);

A column cannot be both a key and a non-key. It is either a key column or a non-key, included column.

The difference lies in where the data about the column is stored in the B-tree. Clustered and nonclustered key columns are stored at every level of the index – the columns appear on the leaf and all intermediate levels. A nonkey column will only be stored at the leaf level, however.

There are benefits to using non-key columns.

Columns can be accessed with an index scan.
Data types not allowed in key columns are allowed in nonkey columns. All data types but text, ntext, and image are allowed.
Included columns do not count against the 900 byte index key limit enforced by SQL Server.

Covering Indexes

Covering index: all columns returned in a query are in the index, so no additional reads are required to get the data.

A covering index will reduce the IO operations, and improve performance of queries.

Let’s create a query and compare two indexes. I’m creating these on the Product.Products table in the AdventureWorks2012 database.

The query we want to use is

SELECT ProductID, Name, ProductNumber, Color
FROM dbo.Products
WHERE Color = 'Black';

The first index is nonclustered, with two key columns:

1	CREATE INDEX IX_Products_Name_ProductNumber ON dbo.Products(Name, ProductNumber);

The second is also nonclustered, with two key columns and three nonkey columns:

1 2	CREATE INDEX IX_Products_Name_ProductNumber_ColorClassStyle ON dbo.Products(Name, ProductNumber) INCLUDE (Color, Class, Style);

In this case, the first index would not be a covering index for that query. The second index would be a covering index for that specific query.

Indexes Can Be Magic, Except Whey They Aren’t

Indexes are the easiest thing to add to your database to boost performance. However, too much of a good thing can be bad. When designing a database, or troubleshooting poor performance, consider all your index options and carefully test them.

Monday, April 12, 2021

SQL Server : SELECT all columns to be good or bad in database system

Don’t go with ” SELECT * “:

Why you go with “SELECT *” ?

SQL Server: Basic performance test between SELECT INTO and INSERT INTO SELECT

Thursday, February 11, 2021

SQL Server : SQL Server Index Terms

SQL Server Index Terms

Clustered vs. Nonclustered Indexes

Primary Key as a Clustered Index

Key vs. Nonkey Columns

Covering Indexes

Indexes Can Be Magic, Except Whey They Aren’t

Featured Post

SQL Server : SELECT all columns to be good or bad in database system

Popular Posts