EnjoY | Database Research And Development: February 2021

Thursday, February 11, 2021

SQL Server : SQL Server Index Terms

This article is half-done without your Comment! *** Please share your thoughts via Comment ***

SQL Server Index Terms

Let’s have a chat about your indexes in SQL Server. Indexes can be confusing. They are the easiest way to boost performance of queries, but there are so many options and terms around indexes that it can be hard to keep track of them. Let’s chat about the differences between clustered and Nonclustered indexes, key columns and nonkey columns, and covering and non-covering indexes.

Clustered vs. Nonclustered Indexes

Clustered index: a SQL Server index that sorts and stores data rows in a table, based on key values.

The basic syntax to create a clustered index is

Nonclustered index: a SQL Server index which contains a key value and a pointer to the data in the heap or clustered index.

The basic syntax for a Nonclustered index is

The difference between clustered and Nonclustered SQL Server indexes is that a clustered index controls the physical order of the data pages.  The data pages of a clustered index will always include all the columns in the table, even if you only create the index on one column. The column(s) you specify as key columns affect how the pages are stored in the B-tree index structure. A Nonclustered index does not affect the ordering and storing of the data.

A B-tree structure has at least two levels: the root and the leaves. If there are enough records, intermediate levels may be added as well. Clustered index leaf-level pages contain the data in the table. Nonclustered index leaf-level pages contain the key value and a pointer to the data row in the clustered index or heap.

hand-drawn representation of a b-tree.
fine art, folks.

There are a few limits to indexes.

  • There can be only one clustered index per table.
  • SQL Server supports up to 999 nonclustered indexes per table.
  • An index – clustered or nonclustered – can be a maximum of 16 columns and 900 bytes.

These are limits, not goals. Every index you create will take up space in your database. The index will also need to be modified when inserts, updates, and deletes are performed. This will lead to CPU and disk overhead, so craft indexes carefully and test them thoroughly.

Primary Key as a Clustered Index

Primary key: a constraint to enforce uniqueness in a table. The primary key columns cannot hold NULL values.

In SQL Server, when you create a primary key on a table, if a clustered index is not defined and a nonclustered index is not specified, a unique clustered index is created to enforce the constraint. However, there is no guarantee that this is the best choice for a clustered index for that table. Make sure you are carefully considering this in your indexing strategy.

Key vs. Nonkey Columns

Key columns: the columns specified to create a clustered or nonclustered index.

Nonkey columns: columns added to the INCLUDE clause of a nonclustered index.

The basic syntax to create a nonclustered index with nonkey columns is:

A column cannot be both a key and a non-key. It is either a key column or a non-key, included column.

The difference lies in where the data about the column is stored in the B-tree. Clustered and nonclustered key columns are stored at every level of the index – the columns appear on the leaf and all intermediate levels. A nonkey column will only be stored at the leaf level, however.

There are benefits to using non-key columns.

  • Columns can be accessed with an index scan.
  • Data types not allowed in key columns are allowed in nonkey columns. All data types but text, ntext, and image are allowed.
  • Included columns do not count against the 900 byte index key limit enforced by SQL Server.

Covering Indexes

Covering index: all columns returned in a query are in the index, so no additional reads are required to get the data.

A covering index will reduce the IO operations, and improve performance of queries.

Let’s create a query and compare two indexes. I’m creating these on the Product.Products table in the AdventureWorks2012 database.

The query we want to use is

The first index is nonclustered, with two key columns:

The second is also nonclustered, with two key columns and three nonkey columns:

In this case, the first index would not be a covering index for that query. The second index would be a covering index for that specific query.

Indexes Can Be Magic, Except Whey They Aren’t

Indexes are the easiest thing to add to your database to boost performance. However, too much of a good thing can be bad. When designing a database, or troubleshooting poor performance, consider all your index options and carefully test them.


SQL Server : Star and Snowflake Schema in Data Warehouse with Examples

This article is half-done without your Comment! *** Please share your thoughts via Comment ***

What is Multidimensional schema?

Multidimensional Schema is especially designed to model data warehouse systems. The schemas are designed to address the unique needs of very large databases designed for the analytical purpose (OLAP).

Types of Data Warehouse Schema:

Following are 3 chief types of multidimensional schemas each having its unique advantages.

  • Star Schema
  • Snowflake Schema
  • Galaxy Schema

In this tutorial, you will learn more about-

What is a Star Schema?

Star Schema in data warehouse, in which the center of the star can have one fact table and a number of associated dimension tables. It is known as star schema as its structure resembles a star. The Star Schema data model is the simplest type of Data Warehouse schema. It is also known as Star Join Schema and is optimized for querying large data sets.

In the following Star Schema example, the fact table is at the center which contains keys to every dimension table like Dealer_ID, Model ID, Date_ID, Product_ID, Branch_ID & other attributes like Units sold and revenue.

Example of Star Schema
Example of Star Schema Diagram

Characteristics of Star Schema:

  • Every dimension in a star schema is represented with the only one-dimension table.
  • The dimension table should contain the set of attributes.
  • The dimension table is joined to the fact table using a foreign key
  • The dimension table are not joined to each other
  • Fact table would contain key and measure
  • The Star schema is easy to understand and provides optimal disk usage.
  • The dimension tables are not normalized. For instance, in the above figure, Country_ID does not have Country lookup table as an OLTP design would have.
  • The schema is widely supported by BI Tools

What is a Snowflake Schema?

Snowflake Schema in data warehouse is a logical arrangement of tables in a multidimensional database such that the ER diagram resembles a snowflake shape. A Snowflake Schema is an extension of a Star Schema, and it adds additional dimensions. The dimension tables are normalized which splits data into additional tables.

In the following Snowflake Schema example, Country is further normalized into an individual table.

Example of Snowflake Schema
Example of Snowflake Schema

Characteristics of Snowflake Schema:

  • The main benefit of the snowflake schema it uses smaller disk space.
  • Easier to implement a dimension is added to the Schema
  • Due to multiple tables query performance is reduced
  • The primary challenge that you will face while using the snowflake Schema is that you need to perform more maintenance efforts because of the more lookup tables.

Star Schema Vs Snowflake Schema: Key Differences

Following is a key difference between Star Schema and Snowflake Schema:

Star SchemaSnowflake Schema
Hierarchies for the dimensions are stored in the dimensional table.Hierarchies are divided into separate tables.
It contains a fact table surrounded by dimension tables.One fact table surrounded by dimension table which are in turn surrounded by dimension table
In a star schema, only single join creates the relationship between the fact table and any dimension tables.A snowflake schema requires many joins to fetch the data.
Simple DB Design.Very Complex DB Design.
Denormalized Data structure and query also run faster.Normalized Data Structure.
High level of Data redundancyVery low-level data redundancy
Single Dimension table contains aggregated data.Data Split into different Dimension Tables.
Cube processing is faster.Cube processing might be slow because of the complex join.
Offers higher performing queries using Star Join Query Optimization. Tables may be connected with multiple dimensions.The Snowflake schema is represented by centralized fact table which unlikely connected with multiple dimensions.

What is a Galaxy Schema?

Galaxy Schema contains two fact table that share dimension tables between them. It is also called Fact Constellation Schema. The schema is viewed as a collection of stars hence the name Galaxy Schema.

Example of Galaxy Schema
Example of Galaxy Schema

As you can see in above example, there are two facts table

  1. Revenue
  2. Product.

In Galaxy schema shares dimensions are called Conformed Dimensions.

Characteristics of Galaxy Schema:

  • The dimensions in this schema are separated into separate dimensions based on the various levels of hierarchy.
  • For example, if geography has four levels of hierarchy like region, country, state, and city then Galaxy schema should have four dimensions.
  • Moreover, it is possible to build this type of schema by splitting the one-star schema into more Star schemes.
  • The dimensions are large in this schema which is needed to build based on the levels of hierarchy.
  • This schema is helpful for aggregating fact tables for better understanding.

What is Star Cluster Schema?

Snowflake schema contains fully expanded hierarchies. However, this can add complexity to the Schema and requires extra joins. On the other hand, star schema contains fully collapsed hierarchies, which may lead to redundancy. So, the best solution may be a balance between these two schemas which is Star Cluster Schema design.

Example of Star Cluster Schema
Example of Star Cluster Schema

Overlapping dimensions can be found as forks in hierarchies. A fork happens when an entity acts as a parent in two different dimensional hierarchies. Fork entities then identified as classification with one-to-many relationships.

Summary:

  • Multidimensional schema is especially designed to model data warehouse systems
  • The star schema is the simplest type of Data Warehouse schema. It is known as star schema as its structure resembles a star.
  • A Snowflake Schema is an extension of a Star Schema, and it adds additional dimensions. It is called snowflake because its diagram resembles a Snowflake.
  • In a star schema, only single join defines the relationship between the fact table and any dimension tables.
  • Star schema contains a fact table surrounded by dimension tables.
  • Snowflake schema is surrounded by dimension table which are in turn surrounded by dimension table
  • A snowflake schema requires many joins to fetch the data.
  • A Galaxy Schema contains two fact table that shares dimension tables. It is also called Fact Constellation Schema.
  • Star cluster schema contains attributes of Star and Snowflake Schema.

Wednesday, February 3, 2021

MSSQL Server : Multiple ways we can do the update statements.

This article is half-done without your Comment! *** Please share your thoughts via Comment ***

Multiple ways we can do the update statements.

 

In MSSQL Server  update statement we can write the different types.

See the below example.  

 

Created below the following tables.

 

-- Creating two tables.

 

Create  Table dbo.Table1(ID Int,Name VarChar(50));

GO

Create Table dbo.Table2(ID Int,Name VarChar(50));

 

GO

 

Insert Into dbo.Table1

Values(1,'A'),(2,'B'),(3,'C')

GO

Insert Into dbo.Table2

(ID,Name)

Values(1,'D'),(2,'E'),(3,'F')

 

GO

 

-- WITH Single Table

Update Table1 SET Name=(Select Name From Table2 T2 Where T2.ID=Table1.ID)

 

-- With Join Condition.

UPDATE T1 SET T1.Name=T2.Name

FROM dbo.Table1 T1 Inner JOin dbo.Table2 T2 On T1.ID=T2.ID

 

--Join Statement

Update Table1  SET Name=T2.Name

From dbo.Table1 T1 Inner Join dbo.Table2 T2 On T1.ID=T2.ID

 

--Using  Sub Query

Update Table1 SET Name=T2.Name

From (Select ID,Name

From Table2 ) T2 Where T2.ID=Table1.ID

 

 

--Clean up Tables

Drop Table dbo.Table1,dbo.Table2

Featured Post

SQL Server : SELECT all columns to be good or bad in database system

This article is half-done without your Comment! *** Please share your thoughts via Comment *** In this post, I am going to write about one o...

Popular Posts