Greatly Speeding Up A SQL Server Query

If your query is slow, you can check the execution plan for possible areas of speed up.

Well, I have done that and find that it does not always help.

I solved a slow query using a variety of techniques that I haven't seen mentioned in one place before and wanted to help anyone else in the same situation. This solution usually returned within 2 seconds.

Here is what I did.

Starting Query

This is a fairly basic query. It reports sales orders and allows the user to specify up to 6 optional where criteria.

If the user does not enter a criteria for a value, for example Country, its criteria string is set to
If the user does enter a criteria for a value, its criteria string is bracketed by '%..%'. For example, if the user enters 'Tin', strCountry is set to '%Tin%' and all countries with 'Tin' in its name are selected. (Argentina and Martinique for example.)

SELECT Top 1000

SalesHeader.InvoiceNumber

,SalesHeader.CompanyName

,SalesHeader.Street

,SalesHeader.City

,SalesHeader.Region

,SalesHeader.Country

,SalesHeader.SalesDate

,SalesHeader.InvoiceTotal

,SalesLineItem.LineItemNbr

,SalesLineItem.PartNumber

,SalesLineItem.Quantity

,SalesLineItem.UnitPrice

,SalesLineItem.Quantity * SalesLineItem.UnitPrice as ExtPrice

,PartMaster.UnitWeight

,SalesLineItem.Quantity * PartMaster.UnitWeight as ExtWeight

FROM dbo.SalesHeader

left join dbo.SalesLineItem on SalesHeader.InvoiceNumber = SalesLineItem.InvoiceNumber

left join dbo.PartMaster on SalesLineItem.PartNumber = PartMaster.PartNumber

where

(@strCountry = '' or Country like @strCountry)

and

(@strCompanyName = '' or CompanyName like @strCompanyName)

and

(@strPartNumber = '' or SalesLineItem.PartNumber like @strPartNumber)

and

(@strInvoiceNumber = '' or SalesHeader.InvoiceNumber like @strInvoiceNumber)

and

(@strRegion = '' or Region like @strRegion)

and

(@mnyExtPrice = 0 or (SalesLineItem.Quantity * SalesLineItem.UnitPrice)

> @mnyExtPrice)

Order By

SalesDate,

Country,

Company,

PartNumber

I am taking this from a data warehouse I worked on. There were 260,000 records in the full query. We limited the return to 1,000 records as a user would never want more than that.

What We Came Up With

There are different techniques to speed up a query. The following is our resulting query. I go over each of the techniques used below.

SELECT

InvoiceNumber

,Company

,Street

,City

,Region

,Country

,SalesDate

,InvoiceTotal

,LineItemNbr

,PartNumber

,Quantity

,UnitPrice

,ExtPrice

,UnitWeight

,ExtWeight

FROM

(

SELECT top 1000

IdentityID,

ROW_NUMBER() OVER (ORDER BY [SalesDate], [Country], [Company], [PartNumber]) as RowNbr

FROM dbo.SalesCombined with(index(NCI_SalesDt))

where

(@strCountry = '' or Country like @strCountry)

and

(@strCompany = '' or Company like @strCompany)

and

(@strPartNumber = '' or PartNumber like @strPartNumber)

and

(@strInvoiceNumber = '' or InvoiceNumber like @strInvoiceNumber)

and

(@strRegion = '' or Region like @strRegion)

and

(@mnyExtPrice = 0 or ExtPrice > @mnyExtPrice)

) SubSelect

Inner Join dbo.SalesCombined on SubSelect.IdentityID = SalesCombined.IdentityID

Order By

RowNbr

Technique 1 - Denormalize the data.

I was fortunate in two ways:

The data was small enough to create a second copy of it.
The data did not change very often.This meant I could structure the second copy optimized for querying and allow updating to take a while.

The SalesHeader, SalesLineItem and PartMaster tables were merged into the single SalesCombined table.

The calculated values were stored in the SalesCombined table as well.

Note that I left the original tables in place. All code to update those tables was still valid. I had to create additional code to then propagate the changes to the SalesCombined table.

Technique 2 - Created An Integer Identity Value

The first field of this denormalized table is an integer identity value. This was called IdentityID.

Even if we had not denormalized the data, an integer identity value in SalesHeader could have been used for the join between it and SalesLineItem and speeded the original query up a bit.

Technique 3 - Created A Clustered Index On This Integer Identity Value

I created a clustered index on this IdentityID value. This is the fastest way to find a record.

Technique 4 - Created A Unique, Non-Clustered Index On The Sort Fields

The query's output is sorted on four fields, SalesDate, Country, Company, PartNumber. So I created an index on these fields SalesDate, Country, Company and PartNumber.

Then I added the IdentityID to this index. This index was noted as Unique. This allowed SQL Server to go from the sort fields to the address, essentially, of the actual record as quickly as possible.

Technique 5: Include In the Non-Clustered Index All 'Where Clause' Fields

A SQL Server index can include fields that are not part of the sort. (Who thought of this? It's a great idea.) If you include all where clause fields in the index, SQL Server does not have to look up the actual record to obtain this data.

This is the normal look up process:

Read the index from disk.
Go to the first entry on the index.
Find the address of the first record from that entry.
Read that record from disk.
Find any fields that are part of the where clause and apply the criteria.
Decide if that record is included in the query.

If you include the where clause fields in the index:

Read the index from disk.
Go to the first entry on the index.
Find any fields that are part of the where clause and apply the criteria.
Decide if that record is included in the query.

CREATE UNIQUE NONCLUSTERED INDEX [NCI_InvcNbr] ON [dbo].[SalesCombined]

(

[SalesDate] ASC,

[Country] ASC,

[CompanyName] ASC,

[PartNumber] ASC,

[IdentityID] ASC

)

INCLUDE [InvoiceNumber],

[City],

[Region],

[ExtPrice]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF,

IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF,

ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)

ON [PRIMARY]

The execution plan for the original query.

The execution plan for our final query is much simpler - to start, it just reads the index.

Technique 6: Created A Sub-Query To Find The IdentityID Of Each Record To Output And Their Sort Order

I created a sub-query to find the records to output and the order in which to output them. Note the following:

Technique 7 - It explicitly says to use the NCI_InvcNbr index that has all of the fields needed in it.
Technique 8- It uses the Row_Number function to generate an integer for each row that will be output. These values are generated 1, 2 and 3 in the order given by the fields in the ORDER BY section of that line.

Technique 9: Create An Enclosing Query With All Of The Values

This query specifies the values to print. It uses the Row_Number values to know the order in which to print. Note that the inner join is done on the IdentityID field which uses the Clustered index to find each record to print.

Techniques That Did Not Help

There were two techniques that we tried that did not speed up the query. These statements are both added to the end of a query.

OPTION (MAXDOP 1) limits the number of processors to one. This will prevent any parallelism from being done. We tried this when we were experimenting with the query and had parallelism in the execution plan.
OPTION (RECOMPILE) causes the execution plan to be recreated every time the query is run. This can be useful when different user selections can vary the query results.

Hope this can be of use.

JOHN WILDANGER

Senior Business Systems Analyst | Project Manager

Greatly Speeding Up A SQL Server Query

QlikView - How To Structure Load Scripts

What the Business Systems Analyst is Working to Achieve

Greatly Speeding Up A SQL Server Query

Recent Posts

Kommentare