Greatly Speeding Up A SQL Server Query
If your query is slow, you can check the execution plan for possible areas of speed up.
Well, I have done that and find that it does not always help.
I solved a slow query using a variety of techniques that I haven't seen mentioned in one place before and wanted to help anyone else in the same situation. This solution usually returned within 2 seconds.
Here is what I did.
Starting Query
This is a fairly basic query. It reports sales orders and allows the user to specify up to 6 optional where criteria.
If the user does not enter a criteria for a value, for example Country, its criteria string is set to
If the user does enter a criteria for a value, its criteria string is bracketed by '%..%'. For example, if the user enters 'Tin', strCountry is set to '%Tin%' and all countries with 'Tin' in its name are selected. (Argentina and Martinique for example.)
SELECT Top 1000
SalesHeader.InvoiceNumber
,SalesHeader.CompanyName
,SalesHeader.Street
,SalesHeader.City
,SalesHeader.Region
,SalesHeader.Country
,SalesHeader.SalesDate
,SalesHeader.InvoiceTotal
,SalesLineItem.LineItemNbr
,SalesLineItem.PartNumber
,SalesLineItem.Quantity
,SalesLineItem.UnitPrice
,SalesLineItem.Quantity * SalesLineItem.UnitPrice as ExtPrice
,PartMaster.UnitWeight
,SalesLineItem.Quantity * PartMaster.UnitWeight as ExtWeight
FROM dbo.SalesHeader
left join dbo.SalesLineItem on SalesHeader.InvoiceNumber = SalesLineItem.InvoiceNumber
left join dbo.PartMaster on SalesLineItem.PartNumber = PartMaster.PartNumber
where
(@strCountry = '' or Country like @strCountry)
and
(@strCompanyName = '' or CompanyName like @strCompanyName)
and
(@strPartNumber = '' or SalesLineItem.PartNumber like @strPartNumber)
and
(@strInvoiceNumber = '' or SalesHeader.InvoiceNumber like @strInvoiceNumber)
and
(@strRegion = '' or Region like @strRegion)
and
(@mnyExtPrice = 0 or (SalesLineItem.Quantity * SalesLineItem.UnitPrice)
> @mnyExtPrice)
Order By
SalesDate,
Country,
Company,
PartNumber
I am taking this from a data warehouse I worked on. There were 260,000 records in the full query. We limited the return to 1,000 records as a user would never want more than that.
What We Came Up With
There are different techniques to speed up a query. The following is our resulting query. I go over each of the techniques used below.
SELECT
InvoiceNumber
,Company
,Street
,City
,Region
,Country
,SalesDate
,InvoiceTotal
,LineItemNbr
,PartNumber
,Quantity
,UnitPrice
,ExtPrice
,UnitWeight
,ExtWeight
FROM
(
SELECT top 1000
IdentityID,
ROW_NUMBER() OVER (ORDER BY [SalesDate], [Country], [Company], [PartNumber]) as RowNbr
FROM dbo.SalesCombined with(index(NCI_SalesDt))
where
(@strCountry = '' or Country like @strCountry)
and
(@strCompany = '' or Company like @strCompany)
and
(@strPartNumber = '' or PartNumber like @strPartNumber)
and
(@strInvoiceNumber = '' or InvoiceNumber like @strInvoiceNumber)
and
(@strRegion = '' or Region like @strRegion)
and
(@mnyExtPrice = 0 or ExtPrice > @mnyExtPrice)
) SubSelect
Inner Join dbo.SalesCombined on SubSelect.IdentityID = SalesCombined.IdentityID
Order By
RowNbr
Technique 1 - Denormalize the data.
I was fortunate in two ways:
The data was small enough to create a second copy of it.
The data did not change very often.This meant I could structure the second copy optimized for querying and allow updating to take a while.
The SalesHeader, SalesLineItem and PartMaster tables were merged into the single SalesCombined table.
The calculated values were stored in the SalesCombined table as well.
Note that I left the original tables in place. All code to update those tables was still valid. I had to create additional code to then propagate the changes to the SalesCombined table.
Technique 2 - Created An Integer Identity Value
The first field of this denormalized table is an integer identity value. This was called IdentityID.
Even if we had not denormalized the data, an integer identity value in SalesHeader could have been used for the join between it and SalesLineItem and speeded the original query up a bit.
Technique 3 - Created A Clustered Index On This Integer Identity Value
I created a clustered index on this IdentityID value. This is the fastest way to find a record.
Technique 4 - Created A Unique, Non-Clustered Index On The Sort Fields
The query's output is sorted on four fields, SalesDate, Country, Company, PartNumber. So I created an index on these fields SalesDate, Country, Company and PartNumber.
Then I added the IdentityID to this index. This index was noted as Unique. This allowed SQL Server to go from the sort fields to the address, essentially, of the actual record as quickly as possible.
Technique 5: Include In the Non-Clustered Index All 'Where Clause' Fields
A SQL Server index can include fields that are not part of the sort. (Who thought of this? It's a great idea.) If you include all where clause fields in the index, SQL Server does not have to look up the actual record to obtain this data.
This is the normal look up process:
Read the index from disk.
Go to the first entry on the index.
Find the address of the first record from that entry.
Read that record from disk.
Find any fields that are part of the where clause and apply the criteria.
Decide if that record is included in the query.
If you include the where clause fields in the index:
Read the index from disk.
Go to the first entry on the index.
Find any fields that are part of the where clause and apply the criteria.
Decide if that record is included in the query.
CREATE UNIQUE NONCLUSTERED INDEX [NCI_InvcNbr] ON [dbo].[SalesCombined]
(
[SalesDate] ASC,
[Country] ASC,
[CompanyName] ASC,
[PartNumber] ASC,
[IdentityID] ASC
)
INCLUDE [InvoiceNumber],
[City],
[Region],
[ExtPrice]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF,
IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF,
ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
ON [PRIMARY]
The execution plan for the original query.
The execution plan for our final query is much simpler - to start, it just reads the index.
Technique 6: Created A Sub-Query To Find The IdentityID Of Each Record To Output And Their Sort Order
I created a sub-query to find the records to output and the order in which to output them. Note the following:
Technique 7 - It explicitly says to use the NCI_InvcNbr index that has all of the fields needed in it.
Technique 8- It uses the Row_Number function to generate an integer for each row that will be output. These values are generated 1, 2 and 3 in the order given by the fields in the ORDER BY section of that line.
Technique 9: Create An Enclosing Query With All Of The Values
This query specifies the values to print. It uses the Row_Number values to know the order in which to print. Note that the inner join is done on the IdentityID field which uses the Clustered index to find each record to print.
Techniques That Did Not Help
There were two techniques that we tried that did not speed up the query. These statements are both added to the end of a query.
OPTION (MAXDOP 1) limits the number of processors to one. This will prevent any parallelism from being done. We tried this when we were experimenting with the query and had parallelism in the execution plan.
OPTION (RECOMPILE) causes the execution plan to be recreated every time the query is run. This can be useful when different user selections can vary the query results.
Hope this can be of use.