SQL Execution plan deep dive (04-Spool)

Note: This blog is continuation to earlier 3 blogs
1.       SQL Execution plan deep dive (01-Basics I)
2.       SQL Execution plan deep dive (02-Basics II)
3.       SQL Execution plan deep dive (03-Scan, Key Lookup)

In simple terms spools are like storing the data in hidden temporary tables and the data being used repeatedly. Example, if there is a lookup on sequence of productID values spool can be used to pull bunch of records at the same time to reduce the number of Subtree executions.

Following query can be executed to explain few concepts of Spool!

--prep -- create a table of non-distinct product IDs
SELECT ProductId
INTO #products
FROM bigProduct
CROSS APPLY
(
SELECT 1
UNION ALL
SELECT 2
WHERE ProductId % 5 = 0
UNION ALL
SELECT 3
WHERE ProductId % 7 = 0
) x(m)
WHERE ProductId BETWEEN 1001 AND 12001;
GO

--The dreaded "performance spool"
SELECT p.ProductId,
AVG(x.ActualCost) AS AvgCostTop40
FROM #products AS p
CROSS APPLY
(
SELECT t.*,
ROW_NUMBER() OVER(PARTITION BY p.ProductId ORDER BY t.ActualCost DESC) AS r
FROM bigTransactionHistory AS t
WHERE p.ProductId = t.ProductId
) AS x
WHERE x.r BETWEEN 1 AND 40
GROUP BY p.ProductId;
GO
--

QueryExecutionPlan_Spool

Referring to the above query execution plan, we see the request is coming from Nested loop 7,438 tiems out of which 5545 rebinds happen (Subtree is executed to pull the data) and 1893 Rewinds occurred (Data is pulled directly from Spool (Hidden temporary table)). In above example, 1893 subtree executions are avoided using this spool.

So Basically if the data is unique Spool is not efficient but if we have too much of repeatetive data Spool helps to improve the performance.

To Compare the data how Spool improves or affecting the performance, you can use 8690 traceflag (undocumented: do not use in production) to indicate optimizer not to use Spool…

Add duplicate data into #products table and see how performance is impacted!

-- Add additional duplicate values into #products table
INSERT #products SELECT * from #products

-- Select statement with Spool
SELECT p.ProductId,
AVG(x.ActualCost) AS AvgCostTop40
FROM #products AS p
CROSS APPLY
(
SELECT t.*,
ROW_NUMBER() OVER(PARTITION BY p.ProductId ORDER BY t.ActualCost DESC) AS r
FROM bigTransactionHistory AS t
WHERE p.ProductId = t.ProductId
) AS x
WHERE x.r BETWEEN 1 AND 40
GROUP BY p.ProductId;
OPTION (QUERYTRACEON 8690)

-- Select statement without Spool
SELECT p.ProductId,
AVG(x.ActualCost) AS AvgCostTop40
FROM #products AS p
CROSS APPLY
(
SELECT t.*,
ROW_NUMBER() OVER(PARTITION BY p.ProductId ORDER BY t.ActualCost DESC) AS r
FROM bigTransactionHistory AS t
WHERE p.ProductId = t.ProductId
) AS x
WHERE x.r BETWEEN 1 AND 40
GROUP BY p.ProductId;
--
This entry was posted in Execution Plan, Performance, Productivity, SQL, Uncategorized and tagged , , , . Bookmark the permalink.

Leave a comment