Note: This blog is continuation to earlier 3 blogs
1. SQL Execution plan deep dive (01-Basics I)
2. SQL Execution plan deep dive (02-Basics II)
3. SQL Execution plan deep dive (03-Scan, Key Lookup)
In simple terms spools are like storing the data in hidden temporary tables and the data being used repeatedly. Example, if there is a lookup on sequence of productID values spool can be used to pull bunch of records at the same time to reduce the number of Subtree executions.
Following query can be executed to explain few concepts of Spool!
--prep -- create a table of non-distinct product IDs SELECT ProductId INTO #products FROM bigProduct CROSS APPLY ( SELECT 1 UNION ALL SELECT 2 WHERE ProductId % 5 = 0 UNION ALL SELECT 3 WHERE ProductId % 7 = 0 ) x(m) WHERE ProductId BETWEEN 1001 AND 12001; GO --The dreaded "performance spool" SELECT p.ProductId, AVG(x.ActualCost) AS AvgCostTop40 FROM #products AS p CROSS APPLY ( SELECT t.*, ROW_NUMBER() OVER(PARTITION BY p.ProductId ORDER BY t.ActualCost DESC) AS r FROM bigTransactionHistory AS t WHERE p.ProductId = t.ProductId ) AS x WHERE x.r BETWEEN 1 AND 40 GROUP BY p.ProductId; GO --
Referring to the above query execution plan, we see the request is coming from Nested loop 7,438 tiems out of which 5545 rebinds happen (Subtree is executed to pull the data) and 1893 Rewinds occurred (Data is pulled directly from Spool (Hidden temporary table)). In above example, 1893 subtree executions are avoided using this spool.
So Basically if the data is unique Spool is not efficient but if we have too much of repeatetive data Spool helps to improve the performance.
To Compare the data how Spool improves or affecting the performance, you can use 8690 traceflag (undocumented: do not use in production) to indicate optimizer not to use Spool…
Add duplicate data into #products table and see how performance is impacted!
-- Add additional duplicate values into #products table INSERT #products SELECT * from #products -- Select statement with Spool SELECT p.ProductId, AVG(x.ActualCost) AS AvgCostTop40 FROM #products AS p CROSS APPLY ( SELECT t.*, ROW_NUMBER() OVER(PARTITION BY p.ProductId ORDER BY t.ActualCost DESC) AS r FROM bigTransactionHistory AS t WHERE p.ProductId = t.ProductId ) AS x WHERE x.r BETWEEN 1 AND 40 GROUP BY p.ProductId; OPTION (QUERYTRACEON 8690) -- Select statement without Spool SELECT p.ProductId, AVG(x.ActualCost) AS AvgCostTop40 FROM #products AS p CROSS APPLY ( SELECT t.*, ROW_NUMBER() OVER(PARTITION BY p.ProductId ORDER BY t.ActualCost DESC) AS r FROM bigTransactionHistory AS t WHERE p.ProductId = t.ProductId ) AS x WHERE x.r BETWEEN 1 AND 40 GROUP BY p.ProductId; --