What is azure sql database elastic pool split instance problem

Azure sql database elastic pool split instance problem is a very rare phenomena which could impact resource usage metrics of elastic pool.

In this post I will cover how resource metrics are impacted, sample monitor, example of an impact.

How resource metrics are impacted:
Master..elastic_pool_resource_stats view indicates resource usage metrics of all elastic pools in the server and for each elastic pool each row indicates resource utilization metrics for every 15 seconds.
Any given point in time must be represented in a single row but when the pool is impacted there will be 2 rows indicating different metric values for same point in time.

In below example, impacted/unhealthy elastic pool has 2 rows covering point in time 2019/05/17 06:33:20; One of them indicating 61.76% storage utilization and other indicating 0.06% of storage utilization.

Sample SQL:

select  top 100 *
FROM [sys].[elastic_pool_resource_stats]
WHERE end_time > dateadd(dd,-7,getdate())
AND ( elastic_pool_name = 'SampleElasticPool')
order by start_time desc 


Impacted elastic pool metrics sample:

Expected healthy pool metrics Sample:

Sample Monitor:
Validate number of metrics collected against each elastic pool every 15 minutes. There should be 60 records every 15 minutes, when the pool is impacted we see ~120 or more metrics every 15 minutes.

— Execute followign SQL against master database, to monitor impact in last 2 hours

(select  elastic_pool_name, datepart(month,end_time) * 24 * 31 * 4 + datepart(dd,end_time) * 24 * 4 + datepart(hh,end_time) *4 + datepart(mi,end_time) / 15 as TimeID, max(end_time) as MaxDateTime, count(*) as MetricCountin15Minutes
FROM [sys].[elastic_pool_resource_stats]
WHERE end_time > dateadd(hh,-2,getdate())
group by elastic_pool_name,  datepart(month,end_time) * 24 * 31 * 4 + datepart(dd,end_time) * 24 * 4 + datepart(hh,end_time) *4 + datepart(mi,end_time) / 15
select * from CTE1 
where MetricCountin15Minutes > 100
ORDER BY elastic_pool_name,TimeID DESC 

Example of impact:
Imagine an automated process which validates available storage and adds additional databases to elastic pool. If automated process executes at 2019/05/17 06:33:25, adds more databases assuming only 0.06% storage is used but 62% of storage is utilized on the pool resulting unexpected behavior.

Additional information:
This problem is related to metric collection only, Databases will be available to use during the impact.
This phenomena is common during reconfiguration of elastic pools but will disappear very quickly, most of the time with 0 impact.
If the issue occurs for hours or days, it is beneficial to engage Microsoft.
The possibility of this issue occurrence is very low and impact is very low unless you have more automation in place.
This issue was first identified in 2017, Microsoft fixed it but resurfaced in mid-2019 at least once.

Most frequently used Azure SQL database commends:

This entry was posted in Azure, knowledge, Productivity, SQL, SQL Query, SQL Server, Uncategorized and tagged , , , , , , . Bookmark the permalink.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s