Azure sql database elastic pool split instance problem is a very rare phenomena which could impact resource usage metrics of elastic pool.
In this post I will cover how resource metrics are impacted, sample monitor, example of an impact.
How resource metrics are impacted:
Master..elastic_pool_resource_stats view indicates resource usage metrics of all elastic pools in the server and for each elastic pool each row indicates resource utilization metrics for every 15 seconds.
Any given point in time must be represented in a single row but when the pool is impacted there will be 2 rows indicating different metric values for same point in time.
In below example, impacted/unhealthy elastic pool has 2 rows covering point in time 2019/05/17 06:33:20; One of them indicating 61.76% storage utilization and other indicating 0.06% of storage utilization.
select top 100 *
WHERE end_time > dateadd(dd,-7,getdate())
AND ( elastic_pool_name = 'SampleElasticPool')
order by start_time desc
Impacted elastic pool metrics sample:
Expected healthy pool metrics Sample:
Validate number of metrics collected against each elastic pool every 15 minutes. There should be 60 records every 15 minutes, when the pool is impacted we see ~120 or more metrics every 15 minutes.
— Execute followign SQL against master database, to monitor impact in last 2 hours
;WITH CTE1 AS
(select elastic_pool_name, datepart(month,end_time) * 24 * 31 * 4 + datepart(dd,end_time) * 24 * 4 + datepart(hh,end_time) *4 + datepart(mi,end_time) / 15 as TimeID, max(end_time) as MaxDateTime, count(*) as MetricCountin15Minutes
WHERE end_time > dateadd(hh,-2,getdate())
group by elastic_pool_name, datepart(month,end_time) * 24 * 31 * 4 + datepart(dd,end_time) * 24 * 4 + datepart(hh,end_time) *4 + datepart(mi,end_time) / 15
select * from CTE1
where MetricCountin15Minutes > 100
ORDER BY elastic_pool_name,TimeID DESC
Example of impact:
Imagine an automated process which validates available storage and adds additional databases to elastic pool. If automated process executes at 2019/05/17 06:33:25, adds more databases assuming only 0.06% storage is used but 62% of storage is utilized on the pool resulting unexpected behavior.
This problem is related to metric collection only, Databases will be available to use during the impact.
This phenomena is common during reconfiguration of elastic pools but will disappear very quickly, most of the time with 0 impact.
If the issue occurs for hours or days, it is beneficial to engage Microsoft.
The possibility of this issue occurrence is very low and impact is very low unless you have more automation in place.
This issue was first identified in 2017, Microsoft fixed it but resurfaced in mid-2019 at least once.
Most frequently used Azure SQL database commends: