I have been working with a client this week to address blocking and deadlocking on their main production server. This is a very busy server so blocking is to be expected. Currently the Blocked Processes Threshold is set to 5 and the client is unwilling to raise this. However at peak times of the year (like now), this can result in over 200 blocking alerts per week. From our point-of-view, most of this is noise and it impairs our ability to recognise and respond to genuine performance issues. I have been giving this a lot of thought, and believe the best way forwards will be to disable the alerting from our standard monitoring tools and develop a custom-monitoring solution for this client based on SQL Server Extended Events.
This is timely, as just that morning I had made a note to experiment with Extended Events (XE) in a sqlLab. I have been thinking about how we record deadlock information, which at the moment is through trace flags 1204 and 1222. The problem with these trace flags is that they can be resource-heavy and they log everything to the SQL Error Logs, which is an inconvenient mechanism for analysis. The advantage of XE is that it provides a targeted mechanism for recording events and stores its information either in-memory (ring-buffers) or in files. Here we explore using XE events to record deadlocking information in an XML format which is convenient for further analysis.
Resources for this lab
Before we begin this lab, there are some really great resources in the SQL Server community around XE and deadlocking.
- Brent Ozar & Jeremiah Peschka on Extended Events: This lab is derived from an article by Jeremiah Peschka, Finding Blocked Processes and Deadlocks Using SQL Server Extended Events, on BrentOzar.com.
- MSDN on XML in SQL Server: Of course, MSDN have very thorough documentation around Extended Events, as well as documentation on T-SQL / XML integration and how to query xml using t-sql. In essence, Microsoft have introduced xml-datatypes to SQL Server and these datatypes have a handful of methods (based on XPath / XQuery) which allows us to massage the data into shape.
- XQuery: Since this is essentially an exercise in XQuery, here are some great links to read up on: W3Schools Tutorial, W3.org reference on XQuery and MSDN’s excellent reference on XQuery.
Creating an Extended Event Session
As mentioned above, this sqlLab is based on the article by Jeremiah Peschka. However we will focus solely on deadlocking. While blocking and deadlocking are closely related, we want to be able to narrow down on the deadlocking events on our client’s environment. Blocking occurs with such regularity in this environment, that we want to be able to record it and contain these logs in their own space so that they don’t obscure other alerts / events.
Creating an Extended Event Session is easier than I thought. If you have SQL Server 2012 or upwards, you can use the GUI tools. Alternatively, the T-SQL to create an event session is surprisingly simple. First we need to create an event session, we then add any events of interest and define how we would like the data to be recorded:
-- create an event called sqllab_deadlockXE CREATE EVENT SESSION [sqllab_deadlockXE] ON SERVER -- add an event that fires whe a deadlock occurs ADD EVENT sqlserver.xml_deadlock_report ( ACTION( sqlserver.client_app_name , sqlserver.client_hostname , sqlserver.database_name ) ) -- save the data to file ADD TARGET package0.asynchronous_file_target ( SET filename = N'c:\sqltrace\XEventSessions\xeDeadlockingSessions\sqllab_deadlock_process.xel' , metadatafile = N'c:\sqltrace\XEventSessions\xeDeadlockingSessions\sqllab_deadlock_process.xem' , max_file_size=(65536) , max_rollover_files=5 ) WITH (MAX_DISPATCH_LATENCY = 5SECONDS); go
As you can see, there is not much to it. However, there are a number of things to be aware of. First, this event is still reliant on the server configuration parameter, Blocked Process Threshold. The event will only fire when the configured threshold has been breached. This is important for our client, who require the threshold to be set at 5:
sp_configure 'blocked process threshold', '5'; reconfigure
Second there are a number of options for recording the data, as specified by the ADD TARGET statement. Here we are using a flat file and I haven’t dug into the other options, but there are some really interesting looking targets including the ring-buffer and histogram targets. The various targets are documented on MSDN, SQL Server Extended Event Targets.
With our sqllab_deadlockXE event created, we can start it:
ALTER EVENT SESSION [sqllab_deadlockXE] ON SERVER STATE = START;
Note that you have to ensure the ‘c:\sqltrace\XEventSessions’ folder structure is in-place or you will get the following error:
Collecting Deadlock Information
Now that we have the sqllab_deadlockXE running, we can begin to collect deadlocking information. For this, we will jump into my test database, WorldOfMayhem where we have two tables, [ticktick] and [boom]:
use [WorldOfMayhem]; go create table ticktick ( x int not null primary key, y int, z varchar(12) ); create table boom ( x int, y int, z varchar(12) ); go create clustered index [idx_boom_coords] on boom ([x], [y]); create nonclustered index [idx_tick_coords] on ticktick (x) include ([y]); go insert into ticktick select 1, 1, 'mario' union select 2, 2, 'luigi' union select 3, 3, 'bowser' union select 4, 4, 'peach' union select 5, 5, 'toad'; insert into boom select 0, 0, 'baboom'; go
We want to create a deadlock. To do this we will create two transactions, tx_A and tx_B, which will each have opposing locks. A state diagram describing this scenario is shown in Figure 2:
The above scenario takes advantage of the READ COMMITTED isolation level. We can see that tx_A holds an exclusive lock on [boom], while tx_B has an exclusive lock on [ticktick]. We are able to ‘force’ the deadlock by attempting to update rows which are held by an exclusive lock. To do this open two query windows, in the first execute tx_A:
begin tran tx_A insert into boom select 5, 5, 'toad'; waitfor delay '00:00:10'; update ticktick set x = x + 10, y = y + 10 where x not in (select x from boom); commit
In the second query window, execute tx_B:
begin tran tx_B insert into ticktick select 6, 6, 'yoshi' union select 7, 7, 'Koopa'; update boom set x = x + 10, y = y + 10 where x not in (select x from ticktick); commit
Now if you wait a few seconds, tx_A terminates as the victim in this deadlock:
We have successfully created a deadlock between tx_A and tx_B, and through the extended event session, sqllab_deadlockXE, we have been able to capture information about this deadlock event. Let’s look at what we have captured.
Reviewing Deadlock Information
The information gathered by the sqllab_deadlockXE is written to file, which we are able to then extract in an XML format. We are using an XML format for two reasons: firstly, it provides a convenient format for traversal and analysis. Secondly, we can extract the deadlock graph as XML and view this graphically in SSMS. Below is the reporting query used to read in the deadlocking information from the XE files:
with xmlData as ( -- Read in the xe_deadlock file, returning the raw deadlock info as XML select cast(event_data as XML) as [data] from sys.fn_xe_file_target_read_file ( 'C:\sqltrace\XEventSessions\xeDeadlockingSessions\ssl_deadlock_process*.xel' , 'C:\sqltrace\XEventSessions\xeDeadlockingSessions\ssl_deadlock_process*.xem' , null , null ) ), xmlReport as ( -- extract informaton about the deadlock VICTIM using XQuery select data.value('(/event/data/value/deadlock/process-list/process/@lasttranstarted)', 'datetime') as [Event Time] , data.value('(/event/data/value/deadlock/resource-list/keylock/@objectname)', 'nvarchar(max)') as [object] , data.value('(/event/data/value/deadlock/process-list/process/@clientapp)', 'nvarchar(256)') as [Application] , data.value('(/event/data/value/deadlock/process-list/process/@hostname)', 'nvarchar(256)') as [Hostname] , data.value('(/event/data/value/deadlock/process-list/process/@loginname)', 'nvarchar(256)') as [Login] , data.value('(/event/data/value/deadlock/resource-list/keylock/@indexname)', 'nvarchar(256)') as [Index] , data.value('(/event/data/value/deadlock/process-list/process/@waittime)', 'int') as [Wait Time (ms)] , data.value('(/event/data/value/deadlock/resource-list/keylock/@mode)', 'char(1)') as [Lock Mode] , data.query('(event/data/value/deadlock)') as [Deadlock Graph] from xmlData ) select [Event Time] , parsename(object, 3) as [Database] , parsename(object, 2) as [Schema] , parsename(object, 1) as [Table] , [Application] , [Hostname] , [Login] , [Index] , [Wait Time (ms)] , [Lock Mode] , [Deadlock Graph] from xmlReport order by [Event Time] desc;
We have used CTEs here simply for clarity. The first CTE, xmlData, reads in the XE information as XML. The CTE xmlReport extracts the relevant information from the XML data. And finally a report is produced. As it is, this report is sufficient but there are two significant unresolved issues:
- reports information about the first process in the XML tree only. Ideally, we would use XPath to extract more detailed information about both the victim and the successful processes. However, this information is available in the deadlock graph.
- the extraction of [database], [schema], [object] information is hard-coded. In this case, it is extracting information from the XML <keylock> element. But if this was a <ridlock> instead, then it would not extract the object details. Similar to above, this could be resolved using XPath to extract the first-child of <resource-list>.
The above two issues aside, let’s look at what we get returned in Figure 3:
From this report we can quickly find information about the deadlock victim. We can see when the deadlock occurred, who it was run by and what caused the deadlock (here, a X lock on PK_ticktick). Even better, if you click on the Deadlock Graph it opens the full XML report of the deadlock:
<deadlock> <victim-list> <victimProcess id="process275630558" /> </victim-list> <process-list> <process id="process275630558" taskpriority="0" logused="248" waitresource="KEY: 10:72057594039304192 (b9b173bbe8d5)" waittime="3626" ownerId="168898" transactionname="tx_A" lasttranstarted="2014-07-06T16:32:42.583" XDES="0x275a116a8" lockMode="U" schedulerid="4" kpid="3308" status="suspended" spid="64" sbid="0" ecid="0" priority="0" trancount="2" lastbatchstarted="2014-07-06T16:32:42.583" lastbatchcompleted="2014-07-06T16:32:42.580" lastattention="1900-01-01T00:00:00.580" clientapp="Microsoft SQL Server Management Studio - Query" hostname="XXXXXX" hostpid="4328" loginname="XXXXXX" isolationlevel="read committed (2)" xactid="168898" currentdb="10" lockTimeout="4294967295" clientoption1="671090784" clientoption2="390200"> <executionStack> <frame procname="adhoc" line="9" stmtstart="184" stmtend="356" sqlhandle="0x020000005dac9115a75ae7d1d002e0177d2c0556c3e84ca40000000000000000000000000000000000000000"> update ticktick set x = x + 10, y = y + 10 where x not in (select x from boom); </frame> </executionStack> <inputbuf> begin tran tx_A insert into boom select 5, 5, 'toad'; waitfor delay '00:00:10'; update ticktick set x = x + 10, y = y + 10 where x not in (select x from boom); commit </inputbuf> </process> <process id="process27e02d498" taskpriority="0" logused="592" waitresource="KEY: 10:72057594039435264 (05016e6b282b)" waittime="11340" ownerId="168923" transactionname="tx_B" lasttranstarted="2014-07-06T16:32:44.877" XDES="0x2756ea3a8" lockMode="U" schedulerid="4" kpid="4080" status="suspended" spid="65" sbid="0" ecid="0" priority="0" trancount="2" lastbatchstarted="2014-07-06T16:32:44.877" lastbatchcompleted="2014-07-06T16:32:44.877" lastattention="1900-01-01T00:00:00.877" clientapp="Microsoft SQL Server Management Studio - Query" hostname="XXXXXX" hostpid="4328" loginname="XXXXXX" isolationlevel="read committed (2)" xactid="168923" currentdb="10" lockTimeout="4294967295" clientoption1="671090784" clientoption2="390200"> <executionStack> <frame procname="adhoc" line="8" stmtstart="190" stmtend="362" sqlhandle="0x020000009926b60ba6bbce4a9c0d90098d9f874dbd6c56270000000000000000000000000000000000000000"> update boom set x = x + 10, y = y + 10 where x not in (select x from ticktick); </frame> </executionStack> <inputbuf> begin tran tx_B insert into ticktick select 6, 6, 'yoshi' union select 7, 7, 'Koopa' update boom set x = x + 10, y = y + 10 where x not in (select x from ticktick); commit </inputbuf> </process> </process-list> <resource-list> <keylock hobtid="72057594039304192" dbid="10" objectname="WorldOfMayhem.dbo.ticktick" indexname="PK__ticktick__3BD019E5B922EBDE" id="lock2723b7a80" mode="X" associatedObjectId="72057594039304192"> <owner-list> <owner id="process27e02d498" mode="X" /> </owner-list> <waiter-list> <waiter id="process275630558" mode="U" requestType="wait" /> </waiter-list> </keylock> <keylock hobtid="72057594039435264" dbid="10" objectname="WorldOfMayhem.dbo.boom" indexname="idx_boom_coords" id="lock2723b8100" mode="X" associatedObjectId="72057594039435264"> <owner-list> <owner id="process275630558" mode="X" /> </owner-list> <waiter-list> <waiter id="process27e02d498" mode="U" requestType="wait" /> </waiter-list> </keylock> </resource-list> </deadlock>
If you have SQL Server 2012 +, then you can further save this deadlock report with the file extension, .xdl. Now, if you open this file in SSMS it renders the report graphically, as shown in Figure 4:
By using SQL Server’s Extended Events we have been able to capture detailed information about deadlocks in a way that allows us to collect the data in a central, easy-to-parse way. This is certainly an improvement on the trace flags 1204 and 1222. By taking advantage of T-SQL / XML integration we are able to query and analyse the captured information about deadlocks, and if you like the graphical representation the the ability to view .xdl file in SSMS is a huge advantage.
What we don’t have is the full detail that the trace flags provide, in particular the node information of 1204. However, there is sufficient information here to identify the relevant processes, the objects being fought over and the locking modes.