SQL Server Log Reuse Waits

2017-12-11T11:18:06+00:00 December 6th, 2013|SQL|

In my previous SQL Server Transaction Log post, I promised Paul Taylor I would write up some of the meanings returned by the “log_reuse_wait_desc”, so here you go Paul 🙂

Active Transactions

By far the simplest to summarise; active transaction log reuse wait reason means that there is an open transaction that is keeping the VLF active. You could potentially use the DBCC OPENTRAN and sys.dm_exec_sessions to help identify the source of the open transaction. DBCC OPENTRAN lists information about the oldest open transaction in the database. The most important nugget needed when investigating log growth caused by active transactions, is the Server Process ID (or SPID) that is running the transaction. For example, you might get something like this:

Transaction information for database 'MyDatabase'.
Oldest active transaction:
SPID (server process ID): 41
UID (user ID) : -1
Name : user_transaction
LSN : (10861:3200:1)
Start time : Dec 6 2013 8:57:26:017PM
SID : 0x0105000000000005150000002e86f8cbc457a001b905c7e95e040000
DBCC execution completed. If DBCC printed error messages, contact your system administrator.

Here, the connection with SPID 41 had a transaction running since 8PM on 6th December. While the output of DBCC OPENTRAN yields nothing that can identify “who” is running that transaction, the SPID can be used along with some of the DMVs to return the information you need.

SELECT host_name, program_name, original_login_name,  st.text
FROM sys.dm_exec_sessions es
INNERJOIN sys.dm_exec_connections ec
ON es.session_id = ec.session_id
CROSSAPPLY sys.dm_exec_sql_text(ec.most_recent_sql_handle) st
WHERE ec.session_id = 41

By using “original_login_name”, rather than just “login_name”, you avoid any confusion should impersonation be occurring. That should be enough information to make an educated decision whether to punch the offending user, terminate the connection, or to wait for the transaction to complete.

Replication

Transactional replication uses the transaction log to identify changes that need to be replicated to subscribers. Whenever a change is made to a replicated table, the log records associated with that change are marked as a “pending replication”. One of the components of the transactional replication process is the Log Reader Agent job. It scans the log looking for “pending transactions” and then, when it gets one, places the change into the distribution database while marking the log entry as “replicated”. When transactions are marked “pending replication” and the Log Reader Agent job has stopped, well it’s pretty obvious that those log records will never be marked as “replicated”, hence the VLFs containing these log records will never become inactive, and the log will just grow. The only kind of replication that uses the transaction log is transactional replication, and thus it is the only type of replication that can prevent log reuse.

I did a bit of research, because in SQL 2008 Enterprise edition, it’s a little bit different; the Change Data Capture feature uses the transaction log and the Log Reader Agent job in much the same way as transactional replication. Hence, if the CDC jobs are not running, the log space cannot be reused. The log reuse reason is exactly the same though. More good stuff about CDC in this TechNet article.

Mirroring

When database mirroring is configured, the Principal database sends log records to the mirror database. In asynchronous mirroring, the log records are placed into a log send queue when the transaction is committed on the Principal. In synchronous mirroring, this process occurs before the transaction is committed on the Principal. The log records will then be sent to the mirror at some point afterwards. If the mirror is running in asynchronous mode and the rate of transactions exceeds that which the link between the Principal and Mirror are able to handle, then such enough the log on the Principal will grow. Simply put, the log records cannot be marked inactive on the Principal until they have been sent to the Mirror. The same occurs in synchronous mirroring if the link between the Principal and Mirror drops and the mirroring becomes disconnected, or if the mirroring session is suspended. The log records on the Principal will remain active until the Mirror reconnects, or the session is resumed and the log records are copied over to it, or until the mirroring is dropped.

Checkpoint

Checkpoint log reuse waits is almost always a transient condition and in nearly all circumstances, it will occur when Simple recovery models are configured. SQL Server will automatically run checkpoints on a regular basis (don’t worry, this keeps the number of log records that it would need to process during a database recovery). In fact, there are many reasons why SQL runs checkpoints on your databases, refer to this MSDN article for a full breakdown. If you get this log reuse wait appearing quite frequently (or for prolonged periods) it could indicate that the I/O subsystem performance is inadequate for the volume of data you are attempting to write during the checkpoint or that there are very large numbers of changed pages that the checkpoint needs to process. Dust off your overtime sheet and get to work. Don’t mess around here, if your SQL environment is bursting at the seams, fix it before it’s too late.

Snapshot Creation

I’ve heard proper DBAs refer to this log reuse reason as the “short-lived transient type”. During the creation of database snapshots, SQL reads the source database log so that it can run crash recovery on the snapshot. This is to ensure that it is transactionally (yes I know that’s not a real word) consistent. This process means portions of the log cannot be marked as inactive. The DBCC CHECKDB query uses hidden database snapshots for its checks and thus waits may appear when running this query, even if there are no user-created snapshots. Prolonged snapshot creation waits can be a result of transactions running over a very long period that need to all be rolled back for the snapshot creation, or an inefficient reading of the log due to a very large number of VLFs.

Database Backup

You’re probably fairly comfortable now knowing that active transaction are necessary for database backups, as the backup has to include at least that much of the log to ensure a consistent restore can be performed. As such, long-running database backups could result in log growth. This is because portions of the log necessary for the backup must be retained for the duration of the backup. There is little you can do if this log reuse wait type is frequently encountered, other than optimising the backups themselves, such as; optimising I/O subsystems and / or striping the backups over multiple devices, converting to file-filegroup backups, or using compressed backups. In fact, anything that reduces the time taken to perform a backup will help.

Log Backup

Log backups is possibly one of the more commonly known reasons for log reuse being prevented. When the Full and Bulk-Logged recovery models are configured, VLFs cannot be marked as inactive until all the log records in that VLF have been backed up. If you were paying attention to the previous post, it goes without saying that this log backup wait reason will never appear in Simple recovery models. Should you receive this value, check that the log backups are correctly scheduled and are running successfully. You might also want to check that the NO_TRUNCATE option has not been specified, as that option is only for backing up the log of a damaged database and will not truncate any VLFs.

 

Thanks for reading and have a happy Friday!