Monitoring and Diagnostics Guide

Effective monitoring and diagnostics are crucial for maintaining the health, performance, and availability of your SQL Server instances. This guide covers essential tools, techniques, and best practices for identifying and resolving issues.

Key Areas of Monitoring

Tools for Monitoring and Diagnostics

SQL Server Management Studio (SSMS)

SSMS is the primary graphical tool for administering SQL Server. It provides several built-in features for monitoring:

Dynamic Management Views (DMVs) and Functions (DMFs)

DMVs and DMFs are powerful T-SQL objects that provide granular, real-time information about the SQL Server instance and its databases. They are invaluable for custom scripting and detailed analysis.

Commonly Used DMVs:


SELECT
    wait_type,
    wait_time_ms,
    signal_wait_time_ms,
    waiting_tasks_count
FROM
    sys.dm_os_wait_stats
WHERE
    wait_type NOT IN (
        'BROKER_EVENTHANDLER', 'BROKER_RECEIVE_WAITFOR', 'BROKER_TASK_STOP',
        'BROKER_TO_FLUSH', 'BROKER_TRANSMITTER', 'CHECKPOINT_QUEUE',
        'CHKPT', 'CLR_AUTO_EVENT', 'CLR_MANUAL_EVENT', 'CLR_SEMAPHORE',
        'DBMIRROR_DBM_EVENT', 'DBMIRROR_WORKER_QUEUE', 'DBMIRRORING_CMD',
        'DIRTY_PAGE_POLL', 'DISPATCHER_QUEUE_SEMAPHORE', 'EXECSYNC',
        'FSAGENT', 'FT_IFTS_SCHEDULER_IDLE_WAIT', 'FT_IFTSHC_MUTEX',
        'HADR_BROKER_CONNECT', 'HADR_CMD_COMMIT', 'HADR_LOGCAPTURE_WAIT',
        'HADR_NOTIFICATION_DEFERRED', 'HADR_PRE_COMMIT_JOIN', 'HADR_QUIT',
        'HADR_SYNC_COMMIT', 'HADR_SYNCHRONIZATION_VIEWDEPENDENCY_EVENT',
        'HADR_WALLCLOCK_TIME', 'INDEX_ROW_LOCK', 'JOB_COMPLETION_QUEUE',
        'KDS_LOCK', 'LAZYWRITER_SLEEP', 'LOGMGR_QUEUE', 'MEMORY_ALLOCATION_EXT',
        'MSQL_QUERY_TARGET', 'NETWORKIO_WAIT', 'PAGEIOLATCH_SH',
        'PAGEIOLATCH_UP', 'PWAIT_ALL_COMPONENTS_INITIALIZED',
        'QDS_PERSIST_TASK_MAIN_LOOP_SLEEP', 'QDS_CLEANUP_STALE_QUERIES_TASK_MAIN_LOOP_SLEEP',
        'REQUEST_FOR_DEADLOCK_SEARCH', 'RESOURCE_QUEUE', 'SERVER_IDLE_CHECK',
        'SLEEP_BPOOL_FLUSH', 'SLEEP_DBSTARTUP', 'SLEEP_DCOMSTARTUP',
        'SLEEP_MASTERDBREADY', 'SLEEP_MASTERMDREADY', 'SLEEP_MASTERUPGRADED',
        'SLEEP_MSDBSTARTUP', 'SLEEP_SYSTEMTASK', 'SLEEP_TASK',
        'SLEEP_TEMPDBSTARTUP', 'SNI_HTTP_ACCEPT', 'SP_SERVER_DIAGNOSTICS_SLEEP',
        'SQLTRACE_BUFFER_FLUSH', 'SQLTRACE_INCREMENTAL_FLUSH_SLEEP',
        'SQLTRACE_WAIT_ENTRIES', 'TRAN_COMPLETION_QUEUE',
        'WAIT_FOR_RESULTS', 'WAIT_XTP_OFFLINE_CKPT_NEW_LOG', 'WAIT_XTP_CKPT_CLOSE',
        'XE_DISPATCHER_JOIN', 'XE_DISPATCHER_WAIT', 'XE_TIMER_EVENT'
    )
    ORDER BY
        wait_time_ms DESC;
        

Extended Events (XEvents)

Extended Events are a flexible and lightweight tracing system that replaces SQL Server Profiler. They allow you to capture specific events with minimal performance overhead.

Tip: Start with predefined templates (like system_health) and customize them as needed for your specific monitoring requirements.

Common Diagnostic Scenarios

High CPU Usage

Symptoms: Slow query responses, unresponsive application, high CPU utilization reported by the OS.

Troubleshooting Steps:

  1. Use Activity Monitor or DMVs (sys.dm_exec_requests, sys.dm_exec_sessions) to identify the most resource-intensive queries.
  2. Analyze query plans for performance bottlenecks (missing indexes, inefficient joins, table scans).
  3. Check for long-running or blocking queries.
  4. Ensure SQL Server has adequate CPU resources allocated.

Slow Query Performance

Symptoms: Specific queries taking an unexpectedly long time to complete.

Troubleshooting Steps:

  1. Capture the query using SQL Server Profiler or Extended Events.
  2. Analyze the execution plan.
  3. Check for missing or fragmented indexes.
  4. Review statistics for up-to-date information.
  5. Examine wait statistics (sys.dm_os_wait_stats) for clues (e.g., I/O waits, lock waits).

Disk I/O Bottlenecks

Symptoms: High disk latency, slow data retrieval and writing, poor performance during batch operations.

Troubleshooting Steps:

  1. Monitor disk performance counters in Performance Monitor or using sys.dm_os_performance_counters (e.g., `Physical Disk: Avg. Disk sec/Read`, `Avg. Disk sec/Write`).
  2. Check for file fragmentation (sys.dm_db_index_physical_stats).
  3. Ensure your storage subsystem is adequately provisioned for your workload.
  4. Consider moving data and log files to separate, faster storage.

Blocking and Deadlocks

Symptoms: Users reporting that their operations are stuck or failing with deadlock errors.

Troubleshooting Steps:

  1. Use Extended Events (specifically the xml_deadlock_report event) or trace deadlock graphs in Profiler.
  2. Analyze the deadlock graph to identify the statements and resources involved.
  3. Review application logic and transaction design for opportunities to reduce lock duration and contention.
  4. Ensure appropriate isolation levels are used.

Important: Deadlocks are a normal part of relational database systems. The goal is not to eliminate them entirely, but to minimize their frequency and impact.

Best Practices for Proactive Monitoring

Note: The system_health Extended Events session, enabled by default, captures important system-level information that can be very useful during initial troubleshooting.