In Part 1, I covered how much data your SQL Server should be processing and what happens when it isn't. In this post we continue the look at data starvation: how to verify that you are actually achieving that speed, and how Vexata can help your SQL Server run faster.
Are the CPUs consistently over 90% utilized? If not, something is holding them back. The cause could be in the application or the hardware; below we'll work to rule out hardware.
First, look at % User Time in Performance Monitor. SQL Server runs in user space, so a low percentage of time spent there can signal that the CPUs are being held back. In the screen grab below, simulated on a server connected to a slow SAN, only about 14% of the CPU's 84% (% Processor Time) was actually spent on SQL Server work.
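As a cross-check from inside SQL Server, the scheduler monitor ring buffer records how much CPU the SQL Server process itself used in recent one-minute snapshots. This is a sketch of a commonly used diagnostic query, not a replacement for PerfMon:

```sql
-- Recent CPU split: SQL Server vs. idle vs. everything else,
-- from the scheduler monitor ring buffer (one snapshot per minute).
SELECT TOP (10)
    x.record_id,
    x.sql_cpu_pct,                                  -- CPU used by the SQL Server process
    x.idle_pct,                                     -- CPU left idle
    100 - x.sql_cpu_pct - x.idle_pct AS other_pct   -- CPU used by everything else
FROM (
    SELECT
        r.value('(./Record/@id)[1]', 'int') AS record_id,
        r.value('(./Record/SchedulerMonitorEvent/SystemHealth/ProcessUtilization)[1]', 'int') AS sql_cpu_pct,
        r.value('(./Record/SchedulerMonitorEvent/SystemHealth/SystemIdle)[1]', 'int') AS idle_pct
    FROM (
        SELECT CONVERT(xml, record) AS r
        FROM sys.dm_os_ring_buffers
        WHERE ring_buffer_type = N'RING_BUFFER_SCHEDULER_MONITOR'
          AND record LIKE N'%<SystemHealth>%'
    ) AS rb
) AS x
ORDER BY x.record_id DESC;
```

A low sql_cpu_pct on a busy server matches the low % User Time symptom described above.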
Verify server hardware:
Transport is the highway between your CPUs and your storage, and it should be more than big enough for your number of cores. Per-port speeds, one way (a quick worked example follows the list):
- 10Gb NIC (iSCSI): ~550MB/s
- 8Gb HBA (Fibre Channel): ~800MB/s
- 16Gb HBA (Fibre Channel): ~1,600MB/s
- 32Gb HBA (Fibre Channel): ~3,200MB/s
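As a rough, hypothetical sizing check: a server with 32 logical cores behind a single 8Gb FC port tops out at ~800MB/s, or only ~25MB/s of scan bandwidth per core. Multiple ports, or faster ones, are usually needed to keep every core fed.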
Know your workloads:
Are your mission critical applications OLAP or OLTP?
OLAP:
| Metric | PerfMon counter |
| --- | --- |
| Read block size | Avg. Disk Bytes/Read |
| Write block size | Avg. Disk Bytes/Write |
| Read latency | Avg. Disk sec/Read |
| Write latency | Avg. Disk sec/Write |
| Read bandwidth | Disk Read Bytes/sec |
| Write bandwidth | Disk Write Bytes/sec |
Read and write bandwidth are the most important here. The higher the bandwidth, the faster reports run, the more of them can run concurrently, and the faster data loads. Latency should still be low (<2ms), but it matters less than it does for the smaller I/O batches of OLTP.
Test: Run a table scan on a large table and see how high the bandwidth goes. Limit the rows returned with a filter that matches nothing, such as: "SELECT * FROM CUSTOMERS WHERE AGE < 0". Monitor the bandwidth with PerfMon.
Analysis: The read bandwidth should sustain close to your theoretical peak, based on the number of logical cores as discussed in Part 1. Note: using a table scan removes application-level variables, so the full data throughput of the storage path can be tested.
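Here is a minimal sketch of that test in T-SQL, assuming a large table named CUSTOMERS with an AGE column (both hypothetical, matching the example filter above). It measures read throughput from sys.dm_io_virtual_file_stats deltas instead of PerfMon; on a test system you may want to clear the buffer cache first so the reads actually hit storage:

```sql
-- On a TEST system only: force subsequent reads to come from storage.
-- DBCC DROPCLEANBUFFERS;

-- Snapshot per-file read bytes for the current database.
DECLARE @before TABLE (database_id int, file_id int, bytes_read bigint);
INSERT INTO @before (database_id, file_id, bytes_read)
SELECT database_id, file_id, num_of_bytes_read
FROM sys.dm_io_virtual_file_stats(DB_ID(), NULL);

DECLARE @start datetime2 = SYSDATETIME();

-- Full scan; the impossible predicate keeps the result set empty
-- (assumes no index on AGE, so every page must be read).
SELECT * FROM CUSTOMERS WHERE AGE < 0;

-- Read MB/s over the duration of the scan.
SELECT SUM(f.num_of_bytes_read - b.bytes_read) / 1048576.0
       / NULLIF(DATEDIFF(millisecond, @start, SYSDATETIME()) / 1000.0, 0)
       AS read_mb_per_sec
FROM sys.dm_io_virtual_file_stats(DB_ID(), NULL) AS f
JOIN @before AS b
  ON b.database_id = f.database_id AND b.file_id = f.file_id;
```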
OLTP:
| Metric | PerfMon counter |
| --- | --- |
| Read block size | Avg. Disk Bytes/Read |
| Write block size | Avg. Disk Bytes/Write |
| Read latency | Avg. Disk sec/Read |
| Write latency | Avg. Disk sec/Write |
| Read IOPS | Disk Reads/sec |
| Write IOPS | Disk Writes/sec |
Read and write latencies are the most important here. The faster the CPUs can get back to processing, the fewer time slices are wasted. IOPS is a measure of scale: how many operations can be executed per second.
Test: Use PerfMon to record CPU utilization plus the LogicalDisk IOPS and latency counters for the data, log and tempdb volumes.
Analysis: During peak user workloads, is CPU utilization consistently over 90%? If not, what are the latencies? Lowering latency gets the CPUs back to work faster, reduces wasted time slices and increases CPU utilization.
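Alongside PerfMon, this sketch of a standard DMV query gives the average read and write latency per database file since instance startup; compare the data, log and tempdb files against the ranges in the table below:

```sql
-- Average I/O latency (ms) per file since SQL Server last started.
SELECT DB_NAME(vfs.database_id) AS database_name,
       mf.physical_name,
       1.0 * vfs.io_stall_read_ms  / NULLIF(vfs.num_of_reads, 0)  AS avg_read_ms,
       1.0 * vfs.io_stall_write_ms / NULLIF(vfs.num_of_writes, 0) AS avg_write_ms
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
JOIN sys.master_files AS mf
  ON mf.database_id = vfs.database_id
 AND mf.file_id = vfs.file_id
ORDER BY avg_read_ms DESC;
```

Because these numbers are cumulative since startup they smooth out peaks, so PerfMon during the peak window remains the better view of worst-case latency.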
Common latency ranges for 8KB OLTP I/O running at full production load:
| Storage type | Read latency | Write latency |
| --- | --- | --- |
| Disk array | 5-10ms | 2-5ms |
| SSD all-flash array (AFA) | 1-2ms | 1ms |
| NVMe flash array | 0.2ms | 0.2ms |
Conclusion
SQL Server licenses can be the most expensive part of a database platform. If the CPUs are not consistently over 90% utilized, investigate why. A common culprit these days is data starvation, where a vastly underpowered storage tier leaves the CPUs underutilized.
Call to Action: Check out Vexata solutions for SQL Server.
Reference: Pinal Dave (https://blog.sqlauthority.com)