The Right Azure Disk Type for Distributed Database Workloads
Author: Richie Bachala
Originally published on Towards AI.
The Azure Storage Landscape
Azure offers several managed disk types, each designed for different workloads and performance requirements. We’ll focus on three key offerings:
- Premium SSD: The traditional performance-tier offering, providing consistent performance with burstable IOPS
- Premium SSD v2: A newer generation whose capacity, IOPS, and throughput can be provisioned independently of one another
- Ultra SSD: Azure’s highest-performance offering (marketed as Ultra Disk), with configurable IOPS and throughput
Each of these options presents different performance characteristics and price points, making the choice non-trivial for database workloads.
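To make the comparison concrete, the three offerings can be written down as a small lookup table. This is only a sketch: the `DiskType` class, its field names, and the `tunable_disks` helper are illustrative and not part of any Azure SDK. The SKU strings are the locally redundant SKU names as I understand them from Azure's documentation, so verify them before using them in templates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiskType:
    name: str                       # name used in this article
    sku: str                        # assumed Azure managed disk SKU (locally redundant)
    configurable_performance: bool  # IOPS/throughput set as free-form values, not via a size tier


# Illustrative catalogue of the three offerings discussed above.
DISK_TYPES = [
    DiskType("Premium SSD", "Premium_LRS", configurable_performance=False),
    DiskType("Premium SSD v2", "PremiumV2_LRS", configurable_performance=True),
    DiskType("Ultra SSD", "UltraSSD_LRS", configurable_performance=True),
]


def tunable_disks(disks):
    """Return the names of disk types whose performance is provisioned separately from size."""
    return [d.name for d in disks if d.configurable_performance]


print(tunable_disks(DISK_TYPES))
```

Keeping the catalogue as data rather than prose makes it easy to attach measured results or prices to each entry later.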
Understanding Distributed Database Workloads
Before diving into performance numbers, it’s essential to understand what makes distributed database workloads unique. Unlike traditional single-node databases, distributed databases like YugabyteDB handle data differently:
- Write Operations:
  - Require consensus across multiple nodes
  - Need to maintain consistency across replicas
  - Often involve both WAL (Write-Ahead Log) and data file writes
- Read Operations:
  - May contact multiple nodes depending on consistency requirements
  - Utilize caching at various levels
  - Can be affected by data locality
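The consensus requirement on writes explains why disk write latency matters so much here. The sketch below assumes a majority-quorum protocol such as Raft, where a write commits once a majority of replicas have persisted it. The function names are hypothetical, and the latency model deliberately ignores network time and leader-specific behavior.

```python
def majority_quorum(replication_factor: int) -> int:
    """Smallest number of replicas that forms a majority."""
    if replication_factor < 1:
        raise ValueError("replication_factor must be at least 1")
    return replication_factor // 2 + 1


def tolerated_failures(replication_factor: int) -> int:
    """Replicas that can be lost while a majority is still reachable."""
    return replication_factor - majority_quorum(replication_factor)


def commit_latency_ms(persist_latencies_ms):
    """Simplified model: a write commits when the quorum-th fastest replica has persisted it."""
    quorum = majority_quorum(len(persist_latencies_ms))
    return sorted(persist_latencies_ms)[quorum - 1]


print(majority_quorum(3), tolerated_failures(3))   # 2 1
print(commit_latency_ms([1.0, 9.0, 1.5]))          # 1.5
```

Under this model a single slow disk does not hold up commits, but the second-fastest replica in a three-way replicated group does, so consistent write latency across nodes matters more than one node's peak numbers.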
Benchmarking Methodology
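To evaluate storage performance thoroughly, we ran two industry-standard benchmarks:
- TPC-C Benchmark: an OLTP benchmark that models an order-entry workload with a mix of read and write transactions
- Sysbench: a scriptable benchmarking tool with built-in OLTP read-only, write-only, and read/write workloads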
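For readers who want to run a comparable Sysbench test, the following sketch assembles a command line in Python. It is not the configuration used for the results in this article: the thread count, duration, and table sizes are placeholder values, the host is a made-up address, and the port, user, and database are assumed to be YugabyteDB's YSQL defaults. The `sysbench_command` helper itself is hypothetical.

```python
import shlex


def sysbench_command(workload, host, *, port=5433, threads=16, seconds=300,
                     tables=10, table_size=100_000, phase="run"):
    """Build a sysbench argument list for a PostgreSQL-compatible endpoint.

    All defaults are placeholders chosen for illustration, not tuned values.
    """
    return [
        "sysbench", workload,
        "--db-driver=pgsql",
        f"--pgsql-host={host}",
        f"--pgsql-port={port}",        # assumed YSQL default port
        "--pgsql-user=yugabyte",       # assumed default user
        "--pgsql-db=yugabyte",         # assumed default database
        f"--tables={tables}",
        f"--table-size={table_size}",
        f"--threads={threads}",
        f"--time={seconds}",
        phase,                         # "prepare", "run", or "cleanup"
    ]


# Print the commands instead of executing them, so they can be reviewed first.
for workload in ("oltp_read_only", "oltp_write_only", "oltp_read_write"):
    print(shlex.join(sysbench_command(workload, "10.0.0.4")))
```

Running the same three workloads against each disk type, with everything else held constant, is what lets the read-heavy, write-heavy, and mixed results be compared.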
Key Findings and Recommendations
Based on our comprehensive testing, we can make several recommendations:
- For Read-Heavy Workloads:
  - Premium SSD v2 provides the best balance of performance and cost. The performance gap between Premium SSD v2 and Ultra SSD is minimal for read operations, making Ultra SSD harder to justify purely for read performance.
- For Write-Heavy Workloads:
  - Ultra SSD shows its value in write-intensive scenarios, particularly with larger datasets. The consistent performance and lower latencies can justify the higher cost for write-critical applications.
- For Mixed Workloads:
  - Premium SSD v2 emerges as the most cost-effective option for most mixed workloads. The performance improvements over Premium SSD are significant, while the cost remains lower than Ultra SSD.
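These recommendations can be condensed into a small decision helper. This is a sketch of the reasoning rather than a sizing tool: the `recommend_disk` function is hypothetical, and the cutoff at which a workload counts as write-heavy (`write_heavy_below`) is an assumed value that should be replaced with one that fits your own measurements.

```python
def recommend_disk(read_fraction: float, strict_write_latency: bool = False,
                   *, write_heavy_below: float = 0.3) -> str:
    """Map a workload profile to the disk type suggested by the findings above.

    read_fraction: share of operations that are reads, between 0.0 and 1.0.
    write_heavy_below: assumed threshold; below it the workload is treated as write-heavy.
    """
    if not 0.0 <= read_fraction <= 1.0:
        raise ValueError("read_fraction must be between 0.0 and 1.0")
    write_heavy = read_fraction < write_heavy_below
    if write_heavy and strict_write_latency:
        return "Ultra SSD"
    # Read-heavy and mixed workloads, and write-heavy ones without strict latency needs.
    return "Premium SSD v2"


print(recommend_disk(0.9))                              # Premium SSD v2
print(recommend_disk(0.5))                              # Premium SSD v2
print(recommend_disk(0.1, strict_write_latency=True))   # Ultra SSD
```

The helper defaults to Premium SSD v2 and only switches to Ultra SSD when both conditions hold, which mirrors the point that Ultra SSD's cost is justified mainly for write-critical applications.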
Conclusion
Our testing reveals that Azure disk performance isn’t simply about raw IOPS and throughput numbers. The interaction between storage and distributed database workloads is complex, with CPU often becoming the limiting factor before storage performance is fully utilized.
For most distributed database deployments, Premium SSD v2 provides the sweet spot of performance and cost.
Ultra SSD becomes compelling primarily for:
- Write-heavy workloads with strict latency requirements
- Large datasets with unpredictable access patterns
- Mission-critical applications requiring consistent performance
When selecting Azure disk types for your distributed database, consider:
- Your workload characteristics (read/write ratio)
- Dataset size and growth expectations
- Performance requirements and budgetary constraints
- The actual bottlenecks in your current system
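The last point is worth checking before paying for a faster disk. A rough way to do it, assuming you already collect average CPU utilization and consumed IOPS per node, is sketched below. The `likely_bottleneck` function and the 85% saturation threshold are illustrative assumptions, not values taken from our tests.

```python
def likely_bottleneck(cpu_utilization: float, iops_used: float,
                      iops_provisioned: float, *, saturation: float = 0.85) -> str:
    """Classify a node as CPU-bound, disk-bound, both, or neither.

    cpu_utilization: fraction of CPU in use, between 0.0 and 1.0.
    saturation: assumed fraction above which a resource counts as saturated.
    """
    if iops_provisioned <= 0:
        raise ValueError("iops_provisioned must be positive")
    disk_utilization = iops_used / iops_provisioned
    cpu_bound = cpu_utilization >= saturation
    disk_bound = disk_utilization >= saturation
    if cpu_bound and disk_bound:
        return "cpu+disk"
    if cpu_bound:
        return "cpu"
    if disk_bound:
        return "disk"
    return "neither"


# A node at 95% CPU that uses a quarter of its provisioned IOPS is CPU-bound.
print(likely_bottleneck(0.95, 4_000, 16_000))   # cpu
```

If the result is "cpu", moving to a faster disk tier is unlikely to help, and adding cores or nodes is the more promising change.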