Notes on Analysis Services Tabular Performance and Parallelism (Part 2)

In my “Notes on Analysis Services Performance and Parallelism” blog, I shared limited performance results about scaling up Azure Analysis Services Tabular. As I mentioned, I expect the same findings to apply to SQL Server Analysis Services Tabular for on-premises deployments. Now that the official load test results are in, I’m ready to share more details and general recommendations. We load tested the AAS semantic model mentioned in the previous blog (about 2 billion rows) by replaying a representative subset of captured DAX queries, including slow, medium, and fast queries. We used a stepped load that started with one user and increased the number of concurrent users by one every 30 seconds. We considered the server saturated when 90% of the maximum QPU was reached on average. For example, the S4 performance level has 400 QPU (20 cores), so its saturation level would be 360 QPU on average.
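If you want to script a similar stepped load yourself, here is a minimal Python sketch of the pattern (one virtual user added every 30 seconds, then throughput reported at the end). The run_dax_query function is a hypothetical placeholder; wire it to whatever client you use to submit DAX queries to the model.

```python
# A minimal sketch of the stepped-load pattern described above:
# start with one virtual user and add one more every 30 seconds,
# then report overall throughput. run_dax_query() is a hypothetical
# placeholder -- replace it with however you submit DAX to your model.
import random
import threading
import time

STEP_SECONDS = 30   # add one virtual user every 30 seconds
MAX_USERS = 12      # stop ramping once you reach your expected ceiling
QUERIES = ["fast.dax", "medium.dax", "slow.dax"]  # representative mix

stop = threading.Event()
completed = 0
lock = threading.Lock()

def run_dax_query(query_file: str) -> None:
    """Hypothetical placeholder: simulate executing a DAX query."""
    time.sleep(random.uniform(0.1, 2.0))

def virtual_user() -> None:
    global completed
    while not stop.is_set():
        run_dax_query(random.choice(QUERIES))
        with lock:
            completed += 1

start = time.time()
for _ in range(MAX_USERS):
    threading.Thread(target=virtual_user, daemon=True).start()
    time.sleep(STEP_SECONDS)  # hold the load before adding the next user

stop.set()
elapsed = time.time() - start
print(f"{completed} queries in {elapsed:.0f}s = {completed / elapsed:.1f} queries/sec")
```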

Note: It takes a while for the Azure portal to refresh the graphs with metrics; a lag of 30 seconds or more is the norm. You’ll probably find that one test controller is sufficient to saturate Tabular. Also, don’t focus too much on the number of concurrent users alone. In real life, you might not even have two concurrent users (users executing queries at the same time), but you might have a dashboard that runs queries in parallel, such as a Power BI report with multiple visuals. It all depends on what performance goal you’ve established prior to running the test. You did establish a performance goal, right? If you don’t know how, my “Can Your BI Solution Scale” presentation should help. I personally prefer to express the goal as queries/sec instead of concurrent users because concurrency is very difficult to determine.
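One way to translate between the two measures is Little’s Law: average concurrency equals throughput multiplied by residence time (response time plus think time). Here is a tiny Python illustration; the timing numbers are assumptions for the sake of the example, not results from this test.

```python
# Little's Law: concurrency = throughput x (response time + think time).
# The numbers below are illustrative assumptions, not test results.
avg_response_sec = 2.0   # assumed average query response time
think_time_sec = 8.0     # assumed pause between a user's queries
concurrent_users = 8

throughput_qps = concurrent_users / (avg_response_sec + think_time_sec)
print(f"{concurrent_users} concurrent users ~= {throughput_qps:.1f} queries/sec")
```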

The results were consistent with my previous observations. As you can see in the table below, scaling up to a higher price tier increases the system throughput almost linearly in this case.

|  | S2 (200 QPU) | S4 (400 QPU) | S9 (640 QPU) |
| --- | --- | --- | --- |
| Number of cores | 10 | 20 | 32 |
| Max concurrent users | 4 | 8 | 12 |
| Time from test start to reach 90% QPU capacity | 2 min | 3.5 min | 6 min |
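As a quick sanity check on the “almost linear” claim, dividing the maximum concurrent users by the core count of each tier (numbers taken from the table above) shows that the users-per-core ratio stays roughly constant:

```python
# Users-per-core ratio from the table above: a roughly constant
# ratio means throughput tracks the core count almost linearly.
tiers = {"S2": (10, 4), "S4": (20, 8), "S9": (32, 12)}  # cores, max users
for tier, (cores, users) in tiers.items():
    print(f"{tier}: {users / cores:.2f} concurrent users per core")
# Output: S2: 0.40, S4: 0.40, S9: 0.38
```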

To recap, scaling up Tabular would benefit two primary scenarios:

  1. When queries are storage engine bound – This would typically happen with larger models, where significant time is spent scanning the in-memory data.
  2. High concurrency – As you scale up and the number of cores increases, the system throughput will increase as well. Remember that if you max out the highest tier (S9), you can easily scale out AAS as well, giving you the ability to scale almost linearly.

The scenarios that I don’t expect to benefit from scaling up fall into two general categories:

  1. Smaller models – Remember that Tabular allocates one core per segment (the default segment size is 8 million rows). So, if you have a model of 100 million rows, the S2 level of parallelism (10 cores) might already be good enough. In this case, increasing the parallelism won’t help because the model just doesn’t have enough data to scan in parallel (see the sketch after this list).
  2. Complex queries – Because the formula engine is single-threaded, formula engine-bound queries would be poor candidates for scaling up.
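To illustrate the segment math from the first point, here is a small sketch. It assumes the default 8-million-row segment size mentioned above and that scan parallelism is capped at one core per segment, so a model’s row count puts a ceiling on how many cores a scan can keep busy.

```python
# Upper bound on scan parallelism: one core per segment,
# assuming the default segment size of 8 million rows.
SEGMENT_SIZE = 8_000_000

def max_scan_parallelism(row_count: int) -> int:
    # ceiling division: how many segments the table splits into
    return -(-row_count // SEGMENT_SIZE)

# A 100M-row model on S2 (10 cores) vs. S9 (32 cores):
for rows, cores in [(100_000_000, 10), (100_000_000, 32)]:
    segments = max_scan_parallelism(rows)
    usable = min(cores, segments)
    print(f"{rows:,} rows, {cores} cores -> {segments} segments, "
          f"{usable} cores usable per scan")
# More than tripling the cores gains only 3 usable cores (13 vs. 10),
# which is why smaller models don't benefit much from scaling up.
```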

Plan for a load testing effort when in doubt whether your semantic model will scale. Every model is different, so take the above notes as general guidelines. Before scaling up, if possible, consider an alternative design to reduce the number of rows and the column data size and cardinality (if the queries are storage engine-bound), or optimize the DAX queries relentlessly (if the queries are formula engine-bound). Needless to say, fast hardware doesn’t justify or compensate for a bad design or inefficient queries.