Deploying and Processing Azure AS
In the “Processing Tabular from Exotic Clients” blog post, I mentioned that if your Azure AS model retrieves data from an Azure database, such as Azure SQL Database, the two should be located in the same Azure data center for faster processing and to avoid data transfer fees between data centers. Let’s elaborate on this a bit and share some performance results from a real-life project.
We deployed two Azure AS instances, one in the North Central and one in the South Central Azure data center, while the Azure SQL Database was deployed to the South Central data center. Here are the stats for loading about 35 million rows into Tabular using a Full Process of the AS database. The Azure SQL Database was at the P1 performance level and Azure AS was at the S1 performance level.
- South Central: 12 minutes or about 50,000 rows/sec
- North Central: 19 minutes or about 30,000 rows/sec
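For reference, these throughput numbers follow directly from the row count and duration: 35,000,000 rows / (12 × 60) seconds ≈ 48,600 rows/sec for South Central, and 35,000,000 / (19 × 60) ≈ 30,700 rows/sec for North Central.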
As you can see, hosting Azure AS in a different data center incurs not only data transfer fees but also longer processing times (more than 50% overhead in this case). Your results will probably vary, but just as with on-premises installations, it’s very important to fine-tune your queries or SQL views. SSAS is capable of processing more than 100,000 rows/sec on a single thread, but it can only do so if rows arrive from the data source at that speed. If you use custom queries or SQL views, any join or inefficient code can slow down processing significantly. For example, I typically use SQL views to wrap fact tables so I can perform ad hoc calculations. Initially, for this project I used the T-SQL AT TIME ZONE clause in the view to perform some time zone calculations. As it turned out, AT TIME ZONE is very slow because it looks in the registry for information about when daylight saving time starts and ends, and it has to work out whether each individual value you’re converting falls within that period. As a result, processing the model took 6 hours! Once I switched to SWITCHOFFSET (see the sketch below), processing went down to 12 minutes.
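To illustrate the change, here is a minimal sketch of such a wrapper view. The table and column names (dbo.FactSales, OrderDateUtc) are made up for the example, and the fixed -06:00 offset is an assumption; unlike AT TIME ZONE, SWITCHOFFSET applies a constant offset and is not aware of daylight saving time, which is the trade-off that makes it so much faster.

```sql
-- Hypothetical wrapper view over a fact table (names are made up for illustration)
CREATE VIEW dbo.vFactSales
AS
SELECT
    SalesKey,
    SalesAmount,
    -- Original (slow): per-row daylight saving lookup
    -- OrderDateUtc AT TIME ZONE 'UTC' AT TIME ZONE 'Central Standard Time' AS OrderDateLocal
    -- Faster: apply a fixed offset (no DST awareness)
    SWITCHOFFSET(CAST(OrderDateUtc AS datetimeoffset), '-06:00') AS OrderDateLocal
FROM dbo.FactSales;
```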
To quickly test the database throughput, open SSMS (ideally in a VM in the same data center to factor out connection latency) and execute a query that selects a few million rows from a table. Make sure to set the “Discard results after execution” option in the query options to True so that SSMS doesn’t run out of memory loading all those rows into the grid and the time to visualize the data is excluded. Then divide the number of rows by the query execution time in seconds. This gives you the data retrieval throughput, which determines how fast SSAS can read the source data.
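A throughput test could look something like the sketch below; the table name is hypothetical, and SET STATISTICS TIME is just one way to capture the elapsed time.

```sql
-- In SSMS: Query Options > Results > Grid > check "Discard results after execution"
SET STATISTICS TIME ON;

-- Hypothetical table; pick one with a few million rows
SELECT *
FROM dbo.FactSales;

SET STATISTICS TIME OFF;

-- Throughput (rows/sec) = rows returned / elapsed seconds
-- e.g. 5,000,000 rows in 100 seconds = 50,000 rows/sec
```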