Load Testing Azure Applications

Scenario: You’re conducting a capacity planning effort and load testing an Azure website (Azure App Service) that saves data to SQL Server running on Azure VM. You’ve created a load test, e.g. using Visual Studio, and you’ve realized that the throughput caps for no obvious reason. SQL Server and Azure performance counters indicate that both layers are not under pressure.

Solution: Although no specifics are provided by Microsoft, it appears that Azure throttles user loads to prevent denial-of-service attacks. Based on my observations, Azure detects that the requests come from the same IP and caps the requests. As a workaround, consider conducting the load test using Visual Studio Online. Visual Studio Online allows you to distribute the test across multiple agents that can use different IP addresses (configurable as a test property). In our case, this allowed us to conduct successfully a stepped load test until we reached a performance bottleneck. Interestingly, in this case the bottleneck we first reached was CPU utilization on the front end even although we scaled it to three large Azure web instances. SQL Server was churning along just fine with a single data disk. You results might differ of course.

Atlanta MS BI Group 5th Anniversary

Our next Atlanta MS BI Group meeting will be on August 31st. This happens to be our 5th anniversary! Thanks for everybody who contributed to the group success! We’ve come a long way since our first meeting with a dozen people to the attendance we enjoy today of 60-70+. Come and join the festivities and learn about sentiment analysis with Big Data & Machine Learning.

 

Presentation:

Tone/Sentiment analysis with Big Data & Machine Learning

  

Level: Intermediate

Date:

Monday, August 31th, 2015

Time

6:30 – 8:30 PM ET

Place:

South Terraces Building (Auditorium Room)

115 Perimeter Center Place

Atlanta, GA 30346

Overview:

People’s sentiments and opinions are written in social networks. There are tweets, Facebook posts, book reviews, forum discussions, and more. These attitudes and feelings are communicated using text, with format depending on the social network. Twitter messages are limited to 140 characters and use hash-tags and Facebook messages can be longer.

 

This session reviews the different Natural Language Processing, text mining, and data mining techniques you can use for sentiment and tone analysis. Organizations can use the extracted knowledge for brand reputation, market predictions, and automatic learning. We’ll look at, Hadoop, data mining, Microsoft Big Data Hadoop distribution HDInsight and Azure Machine Learning.

Speaker:

Paco Gonzalez, a SolidQ Mentor, is the PASS Big Data Virtual Chapter Leader, SQL Saturday Barcelona Organizer and assistant at the PASS Atlanta BI User Group. He is finishing his PhD thesis about: “Analyzing Social Data with Machine Learning.” He is an MCT and MCP on BizTalk Server and SQL Server. He is coauthor of Microsoft Training Kit 441. Paco is a frequent speaker at large and small conferences. SQL Saturdays, TechEds, PASS BA, PASS Summit, DevWeek London, and PAW Chicago and London.

Sponsor:

TEKSYSTEMS is a subsidiary of Allegis Group, the largest private talent management firm in the world. Our longstanding history and industry-leading position speak to our success in providing the IT staffing solutions, IT services and talent management insight required for you to actualize ROI and sustain a truly competitive advantage in a fast-changing market.

 

 

Offsetting Week Start Day

Scenario: You have a date dimension table and typical date-related columns, such as DayOfWeek, WeekEndDate, etc. In US, weeks start on Sunday and end on Saturday (ISO weeks on the other hand start on Monday and end on Sunday). You have a requirement to overwrite the week start and end days for a Power Pivot, Tabular, or MD model. For example, while the Date table defines the week start day on Sunday, you might need to overwrite it to start on Monday.

Solution: I typically derive as many of the date calculations in a SQL view on top of the Date table instead of saving the results in the table itself. For example, the WeekOfYear calculation might look like this:

DATEPART(WEEK, [Date]) [WeekOfYear]

If you have date calculations like this, instead of changing all week-related calculations to reflect the desired week start day, the easiest way to offset the week day is to use the T-SQL SET DATEFIRST statement preceding the actual SELECT statement. The following example taken from Power Pivot/Tabular overwrites the first day of the week to start on Monday.

Applied Power BI Service Class Available

I’m happy to announce the availability of my latest training course – Applied Power BI Service. I believe that the new version of Power BI, known as Power BI Service or Power BI 2.0, is going to be very successful and it should warrant your serious interest. This one to two-day class is designed to help you become proficient with the Power BI Service and acquire the necessary skills to work with online and on-prem data, implement data models on a par with professional models created by BI pros, unlock the power of data by creating interactive reports and dashboards, and share insights with other users. No prior data modeling or reporting knowledge is assumed. Students are welcome to bring their own data to the second day of the class.

Where Is My DQS KB Gone?

Scenario: You use the DQS Cleansing Transformation in an SSIS package to integrate data cleansing with DQS. Once you are done developing, you export the DQS KB and then create a new DQS KB by importing and publishing the KB to another server. Or, you might have deleted and recreated the KB with the same name. When you open the DQS Cleansing Transformation Editor, although the KB name is the same, you find that the reference to your DQS KB is gone and you get a warning message “The data quality Knowledge Base property is empty. Specify a valid Knowledge Base”. Naturally, you don’t want to remap the KB and recreate all mappings as this a tedious process involving many steps with many columns.

Solution: I’m not sure what design pattern was followed here, by SSIS identifies the KB by its identifier instead of by name. To fix the issue without remapping, you need to find the new KB ID and update the package source by following these steps:

  1. On the target SQL Server that is integrated with DQS, connect to the DQS_MAIN database and issue the following query by replacing the token with the name of your KB:
    SELECT * FROM [DQS_MAIN].[dbo].[A_KNOWLEDGEBASE] WHERE NAME = ‘<YOUR KB NAME>’
  2. In the query results, locate the ID column and copy the identifier.
  3. In SSDT, right-click on the package and then click View Code to open the package XML source.
  4. Find the knowledgebaseName property and overwrite the new identifier.

<property dataType=“System.Int64” name=“KnowledgebaseName” typeConverter=“NOTBROWSABLE”>1020033</property>

  1. Save the source and open the package in the designer.

Note that to resolve the KB identifier successfully, not only you need to import the KB on the target server but also publish the KB.

Power BI Desktop Supports Direct Connectivity to Tabular

As you’ve probably heard the news, Power BI went live yesterday. Among the many new features, Power BI Desktop (previously known as Power BI Designer) now supports live connections to Tabular. This is conceptually similar to connecting Excel to Tabular or Power View to Tabular to create ad hoc reports. However, Power BI Desktop is a standalone and freely available tool that is independent of Office or SharePoint.

When connected directly to Tabular, Power BI Desktop generates DAX queries as a result of report changes, sends them to Tabular and displays the results. Of course, we know we have new visualizations that don’t exist in Excel and on-prem Power View, such as tree maps, region maps, etc.

Upon discovering connectivity to Tabular, Power BI Desktop asks you how you want to connect:

Once you create the report, you can publish the dataset and report to Power BI. As a prerequisite, you need to install and configure the Analysis Services Connector to that the report can connect to Tabular on premises. If the connector is not installed, you will get an error on publishing. Speaking of the connector, at this time it cannot be installed side by side with the Personal Gateway which is used for refreshing data from on-prem data sources. You need to install them on separate machines.

Tabular is the first data source that supports direct connectivity in Power BI Desktop. I’d imagine that live connections to other data sources will follow soon. Direct connectivity is useful to support fast databases where there is no need to import and cache the data. In this case, you’ll need to weight out the pros and cons of direct connectivity vs. data import.

Atlanta BI Group Meeting on July 27th 2015

Come and join us for our next information-packed meeting on Monday, July 27th. Stacey Jones from UPS will present “Big Data as part of your Data Strategy” and Patrick LeBlanc from Microsoft will do a quick demo of Power BI Desktop now that Power BI has gone officially live. Our sponsor will be Prestige Staffing.

Presentation:

Big Data as part of your Data Strategy

  

Level: Intermediate

Date:

Monday, July 27th, 2015

Time

6:30 – 8:30 PM ET

Place:

South Terraces Building (Auditorium Room)

115 Perimeter Center Place

Atlanta, GA 30346

Overview:

New data technologies are evolving at an unprecedented pace. The traditional RDBMS is no longer the only choice for storing your data. Big data technologies such as Hadoop and other data technologies such as Data Virtualization, NoSQL, In-memory databases, Columnar Databases, and Graph databases offer new capabilities, but hold the potential to simply distract or disrupt business. Do relational databases still play a role? Using the wrong technology for the problem at hand can be expensive, time consuming, and counter-productive. I will discuss the importance of developing a data strategy for your company and how to determine the correct technologies. Takeaways:

 
 

•    Understand your data consumption needs

•    Document your data ecosystem

•    Develop your data strategy to apply the best data technology

Speaker:

Stacey Jones is a Data Architect filling the role of the UPS Chief Business Intelligence Evangelist. Stacey spends his time researching and promoting the use of modern BI- related technologies such as Big Data, Self-Service, Data Virtualization, Polyglot Persistence, In-Memory Databases and Predictive/Prescriptive Analytics. Stacey recently transitioned UPS from reporting to Self-Service Business Intelligence and architected the largest cube ever built at UPS. He is passionate about the art of writing and optimizing the performance of SQL queries, database design, database code, and all things BI. You can reach him at staceyjones@ups.com or on LinkedIn at https://www.linkedin.com/pub/stacey-jones/24/891/59

Prototypes with Pizza

Power BI Desktop with Patrick LeBlanc, Microsoft

Sponsor:

Prestige Staffing is an award-winning, nationally recognized leader staff augmentation. We specialize in providing elite candidates to fulfill job openings in the IT, Telecommunications, Healthcare and Accounting/Finance industries. Our clients include both Fortune 500 companies and small, local businesses. Our mission is to provide employers with only the most skilled applicants, while simultaneously helping qualified professionals improve their careers by placing them in the most ideal work situations.

Using DQS for Externalizing Data Validation Rules

Developers spend plenty of time coding for exceptions. The customer SSN must be 9 characters. The organization code must be four characters and it must be constrained to specific values. The happy path is easy, coding for exceptions is tedious. Some exceptions are easy to catch, others might require more complicated logic than implementing the happy path. And, it gets more difficult if the rules change often or your BI system needs to support adding new rules.

This is where Data Quality Services (DQS) can help. To me, the biggest strength of DQS is externalizing rules. Instead of hardcoding rules in SQL or ETL, you can externalize them and even offload rule management to business users. Using the Data Quality Client tool, a business user can define simple rules and corrections, such as that the e-mail address must match a specific regular expression. Then, the ETL developer can integrate the data flow with DQS to separate invalid and valid rows. Moreover, DQS will tell you exactly which rule is broken and a confidence factor for matching. This helps you avoid tedious coding and saves your company a lot of effort.

In its first release, DQS has left substantial areas for improvement which I hope SQL Server 2016 will address, including:

  1. I’m sure that there is a solid academic theory behind the “domains” concept, but requiring the user to create composite domains for simple-cross field validations might too much to ask. Instead, a better approach might be to have a hidden composite domain consisting of all fields.
  2. The lookup lists, such as “Value Is In” cannot be data-driven, requiring you to hardcode lookup values.
  3. More advanced rules are needed, such as IF-THEN, without requiring composite domains.
  4. Very slow performance with the SSIS DQS task because it uses a single thread.
  5. Currently, DQS has no API so automation is not possible.

Solving Datazen MDX Query Limitations

As I mentioned in a previous blog, Microsoft added Datazen to its arsenal of data visualization tools. Datazen, of course, comes with its own pros and cons. In this blog, you’ll see how you can get around a Datazen limitation when connected to an SSAS data model.

Scenario: You create a Datazen dashboard that sources data from an SSAS MD or Tabular model. As usual, you need a query that slices some measures by dimension attributes, such as the query below:

select {[Measures].[Sales Amount],[Measures].[Order Quantity]} on 0,

[Product].[Category].[Category].Members
on 1

from [Adventure Works] 

While this query works fine in SSRS, when Datazen runs the query it silently drops the dimension attributes from the result. You might attempt to rewrite the query as follows:

select [Product].[Category].[Category].Members * {[Measures].[Sales Amount],[Measures].[Order Quantity]} on 0

from [Adventure Works]

But then you get:

The query cannot be prepared. The query must have at least one axis. The first axis of the query should not have multiple hierarchies, nor should it reference any dimension other than the Measures dimension…        

Solution: The problem is that Datazen doesn’t support dimension attributes in the query resultset. This limitation is overly restrictive but there is a workaround by changing the query to return the desired dimension attributes as calculated members:

with
member CategoryName as
iif ([Measures].[Sales Amount]=0, null, [Product].[Category].CurrentMember.Member_Caption)

select {CategoryName, [Measures].[Sales Amount],[Measures].[Order Quantity]} on 0,

[Product].[Category].[Category].Members
on 1

from [Adventure Works]

Notice that the calculated member uses the Iif operator to check if a product category has data so that the query doesn’t return all categories whether they have data or not.

UPDATE 7/23/2015

Microsoft released an update that fixes this limitation. Now when you go through the wizard, there is a step that allows you to select which dimension properties you want in the results.

Power BI Custom Visuals Announced

Today Microsoft announced the general availability of Power BI on July 24th meaning that Power BI goes live on that date. Among the many new features mentioned in the blog, the one that piqued my interest is custom visuals. Developers might recall that SSRS has provided custom report items since version 2005, which I wrote about in the October 2006 issue of the MSDN Magazine. Similarly, now you can implement custom visualizations in Power BI! And, instead of writing .NET code, you can do so using JavaScript. This opens the path to implement custom visuals using any of the JavaScript-based visualization frameworks, such as D3, WebGL, Canvas, or SVG.

This will be huge as it will allow developers to extend the Power BI visualization capabilities using open-source visualization frameworks. Do you need some gadgets, like a bullet graph? With some coding wizardry, you can implement your own, such as by using the D3 Bullet Charts sample. Moreover, to speed up development, Microsoft has published the code of all of the existing Power BI visualizations on GitHub. Once you follow the instructions to install the Power BI Visuals Sample, you can open the project in Visual Studio and open in the browser index.html located in the PowerBIVisualsPlayground project. This allow you to test the visuals. The actual TypeScript code of the Microsoft visualizations is located in the Visuals project.
3833.custommap.jpg-400x0I think the Microsoft decision to make Power BI open will be a paradigm shift for the BI industry. Besides the Push API which are already published, Microsoft has already indicated plans to make Power BI even more open, such as to allow developers to embed reports in custom web applications. Now, if only Power BI supports on-prem installation …