A Gateway to Hell

“Life is suffering and suffering is unavoidable”. Buddha must have dealt with data gateways. Two “gotchas” that surfaced recently after countless hours of troubleshooting:

  1. A bug in the data gateway AD lookup cache – In my “Gateway AD Lookups” post, I described a very useful gateway feature to look up the on-prem user identity from a cloud identity so that you don’t have to create explicit mappings. The gateway maintains an internal dictionary cache so that it doesn’t have to look up Windows Active Directory for every query. This cache can’t be disabled. The issue is that If you have an older gateway build, e.g. December 2019, there is a bug (and a security vulnerability) with concurrent access to cache which may cause the gateway to map the interactive user to a wrong AD user under heavy load. The issue was fixed in a later build, but the fix introduced another issue by imposing a very coarse lock (probably the entire cache) which may reduce concurrency with many users running reports. In other words, under a heavy load user requests queue up to wait for the cache lock to be released. This issue will be fixed in the next July 2020 gateway build.
  2. Beware stale data sources registered in the gateway – Now that we’ve addressed the concurrency bug, the same client reported a new issue where every hour or so users can’t render reports connected to the gateway for a couple of minutes and then the issue resolves itself. The culprit turned out to be neglected Analysis Services data sources registered in the gateway. By default, Power BI dashboards will query these connections every hour to cache tile data (the frequency can be changed on the dataset settings page, but it can’t be disabled). The problem was that the master account used to connect to Analysis Services changed its password. While the team updated the connections in the actively used data sources, it did not do so for the neglected data sources (no one is using them, right?). However, the dashboards still submit queries to update the tile cache and the gateway asks AD to authenticate the master account.  Then, the domain controller locks the master account for a few minutes after a certain number of attempts since it rejects the logon attempt with the old password. The solution was to remove unused data sources from the gateway.