Atlanta BI fans, please join us for the next meeting on Tuesday, July 11th, at 6:30 PM ET. Stacey Jones (Cross Solution Architect at Microsoft) will join us in person to present on OpenAI and Copilot. Your humble correspondent will demo the newly released PBI Desktop project format. For more details and to sign up, visit our group page.
PLEASE NOTE A CHANGE TO OUR MEETING POLICY. EFFECTIVE IMMEDIATELY, WE ARE DISCONTINUING ONLINE MEETINGS VIA TEAMS. THIS GROUP MEETS ONLY IN PERSON. WE WON’T RECORD MEETINGS ANYMORE. THEREFORE, AS IN PRE-PANDEMIC TIMES, PLEASE RSVP AND ATTEND IN PERSON IF YOU ARE INTERESTED IN THIS MEETING.
Presentation: Azure OpenAI – Answers to Your Natural Language Questions
Date: July 11th (Please note that because of the July 4th holiday, this meeting is on Tuesday)
Time: 18:30 – 20:30 ET
Level: Intermediate
Food: As of now, food won’t be available for this meeting. We welcome suggestions for a sponsor.
Agenda:
18:15-18:30 Registration and networking
18:30-19:00 Organizer and sponsor time (events, Power BI latest, sponsor marketing)
19:00-20:15 Main presentation
20:15-20:30 Q&A
ONSITE
Improving Office
11675 Rainwater Dr
Suite #100
Alpharetta, GA 30009
Overview: We will explore Azure OpenAI, a fusion of OpenAI with the Azure cloud. You’ll understand what OpenAI and Copilot are, as well as how they can help you with your natural language questions. We will also address questions about AI, such as: will AI take over the world?
Speaker: Stacey Jones is a Cross Solution Architect specializing in all things Data, AI, and Power BI. He is an evangelist at the Microsoft Technology Center (MTC) in Atlanta, GA. He has 30+ years of industry experience in technology management spanning data architecture, data modeling, database development, tuning, and administration in both Oracle and SQL Server. He is active on LinkedIn and has plenty of real-life experience putting out database performance fires.
“Patience my tinsel angel, patience my perfumed child / One day they’ll really love you, you’ll charm them with that smile / But for now it’s just another Chelsea [TL: BI] Monday” – Marillion
Continuing our Fabric journey, let’s look at another of its engines that I personally care about: Fabric Warehouse (aka Synapse Data Warehouse). Most of my real-life projects require integrating data from multiple data sources into a centralized repository (commonly referred to as a data warehouse) that stores trusted data and serves it as a source for Power BI and Analysis Services semantic models. Due to the venerable history of relational databases and other benefits, I’ve been relying on relational databases powered by SQL Server to host the data warehouse. This usually entails a compromise between scalability and budget. Therefore, Azure-based projects with low data volumes (up to a few million rows) typically host the warehouse in a cost-effective Azure SQL Database, while large-scale projects aim for Synapse SQL Dedicated Pools. And now there is a new option on the horizon: Fabric Warehouse. But where does it fit in?
The Good
Why would you care about Fabric Warehouse given that Microsoft has three (and counting) Azure SQL SKUs: Azure SQL Database, Azure SQL Managed Instance, and Synapse SQL Dedicated Pools (based on Azure SQL Database)? OK, it saves data in an open file format in OneLake (delta tables), but I personally don’t care much about how data is saved. We could say that SQL Server is also a file-based engine (although a proprietary one) that has served us well for 30+ years. I care much more about other things, such as performance, budget, and T-SQL parity. However, as I mentioned in previous Fabric-related posts, the common delta format across Fabric enables data virtualization, should you decide to put all your eggs in one basket and embrace Fabric for all your data needs. Let’s say you have some delta tables in the lakehouse managed zone. You can shortcut them to the Warehouse engine and avoid data movement. By contrast, previously you had to spin up more ETL to move data from the Synapse serverless endpoint to the Synapse dedicated pools.
A big improvement is that you can also cross-query data warehouses and lakehouses, which wasn’t previously possible in Azure Synapse.
You can do this by writing familiar T-SQL queries with three-part naming, such as querying a lakehouse table from your warehouse:
SELECT TOP (100) * FROM [wwilakehouse].dbo.dimension_customer
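And here is a hedged sketch of a cross-engine join. The [dw] warehouse and [wwilakehouse] lakehouse names come from the Fabric tutorial mentioned later in this blog, but the fact table and key columns are my assumptions:

-- Hypothetical cross-engine join: warehouse fact table to lakehouse dimension
SELECT c.Customer, SUM(s.TotalIncludingTax) AS SalesAmount
FROM [dw].dbo.fact_sale AS s
JOIN [wwilakehouse].dbo.dimension_customer AS c
  ON s.CustomerKey = c.CustomerKey
GROUP BY c.Customer;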
In fact, the elimination (or rather fusion) of Synapse Dedicated Pools and Serverless is a big plus for me. The infinite scale and instantaneous scalability that result from decoupling compute and storage sound promising. I don’t know how Fabric Warehouse will evolve over time, but I really hope that at some point it will eliminate the aforementioned Azure SQL SKU decision for hosting relational data warehouses. I hope it will support the best of both worlds: the ability to scale dynamically (like serverless Azure SQL Database) up to the large scale of Synapse Dedicated Pools.
The Bad
Now that Azure SQL SKUs are piling up, Microsoft owes us serious benchmarks for Fabric Warehouse analytical and transactional loads.
Speaking of loads, and considering that Fabric Warehouse saves data in columnar structures (Parquet files), this is not a tool for heavy OLTP loads. Columnar formats fit data analytics like a glove because data is analyzed by columns, but they will probably perform poorly under heavy transactional loads. Again, benchmarks could help determine where that cutoff point is.
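You can observe the same dynamic with a SQL Server clustered columnstore index; here is a minimal sketch with hypothetical table and column names:

-- Analytical query: the engine scans only the two referenced column segments,
-- so aggregates over billions of rows stay fast
SELECT ProductKey, SUM(SalesAmount) AS Sales
FROM dbo.FactSales
GROUP BY ProductKey;

-- OLTP-style singleton update: rows must be located inside compressed segments,
-- marked deleted, and reinserted via the delta store, which gets expensive
-- at high transaction rates
UPDATE dbo.FactSales
SET SalesAmount = SalesAmount * 1.1
WHERE SalesOrderID = 12345;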
Unlike Power BI Premium (P) capacities, Fabric capacities acquired by purchasing an Azure F plan can be scaled up or down (like Power BI Embedded A* plans). However, like A* plans, this requires manual intervention. I was hoping for dynamic scaling of workloads within a capacity to meet demand and reduce cost. For example, I should be able to configure the capacity to auto-pause Fabric Warehouse when it is not in use and the data is imported into a Power BI dataset.
As I mentioned in my Fabric Lakehouse: The Good, The Bad, and the Ugly post, given that both save data in the same format, I find the Lakehouse and Warehouse engines redundant and confusing. Currently, ADLS shortcomings are cited as the reason for this separation, and my hope is that at some point the two will merge. If you are a SQL developer, you should be able to use SQL; and if you prefer notebooks for whatever reason, you should be able to use Python or whatever language you prefer on the same data. This would be really nice and would naturally resolve the lakehouse-vs-warehouse decision point.
The Ugly
Because it’s completely rewritten, a work in progress, and therefore subject to many limitations, Fabric Warehouse as it stands is unusable for me.
After the botched Synapse saga, Microsoft had the audacity to call this offering the next generation of Synapse warehouse (in fact, the Fabric architecture labels it as Synapse Data Warehousing, I guess to save face), despite the fact that they pretty much started from scratch, at least in terms of features. I call it “Oops, we did it again, overpromised and underdelivered, got lost in the game, Ooh, baby, baby…” I’ll give Fabric Warehouse a year or so before I revisit it. If my hope for Fabric Warehouse as a more straightforward choice for warehousing materializes, I also wish Microsoft would decouple it from Fabric and make it available as a standalone Azure SKU, which is pretty much my wish for all tools in the Fabric bundle. Bundles are great until you decide to opt out and salvage the pieces…
Finally, after all the push and marketing hoopla, Synapse SQL Dedicated Pools seem to be on an unofficial deprecation path. I guess Microsoft has seen the writing on the wall from competing with other large-scale DW vendors. The current guidance is to consider them for high-scale performance needs, while the “evolved” and “rewritten” Fabric Synapse Warehouse should be at the forefront of future DW implementations. And there is no migration path from Dedicated SQL Pools.
“Hey, I got this lake house, you should see it / It’s only down the road a couple miles / I bet you’d feel like you’re in Texas / I bet you’ll wanna stay a while” – Brad Cox
Continuing our Fabric purview, Lakehouse is now on the menu. Let’s start with a definition. According to Databricks, which is credited with coining the term, a data lakehouse is “a new, open data management architecture that combines the flexibility, cost-efficiency, and scale of data lakes with the data management and ACID transactions of data warehouses, enabling business intelligence (BI) and machine learning (ML) on all data.” Further, their whitepaper “argues that the data warehouse architecture as we know it today will wither in the coming years and be replaced by a new architectural pattern, the Lakehouse, which will (i) be based on open direct-access data formats…” In other words: give us all your data in delta lake format, embrace our engine, and you can ditch the “legacy” relational databases for data warehousing and OLAP.
Microsoft’s lakehouse definition is less ambitious and exclusive: “Microsoft Fabric Lakehouse is a data architecture platform for storing, managing, and analyzing structured and unstructured data in a single location. It is a flexible and scalable solution that allows organizations to handle large volumes of data using a variety of tools and frameworks to process and analyze that data. It integrates with other data management and analytics tools to provide a comprehensive solution for data engineering and analytics.” In other words, a lakehouse is whatever you want it to be, if you want something better than a data lake.
The Good
I like the Microsoft lakehouse definition better. Nobody forces me to adopt an architectural pattern that probably won’t stand the test of time anyway.
If I must deal with files, I can put them in the lakehouse unmanaged area, which is represented by the Files virtual view. This area somehow escapes the long-standing Power BI limitation that workspaces can’t have subfolders, and therefore it allows you to organize files in any folder structure. And if I want to adopt the delta lake format, I can put my data in the lakehouse managed area, which is represented by the Tables virtual view.
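For example, a minimal lakehouse layout might look like this (the bronze file path is hypothetical; the Files/Tables split is how Fabric presents the two areas):

wwilakehouse.Lakehouse
  Files/                          <- unmanaged area: any folder structure you like
    bronze/sales/2023/06/sales.csv
  Tables/                         <- managed area: auto-registered delta tables
    dimension_customer/           <- Parquet files plus a _delta_log folder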
Further, the Fabric lakehouse automatically discovers and registers delta lake tables created in the managed area. So, you can use a Power Query dataflow or an ADF Copy activity to write data in delta lake format, and Fabric will register the table for you in the Spark metastore with the necessary metadata, such as column names, formats, compression, and more (you don’t have to use Spark to register the table). Currently, automatic discovery and registration are supported only for data written in the delta lake format.
You get two analytical engines for the price of one to query the delta tables in the managed area. Like Synapse Serverless, the SQL endpoint lets users and tools query the delta tables using SQL. And the brand-new Direct Lake mode in Analysis Services (more on this in a future post) lets you create Power BI datasets and reports that connect directly to the delta tables with blazing speed, so you don’t have to import data to get better performance.
Finally, I love the ability to shortcut lakehouse tables to the Warehouse engine and avoid data movement. By contrast, previously you had to spin up more ETL to move data from the Synapse serverless endpoint to the Synapse dedicated pools.
The Bad
As the most lake-like engine, the Fabric lakehouse shares the same limitations as Fabric OneLake, which I discussed in a previous post.
Going quickly through the list: the managed area (Tables) can’t be organized in subfolders. If you try to paint outside the “canvas”, the delta tables end up in an Unidentified folder and they won’t be automatically registered. Hopefully, this is a preview limitation, since I don’t see how you can implement a medallion file organization if that’s your thing. Further, while you can create delta tables from files in the unmanaged area, the tables are not automatically synchronized with changes to the original files. Automatic delta table synchronization would be very useful, and I hope it makes its way to the roadmap.
The Microsoft API wrapper that sits on top of OneLake is married to the Power BI security model. You can’t overwrite or augment the security permissions, such as by granting ACL permissions directly to the lakehouse folders. Even Shared Access Signature is not allowed, and a storage account key is not an option either, as you don’t have access to the storage account. So, to get around such limitations, you’d have to create storage accounts outside Fabric, which defeats the OneLake vision.
It’s unclear how Fabric will address DevOps, such as Development and Production environments. Currently, a best practice is to separate all services, so more than likely Microsoft will enhance Power BI deployment pipelines to handle all Fabric content.
The Ugly
Given that both the lakehouse and data warehouse (new Synapse) have embraced delta lake storage, it’s redundant and confusing to have two engines.
The documentation goes to great lengths to explain the differences, but why the separation? Based on what I’ve learned, the main reason is ADLS limitations in supporting important SQL features, including transactions that span multiple tables, lack of full T-SQL support (no updates, limited reads), and performance issues with trickle transactions. Hopefully, one day these issues will be resolved, and the lakehouse and data warehouse will merge to give us the most flexibility. If you know SQL, you’d use SQL; and if you prefer notebooks, you could use the Spark-supported languages to manipulate the same data.
To know a tool is to know its limitations – Teo’s proverb
In a previous post, I shared my overall impression of Fabric. In this post, I’ll continue exploring Fabric, this time sharing my thoughts on OneLake. If you need a quick intro to Fabric OneLake, Josh Caplan’s “Build 2023: Eliminate data silos with OneLake, the OneDrive for Data” presentation provides a great overview of OneLake, its capabilities, and the vision behind it from a Microsoft perspective. If you prefer a shorter narrative, you can find it in the “Microsoft OneLake in Fabric, the OneDrive for data” post. As always, we are all learning, and constructive criticism is appreciated if I missed or misinterpreted something.
What’s Fabric OneLake?
In a nutshell, OneLake is Microsoft-provisioned storage where both the ingested data and the data from the analytical (compute) engines are stored. Currently, Power BI datasets (Analysis Services) are not saved in OneLake, although one could anticipate that in the long run all Power BI data artifacts could (and should) end up in OneLake for Fabric-enabled workspaces.
Behind the scenes, OneLake uses Azure Data Lake Storage (ADLS) Gen2, but there is an application wrapper on top of it that handles various tasks, such as management (you don’t have to provision storage accounts or external tables), security (Power BI security is enforced), and governance. OneLake is tightly coupled with the Power BI catalog and security. In fact, you can’t create folders outside the Power BI catalog, so proper catalog planning is essential. When you create a Fabric workspace in Power BI, Microsoft provisions an empty blob container in OneLake for that workspace. As you provision additional Fabric services, Fabric creates more folders and saves data in them. For example, I can use Azure Storage Explorer to connect to a Fabric Lakehouse Tutorial workspace (https://onelake.blob.fabric.microsoft.com/fabric lakehouse tutorial) and see that I have provisioned a data warehouse called dw and a lakehouse called wwilakehouse. Microsoft has also added two additional system folders for data staging.
The Good
I like having the data from the compute engines saved in one place and being able to access that data as delta Parquet files. There is something serene in having this transparency after decades of not being able to peek under the hood of some proprietary file format.
I also very much like the ability to use different engines to access that data (Microsoft calls this OneCopy, although a more appropriate term in my opinion would be OneShare).
For example, I can query a table in the lakehouse directly from the data warehouse (such as select * from [wwilakehouse].dbo.dimension_customer) without having to create PolyBase external tables or use the serverless endpoint.
I welcome the possibility of better data virtualization both within and outside the organization. For example, in a lakehouse I can create shortcuts to other Fabric lakehouses, ADLS folders, and even external storage systems, such as Amazon S3 (Google and Dataverse coming up).
I like the governance aspect and the possibilities it enables, such as lineage tracking and data security. In fact, OneSecurity is on the roadmap: once you secure the data in OneLake, the data security bubbles up. It will be interesting to see how this works and what its limitations are, as I imagine it won’t be as flexible as Power BI RLS.
The Bad
An API wrapper is a double-edged sword because it wraps and abstracts things. You really have to embrace the Power BI catalog and its security model, along with their limitations, because there is no way around them.
For example, you can’t use Azure Storage Explorer to change the ACL permissions on a folder or create folders where OneLake doesn’t like them (OneLake simply ignores certain APIs). Isn’t it a good thing, though, to centralize at a higher level, such as the Power BI workspace, and prevent users from painting outside the canvas? Well, how about granting some external vendor permissions to write to OneLake, such as to upload files? From what I can see, even Shared Access Signature is not allowed, and a storage account key is not an option either, as you don’t have access to the storage account. So, to get around such limitations, you’d have to create separate storage accounts, but this defeats the promise of one lake.
Perhaps this is an example you don’t care much about. How about the medallion architecture pattern, which is getting popular nowadays for lakehouse-centric implementations? While you can create subfolders in the lakehouse “unmanaged” zone (the Files folder), no such luck with the Tables folder to organize your delta lake tables in Bronze, Silver, and Gold zones. I can’t see another solution but to create a Power BI workspace for each zone.
Further, what if you must address the data residency requirements of some litigation-prone country? The above-cited presentation correctly states that content can be geo-located, but that will require purchasing more Power BI capacities, irrespective of the fact that ADLS storage accounts can be created in different geo locations. Somehow, I start longing for bring-your-own-data-lake, and I don’t think that provisioning storage accounts is that difficult (in fact, a Power BI workspace already has a feature to link to a BYO ADLS storage account to save the output of Power Query dataflows). Speaking of accounts, I’m looking forward to seeing how Fabric will address DevOps, such as Development and Production environments. Currently, a best practice is to separate all services, so more than likely Microsoft will enhance Power BI deployment pipelines to handle all Fabric content.
The Ugly
How can I miss another opportunity to harp on Power BI for the lack of hierarchical workspaces? The domain feature that Microsoft shows in the presentation to logically group workspaces could be avoided to a great extent if, say, Finance could organize content in subfolders and break security inheritance when needed, instead of ending up with workspaces such as Finance North America, Finance EMEA, etc. Since OneLake is married to the Power BI catalog, more catalog flexibility is essential.
Given that OneLake (and ADLS in general) would be a preferred location for business users to store reference data, such as Excel files, Microsoft should enhance Office to support ADLS. Currently, Office apps can’t directly open and save files in ADLS (the user must download the file, edit it, and then upload it back to ADLS). Consequently, business users favor other tools, such as SharePoint Online, which leads to decentralizing the data required for analytics.
Please join us for the next meeting on Monday, June 5th, at 6:30 PM ET. Kristyna Hughes (Senior Data & Analytics Consultant at 3Cloud) will join us remotely and share best practices for dissecting a complicated and broken Power BI report, along with a checklist for how to fix it. Improving will sponsor the meeting. Your humble correspondent will demonstrate how Power BI can integrate better with SharePoint and OneDrive. For more details and to sign up, visit our group page.
WE NOW MEET BOTH IN-PERSON AND ONLINE. WE STRONGLY ENCOURAGE YOU TO ATTEND THE EVENT IN PERSON FOR BEST EXPERIENCE AND BECAUSE AN EMPTY AUDIENCE IS DISCOURAGING TO SPEAKERS AND SPONSORS. ALTERNATIVELY, YOU CAN JOIN OUR MEETINGS ONLINE VIA MS TEAMS. WHEN POSSIBLE, WE WILL RECORD THE MEETINGS AND MAKE RECORDINGS AVAILABLE HERE.
PLEASE RSVP ONLY IF COMING TO OUR IN-PERSON MEETING AND PLAN TO EAT
Presentation: How to Fix an Inherited Power BI Report
Date: June 5th
Time: 18:30 – 20:30 ET
Place: Onsite and online
Level: Intermediate
Food: Food and drinks will be available for this meeting
Agenda:
18:15-18:30 Registration and networking
18:30-19:00 Organizer and sponsor time (events, Power BI latest, sponsor marketing)
19:00-20:15 Main presentation
20:15-20:30 Q&A
Overview: A common request in the realm of reporting is “hey, we have this report that a previous VIP used, but the report is really slow, and the new VIP would like to revamp it to answer their questions”. Great, you think, I can knock this out, no problem! Then you open the report. Fifty tables, over a hundred measures, and six calculated tables later, you start to panic.
This session will go into best practices for dissecting a complicated report and a checklist for quick wins. No need to panic, this session will help build a toolbelt to tackle any reconstruction project.
Speaker: Kristyna Hughes is a Senior Data & Analytics Consultant at 3Cloud. Her experience includes implementing and managing enterprise-level Power BI instances, training teams on reporting best practices, and building templates for scalable analytics. Passionate about participating in and growing the data community, she enjoys co-writing on Data on Wheels (dataonwheels.com) and has recently co-founded Data on Rails (dataonrailsblog.com). She is also on the board of directors for PASS MN and is a co-organizer of the Lexington Data Technology Group.
Sponsor: Improving
“May you live in interesting times” – Chinese proverb
Microsoft Fabric is upon us with grand fanfare. You can get a good overview of its vision and capabilities by watching the Microsoft Fabric Launch Digital Event (Day 1) and Microsoft Fabric Launch Digital Event (Day 2) recordings. Consultants and experts are extolling its virtues and are busy fully aligning with Microsoft. There is a lot going on in Fabric, and I’m planning to cover the technologies I work with and care about in more detail in future posts, as Microsoft reveals more of what’s under the kimono. This post is about my overall impression of Fabric, in an attempt to cut through the dopamine- and adrenaline-infused marketing hype. As always, please feel free to disagree and provide constructive criticism.
The Good
Let’s just say that after 30 years of working with Microsoft technologies, I’m very, very skeptical when I hear loaded terms like “revolutionary”, “one-something”, “never been done before”, etc. We’ve all witnessed impressive launches for products that wouldn’t last a year. But it looks like this time Microsoft got their act together and put out something that may pass the test of time and that I could recommend or use to help clients. As a starter, I’m glad that we’ve finally settled on a common and open storage format (delta and Parquet) after years of experimenting with proprietary and open formats (CDM folders, anyone?). This common storage has several advantages, including accessibility, portability, and virtualization.
I also like very much that Microsoft doesn’t enforce or propel a specific architecture or data flow pattern. Want a lakehouse? Sure, you can have it. Care about medallion file organization? Sure, you can do that. Prefer a data warehouse because you don’t deal with files and don’t like a notebook with a blinking cursor? Not a problem. Want to skip staging data as files in the lake and load it directly into the warehouse? Fine. This is a very different approach than other vendors take, such as promoting data warehousing on top of lakehouses and/or ruling out relational databases altogether (read my thoughts on this here).
It’s obvious that a gigantic effort has taken place to unify and in some cases rewrite products, such as Analysis Services and Synapse Data Warehouse, to adhere to this new platform and vision. Basically, Fabric is the focal point of decades of hard work from all Microsoft teams involved in analytics to at least make a complicated data estate easier to access and manage.
The Bad
Going back to the presentation and my skepticism, I wish Microsoft had dialed down some promises, like “one copy” of data. Anyone who has implemented a data warehouse of decent complexity knows that data duplication is necessary. Data exists in the source systems, needs to be staged, and then transformed. Right there we have three copies. True, virtualization might help us avoid some data movement scenarios, such as accessing data directly in S3 buckets or importing it into a Power BI dataset (for most companies, a few extra minutes for refreshing datasets is not an issue).
Speaking of companies, it’s clear that Fabric (and the presenters in the videos) targets the needs of large organizations with complex integration scenarios. But for most organizations, “Big Data” is a few million rows, and the most common integration task is analyzing data from one or more ERPs. Should they care about Fabric? I guess it will really depend on its value proposition and their budgets, but Fabric pricing hasn’t been announced yet. If Fabric is not available in PPU (Premium Per User), it will probably be dead on arrival for smaller organizations, as they can get modern analytics by spending less than $200/month on infrastructure, excluding Power BI per-user licenses.
Finally, although the presenters highlighted avoiding vendor lock-in as one of the major benefits of Fabric, you’re going to put all your eggs in one basket: Power BI/Fabric. Making Power BI a one-stop destination for analytics makes, of course, a lot of sense for Microsoft and increases its revenue potential (nothing wrong with revenue if it brings value). But for you, Fabric would be a long-term commitment, and you’d better make sure you avoid Microsoft-proprietary features as much as you can, such as Power Query dataflows and Azure Data Factory dataflows, should you one day decide to divest from Fabric, Power BI, or even Azure. Otherwise, you might find yourself in a similar situation as a client of mine who had to migrate hundreds of Alteryx flows.
The Ugly
Confusion has descended upon the BI land after years of Microsoft throwing out and abandoning products left and right. In fact, the Fabric documentation has sections to help you choose a product, such as Lakehouse, the new Synapse data warehouse, or Power BI datamart (that one is easy: stay away from it, especially if you plan to adopt Fabric). Should we add Synapse Dedicated Pools and Azure SQL Database to the comparison table?
Further, rewriting these engines means that we must go back to square one and wait for features. For example, the new Synapse data warehouse lacks so many T-SQL features that it’s outside my plans for any near-term projects. Just when I thought Synapse SQL dedicated pools were catching up on T-SQL parity, someone moved my cheese… Well, good things happen to those who wait, so let’s give Fabric a year or so.
Please join us for the next meeting on Monday, May 1st, at 6:30 PM ET. Akshata Revankar (Data Engineer, Specialist at McKinsey & Company) will show you how to leverage Power BI deployment pipelines to promote content changes between environments, such as DEV and PROD. For more details and to sign up, visit our group page.
WE NOW MEET BOTH IN-PERSON AND ONLINE. WE STRONGLY ENCOURAGE YOU TO ATTEND THE EVENT IN PERSON FOR BEST EXPERIENCE AND BECAUSE AN EMPTY AUDIENCE IS DISCOURAGING TO SPEAKERS AND SPONSORS. ALTERNATIVELY, YOU CAN JOIN OUR MEETINGS ONLINE VIA MS TEAMS. WHEN POSSIBLE, WE WILL RECORD THE MEETINGS AND MAKE RECORDINGS AVAILABLE HERE.
PLEASE RSVP ONLY IF COMING TO OUR IN-PERSON MEETING AND PLAN TO EAT
Presentation: Getting started with Power BI Deployment Pipelines
Date: May 1st
Time: 18:30 – 20:30 ET
Place: Onsite and online
Level: Intermediate
Food: Food and drinks will be available for this meeting
Agenda:
18:15-18:30 Registration and networking
18:30-19:00 Organizer and sponsor time (events, Power BI latest, sponsor marketing)
19:00-20:15 Main presentation
20:15-20:30 Q&A
ONSITE (RECOMMENDED)
Improving Office
11675 Rainwater Dr.
Suite #100
Alpharetta, GA 30009
Overview: If you are wondering, “Is CI/CD possible in Power BI?”, the answer is YES! This can be achieved with deployment pipelines. With deployment pipelines, it’s now possible to move content smoothly from Development/Test to QA to Production, helping enterprise BI teams bring application lifecycle management to their Power BI environments. You will learn how to:
Understand deployment pipelines
Create a deployment pipeline
Deploy content with rules
Compare content
Move content from one stage to the next
Speaker: Akshata Revankar (Data Engineer, Specialist at McKinsey & Company) has 16+ years of experience in the data engineering and data reporting space. She has worked with Oracle Database, SQL Server, SSIS, Informatica PowerCenter, Hadoop systems, Qlik, and Power BI. She enjoys being in the data space and learning new things.
Sponsor: TBD
I’ve spent hours on this, so someone else might find the solution useful. I developed an SSIS package that uses a ForEach Loop container. Then I closed Visual Studio and reopened it. The SSIS designer opened the package, thought for a few seconds about whether it liked it or not, and then crashed Visual Studio. I noticed that the VS status bar showed a message that it was validating the ForEach Loop container, which was an important clue.
How do we fix this horrible issue? Initially, I thought it was interference from Visual Studio 2022, which someone else had recently installed. So I upgraded, uninstalled, repaired, tried VS 2022, etc., to no avail.
Finally, I opened the package code in a text editor and added DTS:DelayValidation="True" to the container task to disable the upfront validation on package open.
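For reference, here is roughly where the attribute goes in the .dtsx XML; the refId and container name below are hypothetical, but STOCK:FOREACHLOOP is how the designer marks the container:

<!-- Hypothetical ForEach Loop container; DTS:DelayValidation defers validation until execution -->
<DTS:Executable
  DTS:refId="Package\Foreach Loop Container"
  DTS:ExecutableType="STOCK:FOREACHLOOP"
  DTS:ObjectName="Foreach Loop Container"
  DTS:DelayValidation="True">
  <!-- ... enumerator and child tasks ... -->
</DTS:Executable>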
This fixed the issue although I had no idea what caused the crash.
Please join us for the next meeting on Monday, April 3rd, at 6:30 PM ET. Your humble correspondent will revisit important Power BI fundamentals in a demo-packed session. For more details and to sign up, visit our group page.
PLEASE NOTE THAT OUR IN-PERSON MEETING LOCATION HAS CHANGED! WE STRONGLY ENCOURAGE YOU TO ATTEND THE EVENT IN PERSON FOR BEST EXPERIENCE. ALTERNATIVELY, YOU CAN JOIN OUR MEETINGS ONLINE VIA MS TEAMS. WHEN POSSIBLE, WE WILL RECORD THE MEETINGS AND MAKE RECORDINGS AVAILABLE AT HTTPS://BIT.LY/ATLANTABIRECS. PLEASE RSVP ONLY IF COMING TO OUR IN-PERSON MEETING.
Presentation: Power BI Dashboard in an Hour (DIAH)
Date: April 3rd
Time: 18:30 – 20:30 ET
Place: Onsite and online
Level: Beginner
Food: Food and drinks will be available for this meeting
Agenda:
18:15-18:30 Registration and networking
18:30-19:00 Organizer and sponsor time (events, Power BI latest, sponsor marketing)
19:00-20:15 Main presentation
20:15-20:30 Q&A
Overview: Targeting novice Power BI users, this hands-on, no-slide session covers important Power BI fundamentals and best practices. If you’re already a Power BI user, you’ll probably learn a new trick or two. And if you like a challenge, bring your laptop and try to keep up through the steps to create a Power BI dashboard!
Join us and learn how to:
Design your BI model
Acquire and transform data
Turn data into valuable and interactive insights
Share your visualizations with others
Speaker: Teo Lachev is a consultant, author, and mentor, with a focus on Microsoft BI. Through his Atlanta-based company Prologika he designs and implements innovative solutions that bring tremendous value to his clients. Teo has authored and co-authored several books, and he has been leading the Atlanta Microsoft Business Intelligence group since he founded it in 2010. Microsoft has recognized Teo’s contributions to the community by awarding him the prestigious Microsoft Most Valuable Professional (MVP) Data Platform status for 15 years. Microsoft selected Teo as one of only 30 FastTrack Solution Architects for Power BI worldwide.
Sponsor: Improving
Prototypes with Pizza: Power BI Latest with Teo Lachev
Warning: This blog contains old tricks of an old dog.
Scenario: Suppose you have a large table in SQL Server, e.g., hundreds of millions or even a billion rows. DML operations (SELECT, INSERT, UPDATE, DELETE) take a long time. How do you speed them up? Do you split the large table into multiple tables? Do you ask for better hardware? Or do you start looking for a new job with less data?
Solution: It’s nothing new, but I see clients struggle with this all the time because they don’t know any better. The solution is to partition the table and use partition switching, which SQL Server has supported since time immemorial.
Cathrine Wilhelmsen has a great step-by-step blog covering different scenarios, but the process goes like this:
Configure page compression for the large table (see benefits here).
Partition the large table, such as by month.
Create a non-partitioned staging table that has the same indexes and compression as the large table.
Find the corresponding partition in the large table that will require DML, such as by using this script.
If the data requires updates, switch out the affected partition to the staging table and perform the updates there. For full loads where rows will only be inserted, you don’t have to switch out the partition (see the second scenario in Cathrine’s blog).
Switch the staging table back into the corresponding partition of the large table. This should take a few seconds.
As a bonus, the SQL Server query processor could eliminate partitions for SELECTs, thus improving the query performance.
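Here is a condensed T-SQL sketch of the whole flow, with hypothetical table and column names; see Cathrine’s blog for the full walkthrough:

-- 1. Partition function and scheme (RANGE RIGHT, one partition per month)
CREATE PARTITION FUNCTION pfMonth (date) AS RANGE RIGHT
FOR VALUES ('2023-01-01', '2023-02-01', '2023-03-01');
CREATE PARTITION SCHEME psMonth AS PARTITION pfMonth ALL TO ([PRIMARY]);

-- 2. Large table on the partition scheme, with page compression
CREATE TABLE dbo.FactSales (SaleDate date NOT NULL, Amount money NOT NULL)
ON psMonth (SaleDate);
ALTER TABLE dbo.FactSales REBUILD PARTITION = ALL WITH (DATA_COMPRESSION = PAGE);

-- 3. Non-partitioned staging table with matching schema, indexes, and compression;
-- the check constraint guarantees rows fit the target partition when switching in
CREATE TABLE dbo.FactSales_Staging (SaleDate date NOT NULL, Amount money NOT NULL)
WITH (DATA_COMPRESSION = PAGE);
ALTER TABLE dbo.FactSales_Staging ADD CONSTRAINT ckFeb2023
CHECK (SaleDate >= '2023-02-01' AND SaleDate < '2023-03-01');

-- 4. Find the partition ordinal for the month being reloaded (returns 3 here)
SELECT $PARTITION.pfMonth('2023-02-15');

-- 5. Switch out, update in the staging table, then switch back in;
-- both switches are metadata-only operations and take seconds
ALTER TABLE dbo.FactSales SWITCH PARTITION 3 TO dbo.FactSales_Staging;
UPDATE dbo.FactSales_Staging SET Amount = Amount * 1.1; -- your DML here
ALTER TABLE dbo.FactSales_Staging SWITCH TO dbo.FactSales PARTITION 3;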