Power BI Dataflows
Project Goal and Stakeholders
By the time I joined Great Deals, multiple dashboards had already been set up in Power BI to serve the needs of different function teams. A recurring concern for the team was the amount of time it took to refresh critical dashboards. For example, the dataset for our Commercial Dashboard, a key resource during monthly campaigns, took more than 30 minutes to refresh.
We were also migrating from Azure to Snowflake at the time. With Azure, we paid a flat monthly subscription, which gave us flexibility in how often we queried the database to feed our dashboards. Snowflake, on the other hand, prices usage through a limited pool of credits that are consumed every time we interact with the database. As such, there was a need for initiatives that would allow for more efficient and scalable data management, which eventually came in the form of Dataflows.
Approach and Methodology
USE CASE ANALYSIS
Since I was new to the concept of Power BI Dataflows, I needed to conduct additional research to understand possible use cases and determine how feasible they were to set up. I found that the main advantage Dataflows could bring was allowing the team to be more strategic about how often certain data tables were pulled from Azure/Snowflake into Power BI during refreshes.
In our initial setup, multiple tables were used across multiple dashboards. Some of these tables, such as calendar tables or market account tables, were not updated as regularly as others. Even so, they were still queried from the database every single time any dashboard that used them refreshed, lengthening refresh times and consuming credits on each refresh.
With Dataflows, tables that are shared across dashboards and updated infrequently only need to be queried from the database as often as the dataflow itself is scheduled to refresh. Multiple dashboards can then read from the dataflow as needed. Because the data is already staged in Power BI through the dataflow, dataset refreshes run faster and fewer queries hit Snowflake, lowering the cost of pulling data.
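To make the idea concrete, here is a minimal sketch of what a dataset query looks like once it reads from a dataflow entity instead of querying the warehouse directly. It assumes the legacy PowerBI.Dataflows connector, and the workspace, dataflow, and entity names are hypothetical placeholders rather than our actual setup.

// Minimal sketch: the dataset reads a table the dataflow has already staged in Power BI.
// Workspace, dataflow, and entity names below are hypothetical.
let
    Source = PowerBI.Dataflows(null),
    Workspace = Source{[workspaceName = "Analytics Workspace"]}[Data],
    Dataflow = Workspace{[dataflowName = "Shared Reference Tables"]}[Data],
    Calendar = Dataflow{[entity = "Calendar"]}[Data]
in
    Calendar

Because this query never touches Azure or Snowflake, refreshing the dataset does not consume warehouse credits for that table; only the dataflow's own scheduled refresh does.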
This use case, along with a proposal for how the transition could be implemented, was then aligned with the rest of the team.
The main selling point of switching to Dataflows was that we could be more efficient in how we loaded data into our dashboards.
DASHBOARD AUDIT AND DATAFLOW GROUPING PROPOSAL
I began an extensive audit of all of the existing dashboards that were prioritized for the migration. This audit included identifying which tables were used in each dashboard and which of those tables were shared by more than one dashboard. I also consulted with my teammates to understand how their data models were set up, asking questions such as how often certain tables needed to be refreshed and what the nature of the data was.
After testing different combinations of tables for the Dataflows, I was able to create a recommended grouping based on the following criteria:
- Similarities in use case (e.g. target tables for campaigns would be grouped together)
- Refresh frequency of the table (a calendar table would only ever need to be refreshed once a year, while other tables needed much more frequent updates)
- Performance of the entire Dataflow (I needed to measure the amount of time a Dataflow would take to refresh)
A sample of the audit mapping I conducted for our dashboards and data tables. It was important to map out the use case of each table, understand how often it needed to be updated, and identify which dashboards used it.
DATAFLOW CREATION, DASHBOARD MIGRATION AND MONITORING
Once I had sign-off on the Dataflow groupings, I set to work creating the Dataflows in our Power BI workspace. I also took some time to define a process for migrating tables in the data model while minimizing dashboard downtime, since the dashboards were still being used regularly by other teams.
By the end of the project, we had around 47 different tables that were grouped into 14 Dataflows.
I also involved the rest of my team by training dashboard point persons on how to execute the migration process, since we needed to cover a lot of dashboards in a limited amount of time. At this point in the project, I took on a more supervisory role, making sure the work stayed within the agreed-upon timeline while my teammates handled much of the migration legwork for the dashboards assigned to them.
A significant part of the transition was using the Advanced Editor to switch from Azure Dataflows to Snowflake Dataflows with minimal downtime. We also needed to be deliberate about updating affected DAX measures to ensure the data would continue to update properly.
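As an illustration of the kind of change made in the Advanced Editor, the sketch below swaps a source query from an Azure SQL database to Snowflake while keeping the output table name the same, so downstream steps and DAX references stay intact. The server, warehouse, database, schema, and table names are hypothetical, and the Azure SQL connector shown here is an assumption about the original source.

// Before (hypothetical Azure SQL source):
// let
//     Source = Sql.Database("legacy-server.database.windows.net", "SalesDB"),
//     DimCalendar = Source{[Schema = "dbo", Item = "DimCalendar"]}[Data]
// in
//     DimCalendar

// After (hypothetical Snowflake source), keeping the same output name:
let
    Source = Snowflake.Databases("myaccount.snowflakecomputing.com", "COMPUTE_WH"),
    Database = Source{[Name = "ANALYTICS", Kind = "Database"]}[Data],
    Schema = Database{[Name = "PUBLIC", Kind = "Schema"]}[Data],
    DimCalendar = Schema{[Name = "DIM_CALENDAR", Kind = "Table"]}[Data]
in
    DimCalendar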
Throughout the entire process, we put checks in place to ensure there was little to no discrepancy in the values of our datasets. This was especially important given that, over the course of the project, we moved from importing directly from Azure, to using Azure Dataflows, to using Snowflake Dataflows.
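A minimal sketch of the kind of reconciliation check this involves, assuming hypothetical server, workspace, table, and column names (e.g. a GMV column), is a throwaway query that compares row counts and the total of one key column between the legacy Azure table and the new dataflow entity.

// Illustrative reconciliation query, not our exact checks; all names are hypothetical.
let
    AzureSales = Sql.Database("legacy-server.database.windows.net", "SalesDB"){[Schema = "dbo", Item = "FactSales"]}[Data],
    DataflowSales = PowerBI.Dataflows(null){[workspaceName = "Analytics Workspace"]}[Data]{[dataflowName = "Sales Tables"]}[Data]{[entity = "FactSales"]}[Data],
    Comparison = #table(
        {"Check", "Azure", "Dataflow"},
        {
            {"Row count", Table.RowCount(AzureSales), Table.RowCount(DataflowSales)},
            {"Total GMV", List.Sum(AzureSales[GMV]), List.Sum(DataflowSales[GMV])}
        }
    )
in
    Comparison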
Project Outcome
As a result of this initiative, we were able to reduce refresh times across all of our dashboards. Our merged data model dataset, for example, came down to 10 minutes when combining the refresh times of the involved Dataflows and the dataset itself, a reduction of more than 67% from the original refresh time.
This project also reduced the frequency of table refreshes in the merged data model by 70% while maintaining dashboard functionality and accuracy. The transition of dashboards from Azure Dataflows to Snowflake Dataflows resulted in less than 1% discrepancy in main KPIs and produced scalable solutions for additional dashboards the team may need to build.