Power BI dataflows and Azure Data Factory are two of the various paths Microsoft has created for data integration, data prep and data transformation for enterprises. We look at these two more popular cloud-based options to discover similarities and differences. To provide an easily digestible understanding of each, we’ve categorized the discussion by typical audience, the type of editor used, methods for preparing and transforming data for that tool, a synopsis of how the technology works, and pricing.
Power BI dataflows
Typical audience: Business users, “citizen data integrators.”
Summary: Self-service data prep tool for importing a wide variety of data into the Power Platform delivered as SaaS.
Editor: Visual, web-based Power Query editor that is embedded in the Power BI Service (app.powerbi.com) with step-by-step instructions for creating dataflows.
Prepare and transform data: Power Query online is also used to prep data and create data transformations. The underlying transformations generated by the web UI are in the M language.
Technology synopsis: Data imported by a dataflow is stored in an Azure Data Lake (Gen2) which has massive scalability. You can map data to Microsoft’s Common Data Model (pre-defined data schemas) or custom define schemas that align with your source data. Datasets can be generated and refreshed on a schedule from dataflows, then used to visualize data. Upcoming improvements include the ability to DirectQuery a dataflow without first importing the data into a dataset.
Limitations: Refreshes for non-premium workspaces occur on shared resources. Premium workspace dataflows are tied to the underlying Power BI Premium capacity, with certain concurrency limits based on capacity size.
Pricing: Dataflows are included with a Power BI Pro license, with certain features available only with Power BI Premium capacity.
Azure Data Factory
Typical audience: Corporate IT staff focused on data integrations; “data engineers.”
Summary: Cloud-based ETL PaaS utility for orchestrating data movement and transformations at scale to a variety of data storage destinations. You can do a lift-and-shift of SSIS packages here during cloud migrations and source control is supported.
Editor: Visual web-based editor (adf.azure.com) that is linked from the Azure resource. This editor is complex and deep, reflecting the many options available in Azure Data Factory.
Prepare and transform data: A wide variety of activities can be used in a Data Factory pipeline. The compute resources that can be leveraged include big data queries, machine learning processes, databrick activities (python, notebooks), custom .net code and even data wrangling and mapping using dataflows.
Technology synopsis: A series of interconnected systems that runs on top of the Azure platform in a serverless approach. The data output from a pipeline typically lands in Azure SQL Data Warehouse, another Big Data target, or other data storage resource. Visualization tools like Power BI can then query from those resources to visualize the data.
Pricing: Current pricing is consumption based, with activities costing a reasonable $1 per 1,000 runs. Special purpose SSIS nodes are costed by the hour.
Takeaways: Power BI dataflows and Azure data lakes
There are many similar use cases for Power BI dataflows and Azure Data Factory to collect, prepare and transform data for data warehousing, modeling and visualization. There are redundancies and interplay between the two tools, including:
- Power BI dataflows can consume data lakes or data warehouses populated by Azure Data Factory
- Azure Data Factory can consume Azure Data Lakes populated by Power BI dataflows
- Azure Data Factory can call dataflows as an activity of a pipeline
- Power BI reports can connect to Power BI dataflows; datasets generated by dataflows; Azure Data Lakes populated by either tool; or data warehouses populated by Data Factory
Power BI dataflows are targeted toward business users, unlocking powerful data prep capabilities for a wide audience of citizen data integrators. As a result, Power BI dataflows are more limited in scope and the editor is streamlined for this purpose.
Azure Data Factory is a deep and wide tool, with interplay into many other services. As a result, it offers more complexity and its options are focused toward a deeply technical user. It is especially useful for creating organizational level data consolidation and processing.
Perhaps most important, both cloud-based tools remove the need to maintain ETL servers and integration applications, allowing their respective users to focus on the data focused tasks at hand.
When considering the different approaches for sourcing and and preparing data in Power BI, it’s important to understand their impact on your reporting needs as well as scalability, security and performance. For more information, check out our on-demand webinars, including 4 Way to Model Data in Power BI and How to Share Power BI Datasets: Dataflows and Certified Datasets.
Senturus for data preparation
Get guidance choosing the best data prep and data modeling approaches for your organization. Good data management requires knowledge in the shape of real-world business acumen and superior technical expertise. Senturus delivers. With our experts on hand, you will get the right-sized, solid foundation for your business, enabling data driven success today and working with you as your needs and business grow. Contact us.