Read an Excerpt
Learn to Write DAX
A Practical Guide to Learning Power Pivot for Excel and Power BI
By Matt Allington Holy Macro! Books
Copyright © 2016 Tickling Keys, Inc.
All rights reserved.
ISBN: 978-1-61547-354-0
CHAPTER 1
Concept: Pivot Tables vs. Power Pivot
Some people wonder what the difference is between Power Pivot and pivot tables, so I'm going to start by explaining. Read on, and you'll have it sorted out in no time.
What Is a Pivot Table?
A pivot table is a summarisation and visualisation tool. The job of a pivot table is to connect to a data source and create on-the-fly totals and subtotals to help you and others make sense of data. The larger the set of data and the more granular the data, the more useful a pivot table becomes. Because pivot tables are embedded right inside Excel, with them you get all the other benefits of Excel as well.
Data Sources for Pivot Tables
Historically, there have been two main types of data sources that you can connect to with a pivot table: flat tables and data cubes.
Connecting to a Single Flat Table
To connect to a single flat table inside Excel, click in the table, select Insert, Pivot Table, and off you go. There are some limitations with this approach, however:
It is very common to have to do a lot of VLOOKUP()s (or similar operations) to be able to join data from different data sources into a single flat table.
Excel has a 1 million row limit. In fact, though, if you are using lots of VLOOKUP()s in a single flat table, you will reach performance limits well before you ever hit 1 million rows.
These two issues have historically prevented Excel from being a scalable BI tool. But Power Pivot changes that, as you'll see in a few moments.
Connecting to a Data Cube
A less common but very powerful use of pivot tables is to connect directly to a reporting cube such as a SQL Server Analysis Services multidimensional cube directly from Excel. Many large enterprises have multidimensional data cubes available for reporting. Allowing Excel users to connect directly to a cube and use a pivot table for reporting is super easy and convenient. But this is a relatively rare use case compared to the general use of Excel and the more common single-table use of pivot tables.
Enter Power Pivot
Power Pivot doesn't change anything about pivot tables, but it changes everything when it comes to the data that pivot tables connect to. Power Pivot adds a third (and, in my view, the best) method of connecting to source data.
Power Pivot is a data modelling tool that is used to structure and extend source data so that it can be analysed using Excel pivot tables (among other tools). Data modelling is not a term that is often familiar to Excel users as it is normally the domain of IT BI professionals. But this is no longer the case with Power Pivot for Excel.
What Is Data Modelling?
Data modelling is the process of taking data from various sources; loading, structuring, and relating data logically to other data; and enhancing, embellishing, and generally preparing the data for use.
The data modelling process includes:
Determining the optimal structure and shape of the source data to analyse (e.g., whether to bring in all the data, full data, or summary data).
Loading the data from the source into the data model (Power Pivot in this case).
Defining the logical relationships between the various tables (which is similar to what you do with VLOOKUP() inside Excel).
Defining data types (e.g., specifying whether a column of data is numeric or a column of currency values or a column of text fields).
Creating new insights from the source data so that you can analyse concepts that don't exist natively in the source data but yet can be calculated or created inside the data model. For example, if you have a table of transactional data with cost price and sell price, you can extend the data model to include a calculation for margin, margin percentage, etc., even though these concepts are not explicitly in the source data. Once you have modelled these new facts in the data model, they can be reused over and over by people using your workbook.
When you learn Power Pivot, you are actually learning data modelling. The term can be a little bit scary, but there is no reason to be concerned. By the time you have finished this book, you will be well on your way to being an accomplished data modeller using Power Pivot. Just use the techniques covered in this book and keep in mind that what you are actually doing is learning to be a data modeller using Power Pivot.
CHAPTER 2
Concept: Loading Data
Before you can start to write DAX and use Power Pivot, you need to load some data. Power Pivot always loads a complete copy of the source data into the Data Model as the first step in the process. Once it's loaded, you can share your workbooks with others, and there is no need for anyone else to have direct access to your source data.
When you load data, you have to decide which data to import, including which tables, which columns in each table, and also what "shape" the data should be when imported. In the following section, you will simply load data that has been prepared for you. But you need to be aware that the process of deciding which data to import is an important part of the data modelling process — and it has been done for you in this case.
Here's How: Loading Data from a New Source
You can download a copy of the sample AdventureWorks database used in this book from http://xbi.com.au/learndax. You should download the database now, unzip it, and place it in a location that is easy for you to find.
You are going to start off by loading the following tables from the AdventureWorks Access database:
Sales
Products
Territories
Calendar
Customers
Then you will prepare these tables for use in Power Pivot.
Follow these steps to load data into a workbook for use in Power Pivot:
1. Open a new blank Excel workbook. You should see the PowerPivot tab at the top of the sheet.
2. If you don't see the PowerPivot tab, select File, Options, Add-Ins. Then scroll to the bottom of the window and select COM Add-ins from the Manage list. Then click Go.
3. In the COM Add-ins dialog that appears, check the Microsoft Office PowerPivot check box and then click OK.
4. If you haven't already done so, download the sample database from http://xbi.com.au/learndax. Unzip the zip file and place the contents somewhere you can easily find again later.
5. On the PowerPivot tab, click Manage.
Take a moment to look at the Windows taskbar, shown in the image below. Hover your mouse over Excel in the taskbar and notice that there are now two separate windows: the traditional Excel window (see #1 below) and the Power Pivot window (#2).
6. In the Power Pivot window, select Home, From Database (see #1 below), From Access (#2).
7. Browse to the location of the sample database you downloaded and unzipped in step 4 and then click Next.
8. Accept the default option in the Table Import Wizard dialog (as shown below) and then click Next.
9. Select the five views at the bottom of the list by placing a check mark in the box next to each one. (Note the different icons for queries/views and for tables.)
10. Click Finish, and the wizard imports your data.
When you close the Table Import Wizard, you see the five tables you have just imported in the Power Pivot window. There should be five new tabs, one for each of the tables you just imported. Each of the tables is a complete copy of the data you imported from the source files (an Access database in this example). You don't need the source files again until you are ready to refresh the data — typically when the data changes at some time in the future. This is one of the many great things about Power Pivot: You can simply refresh the data when the data changes, and your workbooks are updated with the new data.
11. Ensure that you are in Data view by selecting Data View on the ribbon (see #1 below).
12. In the Data view, double-click on the Territory tab and rename it Territories for consistency.
13. Switch to Diagram view by clicking the Diagram View button on the ribbon (see #2 above).
14. If you can't see all five tables on the screen, click on the Zoom to Fit button to reveal the hidden tables.
15. Position your tables so that any data table or tables (there is only one in this case) are at the bottom of the screen and the lookup tables are at the top.
Once you've completed the preceding steps, you need to join the data table(s) to the lookup table(s), as described shortly.
Relationships Between Tables
A customer table typically has a list of all customers that a business has on file. But some of these customers may have never purchased anything from the company. Some customers may have made only a single purchase, and some customers may have made many purchases. So for each entry in the Customers table, there is either none, one, or many records in the Sales table.
The Sales table can be joined logically to the Customers table by using the customer key (customer number or ID). When these tables are joined on the customer key, there will be a one-to-many (Customers-to-Sales) relationship between these two tables.
Here's How: Joining Tables in Power Pivot
To join a lookup table to a data table in Power Pivot, follow this process:
1. Select a column from the data table (the table down at the bottom of the Power Pivot screen, as shown below). To do this, click the OrderDate column in the Sales table and hold down the mouse button (see #1 below).
2. Drag the column up and hover over the matching key in the lookup table (in this case, the Date column in the Calendar table; see #2).
3. Release the mouse button to complete the join.
4. Complete the same process for the other three tables. See if you can work out on your own which are the correct columns to join before you look at the answers below:
Data Table Column Lookup Table
Sales ProductKey Products
Sales CustomerKey Customers
Sales TerritoryKey Territories
Because the relationships are always one-to-many, the joins are specifically single-directional. Always drag from the data table up to the lookup table, not the other way around. As you can see in the image below, the dot end of the relationship points to the data table, and the arrow points to the lookup table.
By putting the data table at the bottom, you get a visual clue that the tables at the top of the screen are lookup tables. (Get it? You have to "look up" to see the lookup tables.)
5. Save the workbook by clicking the Save icon.
When you click the Save icon, you save both the Excel workbook and the Power Pivot data model at the same time (just as happens when you click the Save button within Excel). The Excel workbook and the Power Pivot data model are always saved together inside the same Excel file (e.g., .xlsx, .xlsb, .xlsm, etc.). If you want to switch you back to the Excel window, click the Excel button.
Shaping the Data
It's time to pause for a minute to discuss the optimal shape of data for Power Pivot. When I say "shape" of data, I am taking about things like how many tables you import, how many columns are in each table, which columns are in each of the tables, etc.
Shaping data is a huge topic, and I don't have room here to discuss it fully. But I do want to give some foundational advice to get you started. One reason this advice is important is because the shape of data in transactional systems (or relational databases) is seldom the ideal shape for Power Pivot. When the IT department executes an enterprise BI project, one of the important first steps is to shape the data so it is optimal for reporting. This step is normally completely transparent to the end user (i.e., you), and hence the end user is shielded from the need to do this. But I am sharing this important information with you here and now because you need to understand data shaping if you want to have efficient and effective Power Pivot data models. Just copying what you have in your source data is unlikely to be optimal.
Choosing a Schema (or Layout)
The generally accepted approach to bringing data into Power Pivot is to bring in your data in as what's known as a star schema. This is a technical term that comes from the Kimball methodology (also known as dimensional modelling; Google it) and describes the logical way data should be structured for optimal reporting performance. The visual layout of the tables in the following image (which includes exactly the same data you just imported) shows why it is called a star schema.
In short, there are data tables (only one, Sales, in this example) surrounded by lookup tables (Customers, Products, Territory, and Calendar in this example), and together they visually make a star shape.
The Visual Layout of Tables in the Diagram View
When it comes to visually positioning your tables in the Power Pivot Diagram view, I teach Excel users to position the tables in such a way that the lookup tables are located at the top of the window and the data tables are located at the bottom of the window (as shown below).
If you compare the last two images, you will see that they both have exactly the same logical relationship (links) between the tables: They are both star schemas, but they have different visual layouts.
The visual layout in the second image, the one just above, is the one developed and recommended by Rob Collie, and we can call it the Collie layout methodology. The Collie layout methodology involves placing the lookup tables at the top of the window and the data tables at the bottom. The importance of this for Excel users learning Power Pivot will become evident later in the book. For now, just trust me and do follow the Collie layout methodology.
So in summary, there is no one correct way to shape your data, but the star schema is the best approach where possible. When you import your data in the shape of a star schema, you should use the Collie layout methodology (lookup tables on top, data tables down the bottom) to visually layout your data in the Power Pivot Diagram view.
Understanding the Two Types of Tables: Lookup Tables and Data Tables
In the professional BI world, lookup tables are referred to as dimension tables, and data tables are called fact tables. For Excel users, though, I suggest using the terminology lookup tables and data tables.
Lookup Tables
You should have one lookup table for each "object" that you need for reporting purposes. For example, in the data being used here, these objects are customers, products, territories, and time (i.e., calendar). A key feature of a lookup table is that it contains one and only one row for every individual item in the table and as many columns as needed to describe the object.
So there is only one row for each unique customer in the Customerstable. The Customers table has lots of columns describing each customer, such as customer number, customer name, customer address, etc., but there is only one row for each customer. Each row is unique based on the customer number, and no duplicates are allowed.
(Continues...)
Excerpted from Learn to Write DAX by Matt Allington. Copyright © 2016 Tickling Keys, Inc.. Excerpted by permission of Holy Macro! Books.
All rights reserved. No part of this excerpt may be reproduced or reprinted without permission in writing from the publisher.
Excerpts are provided by Dial-A-Book Inc. solely for the personal use of visitors to this web site.