In this article I have presented some of the frequent DAX queries I was encountering in my day to day work. The objective of this article is to help users with DAX through an example based approach. I found that it is easier to comprehend the nuances of the DAX language if it supported by simple examples. Note — it is not strictly neccessary that the results have to be obtained via DAX only. If the data model is good then Power BI
visuals can often meet the requirements. In this article I have covered the following DAX expressions:Overview
Sample data
The DAX expressions in this article are written around the MS Access sample database downloadable from Microsoft Learning. A copy of the same can also be downloaded from my Github repo here. A copy of the Power BI report which references this MS Access database can be downloaded from my Github repo here.
Database schema
Power BI modelData
To get a feel of what the data looks like I have presented the top 5 rows from each of the tables in this database
DAX studio primer
How to use DAX studio?
DAX studio from Microsoft is a very handy tool if you want to experiment with DAX queries outside of Power BI. I have listed some informative videos below. DAX Studio runs independently of Power BI, however it expects a running instance of Power BI to establish a connection.
- DAX studio tutorial: What should I use it for, tool overview
- Why you should use DAX Studio with Power BI
- Computing a measure in DAX Studio
How to execute Table expressions?
DAX studio expects any table expression to be encapsulated inside a EVALUATE() block. In the following example, we are inspecting the first 20 rows of the bi_salesFact table.
EVALUATE(
TOPN(20, bi_salesFact)
)
How to execute Scalar expressions?
To execute any expression that returns a scalar value (i.e. not a table) encapsulate the expression in a EVALUATE {} block
EVALUATE{
MAX(bi_salesFact[Date])
}EVALUATE
{
MIN(bi_salesFact[Date])
}
How to create a measure (MEASURE,SUM)?
In this example we are calculating the total sales per manufacturer. When using DAX studio, the DEFINE keyword should be used to create a new MEASURE and this declaration should precede the EVALUATE() keyword.
DEFINEMEASURE bi_manufacturer[TotalUnits]= SUM(bi_salesFact[Units])
EVALUATE
(
SELECTCOLUMNS
(
bi_manufacturer,
"id",bi_manufacturer[ManufacturerID],
"name",bi_manufacturer[Manufacturer],
"TotalUnits",bi_manufacturer[TotalUnits]
))
How to create a calculated column(UPPER,COLUMN)?
In the following example we are creating a new column which converts the manufacturer name to upper case
DEFINECOLUMN bi_manufacturer[ManufacturerUpper]= UPPER(bi_manufacturer[Manufacturer])
EVALUATE
(
bi_manufacturer
)
List of unique Product Segments (DISTINCT, ORDER BY)
In this example we are displaying an unique list of product segments.
EVALUATE(
DISTINCT( bi_product[Segment])
)
Use the ORDER BY tag if neccessary
EVALUATE(
DISTINCT( bi_product[Segment])
) ORDER BY bi_product[Segment] DESC
Distinct list of financial years from the Sales table (DISTINCT)
In this query we are creating a calculated column to get the year component from the sales transaction date and then using the DISTINCT on the year column
DEFINECOLUMN bi_salesFact[Year] = year(bi_salesFact[Date])EVALUATE
(
DISTINCT( bi_salesFact[Year] )
)
Distinct list of financial years from the Sales table(VALUES)
The VALUES expression has a similar behaviour to DISTINCT
DEFINECOLUMN bi_salesFact[Year] = year(bi_salesFact[Date])EVALUATE
(
VALUES( bi_salesFact[Year] )
)
Count of rows from all the tables (ROW,UNION)
This helps towards the answering the question — “How much data does my dataset hold?”
EVALUATE(
UNION
(
ROW("Table","bi_date","Rows",{COUNTROWS(bi_date)}),
ROW("Table","bi_geo","Rows",{COUNTROWS(bi_geo)}),
ROW("Table","bi_manufacturer","Rows",{COUNTROWS(bi_manufacturer)}),
ROW("Table","bi_product","Rows",{COUNTROWS(bi_product)}),
ROW("Table","bi_salesFact","Rows",{COUNTROWS(bi_salesFact)}),
ROW("Table","bi_sentiment","Rows",{COUNTROWS(bi_sentiment)})
)
)
In the following example we have added an ORDER BY clause
EVALUATE(
UNION
(
ROW("Table","bi_date","Rows",{COUNTROWS(bi_date)}),
ROW("Table","bi_geo","Rows",{COUNTROWS(bi_geo)}),
ROW("Table","bi_manufacturer","Rows",{COUNTROWS(bi_manufacturer)}),
ROW("Table","bi_product","Rows",{COUNTROWS(bi_product)}),
ROW("Table","bi_salesFact","Rows",{COUNTROWS(bi_salesFact)}),
ROW("Table","bi_sentiment","Rows",{COUNTROWS(bi_sentiment)})
)
) ORDER BY [Rows] DESC
Display N rows from a table (TOPN)
Use this when you want to do a quick visual inspection of a table.
EVALUATE(
TOPN (5,bi_salesFact)
)
The TOPN expression can also order the results
EVALUATE(
TOPN ( 5, bi_salesFact, bi_salesFact[Units], DESC )
)
Find rows with blank column values (COUNTBLANK, FILTER,COUNTROWS)
This answers the question. How many rows in the bi_geo table do not have a Region value?
EVALUATE{
COUNTBLANK(bi_geo[Region])
}
The same result can also be achieved by using COUNTROWS on a FILTER expression
EVALUATE{
COUNTROWS
(
FILTER(bi_geo, ISBLANK(bi_geo[Region]))
)
}
In the following example we are counting blank regions for a specific country
EVALUATE{
COUNTROWS
(
FILTER(bi_geo, ISBLANK(bi_geo[Region]) && bi_geo[Country]="France")
)
}
In the following example we are displaying all rows where Region is non-blank
EVALUATE(
FILTER (
bi_geo,
ISBLANK ( bi_geo[Region] )=FALSE
)
)
Add a calculated column to return 1 if region is blank otherwise 0 (ISBLANK, IF)
In this example we are creating a new calculated column on the table bi_region and using the IF expression to return either 1 or 0
EVALUATE{
COUNTROWS
(
FILTER(bi_geo, ISBLANK(bi_geo[Region]))
)
}DEFINE
COLUMN bi_geo[IsBlank] = IF( ISBLANK(bi_geo[Region]) ,1,0)
EVALUATE
(
bi_geo
)
What is the distribution of values in the Country column of the bi_geo table? (SUMMARIZE)
In this example we want to know the distinct list of countries and the total number of rows per country
EVALUATE(
SUMMARIZE(
bi_geo,
bi_geo[Country],
"RowCount",COUNT(bi_geo[Country])
)
) order by [RowCount] desc
What is the distribution of values in the Region columm of the bi_geo table? (SUMMARIZE,SUMMARIZECOLUMNS,GROUPBY)
This verifies that total rows(99618)=total non blanks(18929+14512+6507) + total blanks(59670). Note the presence of the blank row and the value of the Count is blank too. This is because by default the SUMMARIZE ,SUMMARIZECOLUMNS and GROUPBY functions ignore blanks.
Example using SUMMARIZE
EVALUATE(
SUMMARIZE(bi_geo,bi_geo[Region], "Count", COUNT(bi_geo[Region]) )
) ORDER BY [Count] DESC
Example using SUMMARIZECOLUMNS
EVALUATE(
SUMMARIZECOLUMNS(bi_geo[Region], "Count", COUNT(bi_geo[Region]) )
)
Example using GROUPBY
(
GROUPBY(
bi_geo,bi_geo[Region],
"Count", COUNTX(CURRENTGROUP() ,bi_geo[Region])
)
)
What is the distribution of values in the Region columm of the bi_geo table taking into account the blank values? (GROUPBY, SELECTEDGROUP(),IF, ISBLANK)
Approach 1
In this approach we are using GROUPBY and using ISBLANK and IF to convert the blank values into a non-blank value. Take note that the specified replacement value in the IF only helps in GROUPBY counting correctly
EVALUATE(
GROUPBY
(
bi_geo, bi_geo[Region], "Count",
COUNTX
(
CURRENTGROUP(),
IF
(
ISBLANK([Region]),
"some non blank value",
[Region]
)
)
)
)
Approach 2
In this approach we are first creating a calculated table with a new column NewRegion where the blank value has been replaced by the string 'blank' and then using SUMMARIZECOLUMNS to do the grouping
DEFINETABLE allRegions=CALCULATETABLE(
SELECTCOLUMNS
(
bi_geo,
"NewRegion",
IF(ISBLANK(bi_geo[Region]),"blank", bi_geo[Region])
)
)EVALUATE
(
SUMMARIZECOLUMNS(allRegions[NewRegion],"Count",COUNT(allRegions[NewRegion]))
)
Approach 3
This is similar to the previous approach where we first created a calculated table using CALCULATETABLE and replaced the blank values with the string 'blank'. We are now using GROUPBY to do the grouping on the calculated table
DEFINETABLE allRegions=CALCULATETABLE(
SELECTCOLUMNS
(
bi_geo,
"NewRegion",
IF(ISBLANK(bi_geo[Region]),"blank", bi_geo[Region])
)
)EVALUATE
(
GROUPBY
(
allRegions,
allRegions[NewRegion],
"CountUsingGroupBy",
COUNTX(
CURRENTGROUP(),
allRegions[NewRegion])
)
)
Approach 4
We are expanding on the previous approach of using GROUPBY and CALCULATETABLE and grouping by Country and Region
DEFINETABLE allRegions =
CALCULATETABLE (
SELECTCOLUMNS (
bi_geo,
"Country", bi_geo[Country],
"NewRegion",
IF (
ISBLANK ( bi_geo[Region] ),
"blank",
bi_geo[Region]
)
)
)
EVALUATE
(
GROUPBY (
allRegions,
allRegions[Country],
allRegions[NewRegion],
"CountUsingGroupBy",
COUNTX (
CURRENTGROUP (),
allRegions[NewRegion]
)
)
)
Approach 5
We could simply use GROUPBY and IF, ISBLANK to replace blank values with some string. Attention! COUNTX will refuse to count rows with blank values and therefore the IF clause is very important
EVALUATE(
GROUPBY(
bi_geo,
bi_geo[Country],bi_geo[Region],
"Count",
COUNTX
(
CURRENTGROUP(),
IF(ISBLANK ( bi_geo[Region] ),"blank",bi_geo[Region])
)
)
)
Are there any duplicates in the ‘zip’ column of bi_Geo table? (SUMMARIZE,COUNT)
This will help us establish the cardinality of a foreign key relationship with the zip column. Looking at the results we can conclude that there are indeed duplicates and hence 1-many relationship between bi_Geo and bi_SalesFact is ruled out.
EVALUATE(
SUMMARIZE(bi_geo,bi_geo[Zip], "Count", COUNT(bi_geo[Zip]) )
) ORDER BY [Count] DESC
The above can also be achieved by using SUMMARIZECOLUMNS
EVALUATE(
SUMMARIZECOLUMNS
(
bi_geo[Zip],
"CountOfOccurences",COUNT(bi_geo[Zip])
)
) ORDER BY [CountOfOccurences] DESC
What is the highest number of times a single ‘zip’ code has been duplicated? (SUMMARIZECOLUMNS)
In this example MAXX and SUMMARIZECOLUMNS are used together to get the group with the highest count
EVALUATE{
MAXX
(
SUMMARIZECOLUMNS
(
bi_geo[Zip],
"CountOfOccurences",COUNT(bi_geo[Zip])
),
[CountOfOccurences]
)
}
How many values in the ‘zip’ column are not duplicated? (SUMMARIZECOLUMNS, FILTER, COUNT)
We will arrive at this result in 2 steps. We will first SUMMARIZE the row counts per group and then FILTER on this result to give us ony those rows where the row count is 1
Step 1: Use FILTER and SUMMARIZECOLUMNS to produce a flat table of all zip codes which are used only once
EVALUATE(
FILTER
(
SUMMARIZECOLUMNS
(
bi_geo[Zip],
"CountOfOccurences",COUNT(bi_geo[Zip])
),
[CountOfOccurences]=1
)
)
Step 2:Use COUNTROWS on the table produced in the previous step to get a scalar value
EVALUATE{
COUNTROWS
(
FILTER
(
SUMMARIZECOLUMNS
(
bi_geo[Zip],
"CountOfOccurences",COUNT(bi_geo[Zip])
),
[CountOfOccurences]=1
)
)
}
Total units sold and total revenue earned per Product Segment (SUMMARIZE, SUM, ROUND)
EVALUATE(
SUMMARIZE(
bi_salesFact, bi_product[Segment] ,
"Total Revenue",ROUND(SUM(bi_salesFact[Revenue]),2) ,
"Total units", SUM(bi_salesFact[Units]) ))
ORDER BY bi_product[Segment] DESC
Min,Max,Avg sales per Product Segment (GROUPBY, SUMX,MINX, MAXX, AVERAGEX)
In this example we are calculating the statistics of sales in bi_SalesFact table on a per segment basis
EVALUATE(
GROUPBY
(
bi_salesFact,
bi_product[Segment],
"Total units", (SUMX(CURRENTGROUP(),bi_salesFact[Units])),
"Max units", MAXX(CURRENTGROUP(),bi_salesFact[Units]),
"Average units", AVERAGEX(CURRENTGROUP(),bi_salesFact[Units]),
"Min units", MINX(CURRENTGROUP(),bi_salesFact[Units]),
"Total revenue", (SUMX(CURRENTGROUP(),bi_salesFact[Revenue])),
"Max revenue", MAXX(CURRENTGROUP(),bi_salesFact[Revenue]),
"Average revenue", AVERAGEX(CURRENTGROUP(),bi_salesFact[Revenue]),
"Min revenue", MINX(CURRENTGROUP(),bi_salesFact[Revenue])
) order by [segment] DESC
Total units sold and revenue earned per Manufacturer (SELECTCOLUMNS)
EVALUATE(
SELECTCOLUMNS
(
bi_manufacturer,
"Manufacturer name", bi_manufacturer[Manufacturer] ,
"SumUnits" , CALCULATE(SUM( bi_salesFact[Units])) ,
"SumRevenue",CALCULATE(SUM( bi_salesFact[Revenue]))
)
)
Total units sold and revenue earned per Manufacturer (SUMMARIZE)
We are using SUMMARIZE to produce the same result
EVALUATE(
SUMMARIZE(
bi_manufacturer, bi_manufacturer[Manufacturer],
"SumUnits" , CALCULATE(SUM( bi_salesFact[Units])),
"SumRevenue",CALCULATE(SUM( bi_salesFact[Revenue]))
)
) ORDER BY [SumUnits] DESC
Sort the manufacturers on Total units sold (SELECTCOLUMNS, ORDER BY)
EVALUATE(
SELECTCOLUMNS
(
bi_manufacturer,
"Manufacturer name", bi_manufacturer[Manufacturer] ,
"SumUnits" , CALCULATE(SUM( bi_salesFact[Units])) ,
"SumRevenue",CALCULATE(SUM( bi_salesFact[Revenue]))
)
) ORDER BY [SumUnits] DESC
The above can also be achieved by using SUMMARIZE
EVALUATE(
SUMMARIZE(
bi_manufacturer, bi_manufacturer[Manufacturer],
"SumUnits" , CALCULATE(SUM( bi_salesFact[Units])),
"SumRevenue",CALCULATE(SUM( bi_salesFact[Revenue]))
)
) ORDER BY [SumUnits] DESC
Total units sold and revenue earned per Manufacturer per Segment
DEFINETABLE manuf_segment_totalunits = GROUPBY( bi_salesFact, bi_product[Manufacturer], bi_product[Segment] , "Total units",SUMX( CURRENTGROUP(), bi_salesFact[Units] ) ,"Total revenue",SUMX( CURRENTGROUP(), bi_salesFact[Revenue] ) )
EVALUATE
(
manuf_segment_totalunits
) order by [Total units] DESC
Total units sold and revenue earned per Manufacturer per Segment (renamed columns)
In this example we demonstrate how to rename the columns
DEFINETABLE manuf_segment_totalunits = GROUPBY( bi_salesFact, bi_product[Manufacturer], bi_product[Segment] , "total_units",SUMX( CURRENTGROUP(), bi_salesFact[Units] ) ,"total_revenue",SUMX( CURRENTGROUP(), bi_salesFact[Revenue] ) ) EVALUATE
(
SELECTCOLUMNS(
manuf_segment_totalunits ,
"Manufacturer Name",[bi_product_Manufacturer],
"Product segment",[bi_product_Segment],
"Total units",[total_units],
"Total revenue",[total_revenue]
)
) ORDER BY [Manufacturer Name]
Best selling and worst selling Product segment for every Manufacturer (SELECTEDVALUE,SUMMARIZE, MAXX, SUM, MINX)
We will attempt to answer the question — “For every manufacturer what was the best performing and worst performing product segment with regards to units sold?” To achieve this we will create 4 measures
- segment_maxunits_name Calculates the name of the product segment for a manufacturer which sold the highest number of units
- segment_maxunits_value Calculates the total units sold by a manufacturer for the product segment calculated by the measure segment_maxunits_name
- segment_minunits_name Calculates the name of the product segment for a manufacturer which sold the least number of units
- segment_minunits_value Calculates the total units sold by a manufacturer for the product segment calculated by the measure segment_minunits_name
Step 1 Use the SUMMARIZE
expression on the bi_salesFact and group by Segment. Use the SELECTEDVALUE to filter the records going into SUMMARIZE so that we are dealing with sales related to the current manufacturer only.
Step 2 Create measures on the bi_manufacturer which will pick the maximum and minimum from the output of Step 1
Step 3 Create measures which use the maximum and minimum values from Step 2 to filter the results of the SUMMARIZE operation in
Step 1 and we are now left with the rows which have the segment name.
//Get the max units sold by a segment
VAR summary=
SUMMARIZE
(
FILTER ( bi_salesFact, RELATED(bi_product[ManufacturerID] ) = SELECTEDVALUE(bi_manufacturer[ManufacturerID])),
bi_product[Segment],
"TOTAL UNITS", CALCULATE(SUM(bi_salesFact[Units]))
)
VAR maxUnit=MAXX(summary,[TOTAL UNITS])
VAR x=SELECTEDVALUE(bi_product[Category])
RETURN maxUnit
//Now get the segment name which sold the max units
VAR summary=
SUMMARIZE
(
FILTER ( bi_salesFact, RELATED(bi_product[ManufacturerID] ) = SELECTEDVALUE(bi_manufacturer[ManufacturerID])),
bi_product[Segment],
"TOTAL UNITS", CALCULATE(SUM(bi_salesFact[Units]))
)
VAR maxUnitValue=MAXX(summary,[TOTAL UNITS])
var maxSegmentName = CALCULATE( MAXX(FILTER( summary, [TOTAL UNITS]=maxUnitValue),[Segment]))
RETURN maxSegmentName
MEASURE bi_manufacturer[segment_minunits_value] =//Get the min units sold by a segment
VAR summary=
SUMMARIZE
(
FILTER ( bi_salesFact, RELATED(bi_product[ManufacturerID] ) = SELECTEDVALUE(bi_manufacturer[ManufacturerID])),
bi_product[Segment],
"TOTAL UNITS", CALCULATE(SUM(bi_salesFact[Units]))
)
VAR minUnit=MINX(summary,[TOTAL UNITS])
RETURN minUnit
//Now get the segment name which sold the min units
VAR summary=
SUMMARIZE
(
FILTER ( bi_salesFact, RELATED(bi_product[ManufacturerID] ) = SELECTEDVALUE(bi_manufacturer[ManufacturerID])),
bi_product[Segment],
"TOTAL UNITS", CALCULATE(SUM(bi_salesFact[Units]))
)
VAR minUnit=MINX(summary,[TOTAL UNITS])
var minSegmentName = CALCULATE( MAXX(FILTER( summary, [TOTAL UNITS]=minUnit),[Segment]))
RETURN minSegmentName
EVALUATE(
SELECTCOLUMNS(
bi_manufacturer,
"id",[ManufacturerID],
"name",[Manufacturer],
"max_segment_name", [segment_maxunits_name],
"max_segment_units",[segment_maxunits_value],
"min_segment_name", [segment_minunits_name],
"min_segment_units",[segment_minunits_value]
)
)
Conclusion
I would be delighted to hear from you. Did you spot any mistakes? Did I miss anything obvious? Your feedback would be very beneficial for my future work. Thank you.