I have been working with a retail customer that uses a 445 date table. Matt Allington over at PowerPivotPro wrote a great blog post on what a 445 date table is and how to create one, but creating one wasn’t my issue because the customer provided me with the one that they use (whew).
Instead, I was doing some custom time intelligence with DAX, since I couldn’t use the built-in functions due to the wacky date table. Specifically, the customer wanted a lot of their metrics expressed in terms of the last week, the last four weeks, the last 13 weeks, and in total.
As a side note, the interesting thing about DAX is that there are usually all sorts of ways to do the same thing. I am not presenting the following approach as a template for how to solve this problem; I am just using what I did to show you how I found my first (hah, hah) performance problem and solved it.
I started by defining a simple measure that would give me the latest date for the data that I had in my data set.
CURRENT WEEK ID = MAX ( 'Date'[DateID] )
Since my Date table only contains weeks for which I have data in my fact table, this gives me the max Date ID for my dataset. With this information in hand, I created some custom helper tables in my model. The example below is the one created for the previous-thirteen-weeks helper table.
Rolling_13_Date =
VAR maxID = [CURRENT WEEK ID]
RETURN FILTER ( 'Date',
    'Date'[DateID] >= ( maxID - 12 )
        && 'Date'[DateID] <= ( maxID ) )
I made a helper table for each of the custom time periods: 13 weeks, 4 weeks, and 1 week. These tables are recreated dynamically when the data model is refreshed, and they are joined back to the main data table as shown:
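For reference, here is a sketch of what the four-week helper table could look like, following the same pattern as the thirteen-week version above (the table name Rolling_4_Date matches the measure shown below, but the exact definition is my reconstruction, not the customer’s actual code):

```
Rolling_4_Date =
VAR maxID = [CURRENT WEEK ID]
RETURN FILTER ( 'Date',
    'Date'[DateID] >= ( maxID - 3 )
        && 'Date'[DateID] <= ( maxID ) )
```

Since each DateID represents a week, subtracting 3 from the max ID keeps the most recent four weeks.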
So the thinking was this: to make life easy for myself, any time that I had to present a measure over one of these time periods, I would just do the following:
My4WkMeasure = CALCULATE ( [MyMeasure], Rolling_4_Date )
Initial testing went great: it worked as expected, and this approach was super easy to remember.
Of course, I did mention that when I am doing development, it is against a single week of data, right? Yeah, I probably should have mentioned that. Once I pulled in all of the data, here is a sample of the blazing fast speed that my measure was cranking out:
That is the ‘Server Timings’ readout from DAX Studio. My measure, just one single measure using this approach, took 4.5 seconds to run. For such a simple measure (basically total sales), it actually would have taken almost 15 seconds if the work hadn’t been spread out over multiple cores. Even better, it’s not a lot of data being materialized; the time is all CPU time, so I am making the Query Engine work WAY too hard for such a simple calculation.
When I started digging into the query, I quickly saw my mistake. I don’t quite understand the full details of it because I am not Marco Russo or Alberto Ferrari, but I could see enough from the queries shown in DAX Studio to understand what I did wrong. By specifying an entire table as a filter instead of a column, I made the Query Engine perform a massive join that pulled in my entire rollup table, the main date table, and the other connected rollup tables by following the 1:1 relationships. All of the columns in the resulting mashed-up table were then used in a WHERE clause so massive that it actually overran the buffer in DAX Studio before all of the text could be captured. Usually DAX Studio will cut off the text and tell you how many more lines were remaining, but I think I overwhelmed it with my nonsense. The resulting WHERE clause looked like it was filtering on every available value in every date column of every joined date table.
That is only a short piece of the never-ending WHERE clause (yes, it goes on and on, my friend…). Wow, I am not doing something right.
To try to limit the JOIN, I changed my measure to the following, this time specifying a single column:
My4WkMeasure = CALCULATE ( [MyMeasure], Rolling_4_Date[DateID] )
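For completeness, the same column-filter pattern applied to the other two time periods might look like this (the 13-week and 1-week table and measure names are my guesses, following the Rolling_4_Date naming convention):

```
My13WkMeasure = CALCULATE ( [MyMeasure], Rolling_13_Date[DateID] )
My1WkMeasure  = CALCULATE ( [MyMeasure], Rolling_1_Date[DateID] )
```

Since the helper tables join to the main data table on DateID anyway, filtering on just that one column gives the engine everything it needs without dragging in the rest of the table.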
That’s more like it! Total time: 48ms. Again, I am not exactly sure why it’s so much faster, but if you look at the queries, the WHERE clause is much simpler.
OK, wrapping up: what did I learn here? Always specify a column in your filter expression when using CALCULATE? Always test your measures against production data? Always keep DAX Studio handy?
I guess so; these are all good things to know. More importantly, I got to immediately put to work the mind-bending stuff that I am learning in the SQLBI.com course Optimizing DAX. I am only halfway through the course and am already using way more of DAX Studio than I knew even existed before I started. This is not a paid endorsement, btw; I ponied up out of my own pocket for a bunch of the videos that these two mad scientists cooked up, and they have been worth every penny.
Using DAX Studio, I was able to performance test a measure by itself, quantify how bad a job I did with it, see where the slowness was coming from (yes, it’s the WHERE clause that doesn’t end…), make a change, and immediately quantify how fast the measure was once I fixed it.
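If you have not done this before, one common way to time a single measure in isolation in DAX Studio (with Server Timings turned on) is to run a minimal query like the one below; the measure name here is just the example from this post:

```
-- Evaluate one measure on its own so the Server Timings
-- trace reflects only this measure's cost
EVALUATE
ROW ( "Result", [My4WkMeasure] )
```

Run it once to warm the cache, then use the Clear Cache button and run it again to compare cold and warm timings.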
I will try to blog more performance scenarios as I come across them, but for now: if you don’t know DAX Studio and you are responsible for DAX/Power BI/SSAS Tabular performance, you need to learn it now.
And in case I didn’t firmly stick Lambchop into your ear with this post, enjoy: