Excel files hosted in SharePoint load VERY slowly.

So I spent roughly ten of my 13 years at Microsoft supporting SharePoint in one form or another so I can get long winded when talking about it. To address that, I am going to try a different approach for this blog. If you want the quick version, I put that first. My longer old man ramblings are at the end of the post if you are looking for more detail.

Quick Version:

If you have noticed that seemingly small Excel files that you are loading from SharePoint into Power BI can sometimes take FOREVER to load when you refresh them, try this. Instead of using the SharePoint Folder data connector and filtering out the file you are looking for, use the web connector and connect directly to the Excel file using its real URL as the address, not the weird one you get when you copy a link using the SharePoint interface.

So use:


instead of the weirdness that you get from using Copy Link in Sharepoint.com sites.

https://Globex.sharepoint.com/ :x/r/sites/Scorpio/HanksWorld/WOMD/Docs?<bunchaguidystuffhere>

So to summarize, don’t use this:

Use this:

You are almost guaranteed to see faster load times.

Longer Version:

I have a customer that loads a lot of small Excel files from SharePoint and I was noticing that they often took a long time to load or to even refresh the preview in the Power Query Editor in Power BI Desktop. Even more frustrating was the fact that Power BI Desktop didn’t seem to be doing much during the load. CPU was low, memory normal and barely anything was dribbling in the network connection.

I hadn’t really used the SharePoint folder data connector before so I decided to go look what it was doing in Power Query. Here is a typical setup:

In this example, we are connecting to the site ‘HanksWorld’ and getting back a list of all the documents in that site/site collection and their metadata. In the next line of M code, we are filtering out the file that we are looking for.

This rang alarm bells for me. For the longest time, SharePoint had all sorts of limitations about the number of files that could be contained in a document library or within a folder within a document library. The magic number of 2,000 was given to us by the product group to use as a ban hammer for support, as if 1,999 files was OK and 2,0000 wasn’t. The core issue was that when SharePoint had to iterate through that list of files to perform an operation, it really started to drag when you had a high number of files to iterate. I can’t remember the details, I think I have repressed most of my SharePoint memories but I know this, the more file names/metadata that SharePoint had to process within a simple operation (render a document library for instance ), the worse performance you could expect.

Look at it this way, if you have 10,000 files in your site collection or your site, every time PBI Desktop has to process this query, it asks for all of their names and metadata then filters it down to the proper file. This is a really wasteful way to do things and it will punish SharePoint. If you have multiple queries, all pointing to different Excel files in that same site, well now you are talking about multiple requests for all of that information, especially if you are loading data tables in parallel in PBI desktop. This can either really slam your SharePoint instance or even result in your requests being throttled.

When you change over to using the Web connector, it will access the file directly without iterating through all of the files in the directory. It will also figure out that your file is an Excel file and it will then allow you to select the sheet and move on.


Using the web connector in the Power Query editor will allow you to access your Excel file directly instead of having to iterate though a large list of files. Doing this will dramatically improve your refresh or refresh preview times in the Power Query Editor.

Power BI, Fixed Width Text Data and Rogue Quotation Marks

I ran across this bug feature while processing some NOAA data that I have been playing with off and on for a few years. If your text files have a single quotation mark in them, i.e. , the Query Editor in Power BI will not handle it well.

Lets see what happens.

First, I loaded a text file into Power BI that has a list of country names in it. In this case, it is the country names that will appear in the weather station data that I am playing with. The file is a simple fixed-width file with 291 entries that looks like this:

If I import the entire file, I end up with 291 rows in the Query Editor that looks as expected.

Now, if I add a simple ” to the text file as shown below, save it and refresh the query editor, things go sideways quickly.

Here is what it looks like in the Query Editor:

This is not cutoff by the way, there is no row 5 that appears.

Once I load the file, it looks like this:

All of the remaining text data from the quotation mark on is stuffed into row four.

If I add two quotation marks, it works as expected:

But if I add three, all bets are off again.

Makes sense right? Three quotation marks means a proper set and then a third wheel mark.

As a final test, lets add two rogue quotation marks to our data like this:

That gives me these two:

Query Editor View
Once the data is loaded, it looks like this.

I don’t have access to source code anymore but it sure seems that in the code for the query editor, as it is processing text data, once it ‘sees’ a quotation mark, it ignores all carriage return/end of line statements until it ‘sees’ another closing quotation mark.

Now this may not seem like a big deal and in the examples I showed, its pretty easy to see that something is amiss, but in my data files this took a while to find. I was processing hundreds of thousands of rows of fixed width text data so I didn’t see the behavior in the query editor. Once the data was loaded, the stray ” that I had in my data occurred deep in the set of data and since it doesn’t throw an error, this really screwed up how the data was being loaded after the single quotation mark was parsed.

By the way, I tested this in Excel and I saw the same behavior. No surprise I guess.

So is this a surprise to anyone? Is this something that everyone that has ever imported text data into Excel or Power BI already knew about? I get it that it’s an edge scenario that you find a single quotation mark in English fixed width text but this showed up in raw data files from a US government agency so it does happen.

Tell me in the comments – should I have already known this?

Stupid Stuff My Recruiter Says

Immediately after I wrote my previous post on recruiters and Power BI, I got this gem. If this keeps up, I am going to have to keep this as a regular feature on this blog.

Position : Power BI Developer

Roles & Responsibilities

  • Designing data integration from multiple systems into analytical data models.
  • Writing relational database SQL queries
  • Devising and promoting creative data visualizations in Tableau
  • Developing tools and dashboards using MS SQL Server and BI Visualization tools such as Tableau to provide real-time insights to business data
  • Debugging, monitoring and troubleshooting existing BI solutions and visualization

Maybe I am being too picky, but why would you want a Power BI developer to create Tableau reports?

Recruiters and Power BI

I am not sure why but I am suddenly getting a lot more unsolicited calls and emails from recruiters.  A good chunk of them are straight spam.  Anyone who would send me job postings for SharePoint (Its been four years since I supported it) or C# developer (12+ years since I did it for a living) obviously hasn’t spent more than 5 seconds with my actual resume.

I am always intrigued to see the Power BI ones that come through because for the most part, they still seem to be stock Microsoft Business Intelligence job descriptions from five years ago with the words ‘Power BI’ hastily scribbled at the bottom.

Power BI is a significantly different animal than the standard MS BI solutions that have been out there for decades now and it takes a special skill set to create really effective and powerful BI solutions.  The skill set and knowledge base are unique to this product so I thought I would take a stab at the top 5 skills that a  generic Power BI developer needs to know to be effective.

  1.   DAX
  2.   PowerQuery
  3.   Tabular Model Design
  4.   TSQL
  5.   Azure Technologies

Any thoughts on what skills are missing here?

What is ASPerfCounters.JSON?

In a previous blog post, I have talked about the various diagnostic data that Power BI Desktop has available for troubleshooting issues.  One of the interesting files generated is ASPerfCounters.JSON.

I originally thought the name came from  ASP.NET performance counters but I am guessing that it actually has its roots with Analysis Services.

I was curious to see what data was contained in this file so I used Power BI Desktop to open and parse the file.  Power BI Desktop has a JSON file connector so it was relatively simple to crack the file open and see what was inside.  I was pretty surprised to see that the file actually had PERFMON data from the time that the file was generated to an hour earlier.  The data is collected every 60 secs for the following performance counters:

PagedMemorySize64inBytes & PeakPagedMemorySize64inBytes

VirtualMemorySize64inBytes & PeakVirtualMemorySize64inBytes

WorkingSet64InBytes and PeakWorkingSet64inBytes






To load the data into Power BI Desktop, I did the following.

  1. Use the process in the earlier blog post to generate a Frown Snapshot File.  Open the zip file generated and copy out the the ASPerfcounters.json file that has been generated.
  2. Use the JSON File connector to ‘Get Data’ and open the ASPerfCounters.json file.

3.   The Power Query Editor will open and it will show a list of records.  *Note – the following steps are how I imported and cleaned the file, no doubt there are other ways that it can be done as well.*

4.  First we need to convert the list to table.  If you right click the list column, you will see an option called ‘To Table’.

Select ‘To Table’ and you will get a confirmation dialogue asking for delimiters and how to handle extra columns.

I left everything as default as shown and selected ‘Ok’.

5.  At this point, your list has been converted to a table.  I selected the expand icon and selected ‘ok’ to the dialogue that appeared:

6.  Once expanded, Column1.p is yet another list that needs to be expanded.  When you select the expand icon, you get the option to expand to new rows.  Select that and will now see that Column1.p contains records that need to be expanded.  Expand Column1.p once more time and select both the key and value.  You will most likely want to ensure that ‘Use original column name as prefix’ is unchecked.

7.  At this point, you have all of your data expanded and accessible but there are some more things to be done to get at least a basic set of useful data.

First, delete columns ‘Column1.i’, ‘Column1.n’ and ‘column1.u’.  Nothing useful there.

Next you will want to pivot the key column.  Select the key column and choose ‘Pivot Column’ from the Transform section of the ribbon as shown:

When the dialogue pops up, select the default which should have the ‘Value’ column chosen as the ‘Values Column’.

This will display the data in a nice familiar format, date in the first column and columns for each performance counter.  Of course, the data looks pretty odd:

The date is displayed in what is apparently a native JSON date format.  I had to go hunting on the Power BI Forum to find out how to handle this format.  Luckily, Greg Deckler had this one in the bag.  The date is stored as milliseconds from 1/1/1970.  He lists a way to convert it using a calculated column but I wanted to convert it in power query so I used the following steps.

I selected the date column, then chose ‘Extract’ from the Transform section of the ribbon.  I chose ‘Text Between Delimiters’ and used the open and close parentheses as delimiters as shown.

Now I was dealing with a column that was in milliseconds so I set about parsing out the date and time.  I first renamed ‘Column1.t’ to ‘DateMill’, converted that column and all of the perf counter columns to actual numbers and then to get the current date, I added a custom column with the following code:

Date.AddDays(#date(1970,1,1), Number.IntegerDivide([DateMill], 86400000))

To get the time value in GMT, I created the following custom column:

= Table.AddColumn(#”Added Custom”, “Custom.1”, each Time.From(([DateMill]/86400000) – Number.IntegerDivide([DateMill],86400000)))

At this point, you can spend more time cleaning up the data and charting it as you wish:


So, why is this useful?  Well, its cool that you can use Power BI Desktop to parse JSON files.  This file might prove useful if you have a large Power BI model that takes hours to refresh.  In case of a crash, you can look at this file and see what was going on with your desktop performance.  You could also use these metrics to review performance during a refresh in case Perfmon is not your cup of tea.


Copy Session Diagnostics to Clipboard

Another quick hitter on Power BI Diagnostics.

If you have your model open in Power BI Desktop, you can get a quick dump of all of your Power Query code by doing the following:

  1.  From the top ribbon, Choose ‘Help, then ‘About’

2.  This will launch a popup window.  Choose ‘Copy diagnostics to clipboard’.

3.  Open Notepad (or Word, or OneNote, etc.) and paste.  All of your PowerQuery M code will be included in the information that is copied.

4.  Note also that when you do this, you generate a ‘Frown Snapshot’ file in your diagnostic folder.  By default, this snaphshot will have some ASPerfCounter data and ‘FlightRecorderCurrent.trc’, which is your Power BI ‘black box’ recorder.

Nifty eh?

Getting started with troubleshooting a slow Power BI report page

Performance always is an issue, isn’t it?  Throughout a career that wandered through IIS (My web pages are slow), SharePoint (My web parts/lists/search queries/indexes are slow), SQL Reporting (My SSRS reports/stored procedures/excel pages are slow) and now Power BI (My visuals are taking forever to load), troubleshooting slow performance is always a big part of what I do.

Troubleshooting Power BI visuals can be a little tricky.  There aren’t any obvious dials or gauges to look at, you can’t spin up perfmon and attach it to Power BI desktop and the logs, while impressive looking, won’t help you narrow in on the poorly written measure that is killing your performance.  What I am going to layout next is a quick approach that you can take to not only get a good look at the performance of a report page but how also you can narrow in on the measures that are dragging you down.

Quick Setup Note – I am using the customer profitability sample from Microsoft for my PBIX file.  Its visuals load super quickly but its a quick and easy download here.

First things first, your reports and data model need to be in the same PBIX.  We will be using Dax Studio to connect to the data model and run a trace so everything we are testing needs to be in the same PBIX.  If you have your visuals and data model in separate PBIX files, you will need to recreate your visuals in the PBIX where your data model live.


Create a blank report page.  Power BI desktop will load visuals on the report page that it opens when you open the PBIX so in order to capture a true idea of the page performance, you need to create a blank report page and save the PBIX with that page active.


Close and reopen your PBIX file.  If you did step one, you should be looking at a blank report page.


Open Dax Studio and the ‘Connect’ screen should open.  Select your open PBIX dpcument as shown and select ‘connect‘.


Once connected, click on the ‘All Queries’ button in the ribbon.  This actually starts a trace on your SSAS instance that is running in Power BI desktop.  Once the trace is ready, you will see ‘Query Trace Started’ in the output window.


Return to your PBIX that you have open in Power BI desktop.  Click on the report tab that you wish to trace and let the page fully load.  Once the page loads, you can stop the trace by returning to DAX Studio, choosing the ‘All Queries’ tab and selecting the stop button.


Once the trace is stopped, click on the duration column header to sort the queries by duration.  As  I mentioned earlier, this demo is super fast so the ‘slowest’ query took 21ms but hopefully you get the point.  You know have a list of queries that were performed to build your page, along with the time it took to execute each of the queries.


Continuing on, double click on the query text in the ‘Query’ column.  The actual code used will show up in the editor section above the output section.  Now you can analyze the DAX being called as well as run an individual trace to dig in deeper.


At this point, you can run all of the DAX in the editor or you can highlight and run just sections of it, just like normal in DAX Studio.  If you enable the Query Plan and Server timings options, you can capture a trace and see the actual queries that are being passed to the formula and storage engine for processing.  Enabling the query plan option does just what it says, it gives you the query plans, both physical and logical, that were chosen to run the queries.

I have a long animated GIF below that shows turning on the query plan and server timings options, setting the ‘Run’ option to flush the cache each time I run a query, then running the query.  I then show where you can find the query plan and server timings information.  Since I got the whole screen in the GIF, its a pretty lousy resolution but perhaps if you open it in another tab, you can see enough detail.

Questions, comments, suggestions on digging deeper into Power BI visuals/reports performance?  Throw me a comment or hit me up on Twitter – @szBigDan.

A little bit about Power BI Diagnostics

Gilbert Quevauvillie from FourMoo posted a great blog post identifying all of the different processes that are running when you have Power BI running.

He identifies four different processes that you will see running and a quick blurb on each.  I wanted to post a quick follow up to show how the diagnostic logging in Power BI captures output from these processes.

Power BI Desktop has diagnostic logs?  Absolutely!

To turn on the logging, enable it from your options as follows:

There is a link that you can click on that will take you directly to the traces folder in case you aren’t the type to memorize logging locations for software.

When you open this folder, \%username%\AppData\Local\Microsoft\Power BI Desktop\Traces, there are actually two sets of logs.  The first set is stored directly in this folder and they will only be created once you turn on tracing.

If you open the Performance folder that is in this directory, you will see that Power BI Desktop actually is ALWAYS logging.  I am fairly certain that these are a rolling set of logs that are captured in case of an unexpected failure.  When Power BI crashes, it will create a FrownSnapShot zip file and a PerformanceTraces.zip file.  The latter is a zip of the Performance folder.  The FrownSnapShot zip file is pretty cool because it contains the latest SQL Flight Recorder data that has been captured.  Interestingly enough, the FrownSnapshot zip also includes a file called ASPPerfCounters.json which looks like Perfmon data that is dumped into a JSON file.  Interesting……

The log files that you will find in the Traces folder are as follows.

  1.  PBIDesktop.<version>.TimeStamp
    • This file logs information for the Power BI Desktop Application.
  2.  Microsoft.Mashup.Container.NetFX45.<version>.<TimeStamp>
    • This log appears once you go into the Query Editor or refresh your data.  This is where Power Query actions are logged during the processing and cleaning of your data.
    • Once you start a data refresh in Power BI Desktop, you will see multiple log files that start with this name.  I am going to assume that Power BI Desktop will spin up multiple Power Query engines as necessary and each will have its own log file.
  3.  MSMDSRV.<version>.TimeStamp
    • From my experience, you will only see this log if an exception is captured by the SSAS engine in Power BI Desktop.  For instance, I saw this log appear when I was troubleshooting a crash that was occurring during data refresh.  This log is also included in the FrownSnapshot zip that is created during a crash.

If you have questions about Power BI Logging or want to add corrections, please leave me a comment!



Where did my arrows go?

Quick hitter on this one.  I pushed some reports to the Power BI Service lately and noticed that suddenly I didn’t have the ability to drill down into my matrix visualizations.  I could right click a row in the visualization and get the option to drill down but I am used to being able to hover over and see the following arrows pop up:

It’s a quick easy fix to get these back.  Once you have the March 2018 release of Power BI Desktop, you have the ability to turn off the visual header.  This means that while the report is in reading mode, the headers won’t appear when you mouse over the visualization.

Sure enough, when I went into the Reports Section of the workspace and checked the settings, I saw this:

Not sure how that got toggled on for my reports but since I had built a bunch of them from the same template, it’s easy to see how they all that setting applied.  Toggle that switch and its back to normal.

Simple fix.

Using Azure and Power BI to mine weather station data

I started to really get obsessed with data back when I was supporting SharePoint.   SharePoint used to (maybe still does?) have this huge diagnostics database that logged amazing amounts of data by default.  I am sure the product team had imagined writing all sorts of cool stuff around it but that feature must have been cut because the database appeared in SharePoint almost as an afterthought.  I spent many hours trying to use all sorts of tools to pry knowledge from that database but I never really got too far.

My other data obsession that started around the same time was digging through historical weather data.  Ever since seeing this episode of the Simpsons, I wondered what data was being collected by these mysterious weather stations.  Years later I finally found my answer when I stumbled across the National Centers for Environmental Information website, found here:  https://www.ncdc.noaa.gov/

After some poking around, I found an FTP site where the mother lode of weather information was stored and I have poked and prodded around with it ever since.  In part one of this series, I will cover how I automated pulling these files into an Azure Data Lake using Azure Data Factory.  As the series progresses, I will cover ETL/ELT and building out of a model for Power BI as well as building some visualizations.

Fits and Starts

In the past, I had used an SSIS package with a FTP task that downloaded all of the files to my desktop.  I used a For-Each container to spawn an instance of 7-zip to unzip each of the files and then used an Azure Blob task to upload the files into Azure.  This was an effective approach but it hardly seemed enterprise worthy so I started looking into using Azure Data Factory (ADF) to do the same thing.

The obvious approach with ADF would be to create a FTP data set and use a Copy Data activity to pull the files from the FTP server and move them to my desired location.  I hit an immediate brick wall however.  The NOAA FTP site wasn’t anonymous, but to log in, you use the user name ‘anonymous’ and pass your email address as the password.  The FTP data set in ADF will only allow you to choose anonymous or Basic authentication and the FTP server did not like the fact that the ADF FTP data set automatically encrypted the password.  Honestly, I got stuck here for a while.

What finally worked

Eventually I found that NOAA had the same data stored on a website that allowed anonymous access.  It was off to the races at this point, here is what I put together in ADF.

The website, https://www1.ncdc.noaa.gov/pub/data/ghcn/daily/by_year/, looks like this:

Each file consists of a years worth of data in csv format that has been GZipped.  The URL of each file was just the URL listed above + the name of the file.

I wanted to be able to run a copy data action against each file so I brute force created a list of the filenames in Excel.  I then created my pipeline in ADF and stored the file names as an array in a pipeline parameter.  I first entered a few into the dialogue as shown below:

I then used the code button to enter them directly into the JSON file rather than trusting pasting in all of my names into that little box.

I then created my HTTP linked service that had the URL as my base URL as follows:

Next, I created my HTTP data set.

Note that I have the compression type set to GZip and that the relative URL is using a dataset parameter.  I will pass a value down to the copy activity during each iteration of the ‘For-each’ action that will contain the file name that will complete the URL needed to get the file.  I store that file name in the data set as shown in the Parameters section of the HTTP data set properties:

I created the target data set to point to Azure Data Lake, nothing out of the ordinary there.

Ok, finally I created the ‘For-each’ copy data activity in my pipeline.  I specified that each time it loops, it will iterate through the list of file names included as pipeline parameters by specifying the following in the settings of the ‘For-Each’ activity:

I then created my activity that I wanted my ‘For-each’ activity to run.  This would be the ‘Copy Data’ activity that would move the data from the web site to Azure.  This is pretty straightforward except when specifying the source.  If you remember, I created a parameter on my data set called ‘PipelineFileList’.  When you specify that source in the copy activity, because of that parameter, you get a chance to tell it what to store in that parameter as shown:

This is what finally ties it all together.  By specifying @item() as the value to pass down to the data set, each time the ‘For-each’ loop iterates, it takes the latest file name and passed it to the source data set, which in turn pulls it from the parameter and uses it as the file name when it makes the request to its HTTP source.

From here, it was just a matter of saving, publishing and letting it rip.

The speed involved was breathtaking.  Remember where I specified GZip as the compression method?  The pipeline ran for roughly 12 minutes, downloaded, decompressed and copied to Azure Data Lake over 94GB of data.

The next post in the series will be looking at how I start chewing through the data and getting it ready to be pulled into my data model.