Excel files hosted in SharePoint load VERY slowly.

So I spent roughly ten of my 13 years at Microsoft supporting SharePoint in one form or another so I can get long winded when talking about it. To address that, I am going to try a different approach for this blog. If you want the quick version, I put that first. My longer old man ramblings are at the end of the post if you are looking for more detail.

Quick Version:

If you have noticed that seemingly small Excel files that you are loading from SharePoint into Power BI can sometimes take FOREVER to load when you refresh them, try this. Instead of using the SharePoint Folder data connector and filtering out the file you are looking for, use the web connector and connect directly to the Excel file using its real URL as the address, not the weird one you get when you copy a link using the SharePoint interface.

So use:

https://Globex.sharepoint.com/sites/Scorpio/HanksWorld/WOMD/Docs/NerfGuns.xlsx

instead of the weirdness that you get from using Copy Link in Sharepoint.com sites.

https://Globex.sharepoint.com/ :x/r/sites/Scorpio/HanksWorld/WOMD/Docs?<bunchaguidystuffhere>

So to summarize, don’t use this:

Use this:

You are almost guaranteed to see faster load times.

Longer Version:

I have a customer that loads a lot of small Excel files from SharePoint and I was noticing that they often took a long time to load or to even refresh the preview in the Power Query Editor in Power BI Desktop. Even more frustrating was the fact that Power BI Desktop didn’t seem to be doing much during the load. CPU was low, memory normal and barely anything was dribbling in the network connection.

I hadn’t really used the SharePoint folder data connector before so I decided to go look what it was doing in Power Query. Here is a typical setup:

In this example, we are connecting to the site ‘HanksWorld’ and getting back a list of all the documents in that site/site collection and their metadata. In the next line of M code, we are filtering out the file that we are looking for.

This rang alarm bells for me. For the longest time, SharePoint had all sorts of limitations about the number of files that could be contained in a document library or within a folder within a document library. The magic number of 2,000 was given to us by the product group to use as a ban hammer for support, as if 1,999 files was OK and 2,0000 wasn’t. The core issue was that when SharePoint had to iterate through that list of files to perform an operation, it really started to drag when you had a high number of files to iterate. I can’t remember the details, I think I have repressed most of my SharePoint memories but I know this, the more file names/metadata that SharePoint had to process within a simple operation (render a document library for instance ), the worse performance you could expect.

Look at it this way, if you have 10,000 files in your site collection or your site, every time PBI Desktop has to process this query, it asks for all of their names and metadata then filters it down to the proper file. This is a really wasteful way to do things and it will punish SharePoint. If you have multiple queries, all pointing to different Excel files in that same site, well now you are talking about multiple requests for all of that information, especially if you are loading data tables in parallel in PBI desktop. This can either really slam your SharePoint instance or even result in your requests being throttled.

When you change over to using the Web connector, it will access the file directly without iterating through all of the files in the directory. It will also figure out that your file is an Excel file and it will then allow you to select the sheet and move on.

Summary:

Using the web connector in the Power Query editor will allow you to access your Excel file directly instead of having to iterate though a large list of files. Doing this will dramatically improve your refresh or refresh preview times in the Power Query Editor.

Power BI, Fixed Width Text Data and Rogue Quotation Marks

I ran across this bug feature while processing some NOAA data that I have been playing with off and on for a few years. If your text files have a single quotation mark in them, i.e. , the Query Editor in Power BI will not handle it well.

Lets see what happens.

First, I loaded a text file into Power BI that has a list of country names in it. In this case, it is the country names that will appear in the weather station data that I am playing with. The file is a simple fixed-width file with 291 entries that looks like this:

If I import the entire file, I end up with 291 rows in the Query Editor that looks as expected.

Now, if I add a simple ” to the text file as shown below, save it and refresh the query editor, things go sideways quickly.

Here is what it looks like in the Query Editor:

This is not cutoff by the way, there is no row 5 that appears.

Once I load the file, it looks like this:

All of the remaining text data from the quotation mark on is stuffed into row four.

If I add two quotation marks, it works as expected:

But if I add three, all bets are off again.

Makes sense right? Three quotation marks means a proper set and then a third wheel mark.

As a final test, lets add two rogue quotation marks to our data like this:

That gives me these two:

Query Editor View
Once the data is loaded, it looks like this.

I don’t have access to source code anymore but it sure seems that in the code for the query editor, as it is processing text data, once it ‘sees’ a quotation mark, it ignores all carriage return/end of line statements until it ‘sees’ another closing quotation mark.

Now this may not seem like a big deal and in the examples I showed, its pretty easy to see that something is amiss, but in my data files this took a while to find. I was processing hundreds of thousands of rows of fixed width text data so I didn’t see the behavior in the query editor. Once the data was loaded, the stray ” that I had in my data occurred deep in the set of data and since it doesn’t throw an error, this really screwed up how the data was being loaded after the single quotation mark was parsed.

By the way, I tested this in Excel and I saw the same behavior. No surprise I guess.

So is this a surprise to anyone? Is this something that everyone that has ever imported text data into Excel or Power BI already knew about? I get it that it’s an edge scenario that you find a single quotation mark in English fixed width text but this showed up in raw data files from a US government agency so it does happen.

Tell me in the comments – should I have already known this?

Stupid Stuff My Recruiter Says

Immediately after I wrote my previous post on recruiters and Power BI, I got this gem. If this keeps up, I am going to have to keep this as a regular feature on this blog.

Position : Power BI Developer

Roles & Responsibilities

  • Designing data integration from multiple systems into analytical data models.
  • Writing relational database SQL queries
  • Devising and promoting creative data visualizations in Tableau
  • Developing tools and dashboards using MS SQL Server and BI Visualization tools such as Tableau to provide real-time insights to business data
  • Debugging, monitoring and troubleshooting existing BI solutions and visualization

Maybe I am being too picky, but why would you want a Power BI developer to create Tableau reports?

Recruiters and Power BI

I am not sure why but I am suddenly getting a lot more unsolicited calls and emails from recruiters.  A good chunk of them are straight spam.  Anyone who would send me job postings for SharePoint (Its been four years since I supported it) or C# developer (12+ years since I did it for a living) obviously hasn’t spent more than 5 seconds with my actual resume.

I am always intrigued to see the Power BI ones that come through because for the most part, they still seem to be stock Microsoft Business Intelligence job descriptions from five years ago with the words ‘Power BI’ hastily scribbled at the bottom.

Power BI is a significantly different animal than the standard MS BI solutions that have been out there for decades now and it takes a special skill set to create really effective and powerful BI solutions.  The skill set and knowledge base are unique to this product so I thought I would take a stab at the top 5 skills that a  generic Power BI developer needs to know to be effective.

  1.   DAX
  2.   PowerQuery
  3.   Tabular Model Design
  4.   TSQL
  5.   Azure Technologies

Any thoughts on what skills are missing here?