So I spent roughly ten of my 13 years at Microsoft supporting SharePoint in one form or another so I can get long winded when talking about it. To address that, I am going to try a different approach for this blog. If you want the quick version, I put that first. My longer old man ramblings are at the end of the post if you are looking for more detail.
If you have noticed that seemingly small Excel files that you are loading from SharePoint into Power BI can sometimes take FOREVER to load when you refresh them, try this. Instead of using the SharePoint Folder data connector and filtering out the file you are looking for, use the web connector and connect directly to the Excel file using its real URL as the address, not the weird one you get when you copy a link using the SharePoint interface.
instead of the weirdness that you get from using Copy Link in Sharepoint.com sites.
So to summarize, don’t use this:
You are almost guaranteed to see faster load times.
I have a customer that loads a lot of small Excel files from SharePoint and I was noticing that they often took a long time to load or to even refresh the preview in the Power Query Editor in Power BI Desktop. Even more frustrating was the fact that Power BI Desktop didn’t seem to be doing much during the load. CPU was low, memory normal and barely anything was dribbling in the network connection.
I hadn’t really used the SharePoint folder data connector before so I decided to go look what it was doing in Power Query. Here is a typical setup:
In this example, we are connecting to the site ‘HanksWorld’ and getting back a list of all the documents in that site/site collection and their metadata. In the next line of M code, we are filtering out the file that we are looking for.
This rang alarm bells for me. For the longest time, SharePoint had all sorts of limitations about the number of files that could be contained in a document library or within a folder within a document library. The magic number of 2,000 was given to us by the product group to use as a ban hammer for support, as if 1,999 files was OK and 2,0000 wasn’t. The core issue was that when SharePoint had to iterate through that list of files to perform an operation, it really started to drag when you had a high number of files to iterate. I can’t remember the details, I think I have repressed most of my SharePoint memories but I know this, the more file names/metadata that SharePoint had to process within a simple operation (render a document library for instance ), the worse performance you could expect.
Look at it this way, if you have 10,000 files in your site collection or your site, every time PBI Desktop has to process this query, it asks for all of their names and metadata then filters it down to the proper file. This is a really wasteful way to do things and it will punish SharePoint. If you have multiple queries, all pointing to different Excel files in that same site, well now you are talking about multiple requests for all of that information, especially if you are loading data tables in parallel in PBI desktop. This can either really slam your SharePoint instance or even result in your requests being throttled.
When you change over to using the Web connector, it will access the file directly without iterating through all of the files in the directory. It will also figure out that your file is an Excel file and it will then allow you to select the sheet and move on.
Using the web connector in the Power Query editor will allow you to access your Excel file directly instead of having to iterate though a large list of files. Doing this will dramatically improve your refresh or refresh preview times in the Power Query Editor.