I ran across this
bug feature while processing some NOAA data that I have been playing with off and on for a few years. If your text files have a single quotation mark in them, i.e. “, the Query Editor in Power BI will not handle it well.
Lets see what happens.
First, I loaded a text file into Power BI that has a list of country names in it. In this case, it is the country names that will appear in the weather station data that I am playing with. The file is a simple fixed-width file with 291 entries that looks like this:
If I import the entire file, I end up with 291 rows in the Query Editor that looks as expected.
Now, if I add a simple ” to the text file as shown below, save it and refresh the query editor, things go sideways quickly.
Here is what it looks like in the Query Editor:
Once I load the file, it looks like this:
If I add two quotation marks, it works as expected:
But if I add three, all bets are off again.
As a final test, lets add two rogue quotation marks to our data like this:
That gives me these two:
I don’t have access to source code anymore but it sure seems that in the code for the query editor, as it is processing text data, once it ‘sees’ a quotation mark, it ignores all carriage return/end of line statements until it ‘sees’ another closing quotation mark.
Now this may not seem like a big deal and in the examples I showed, its pretty easy to see that something is amiss, but in my data files this took a while to find. I was processing hundreds of thousands of rows of fixed width text data so I didn’t see the behavior in the query editor. Once the data was loaded, the stray ” that I had in my data occurred deep in the set of data and since it doesn’t throw an error, this really screwed up how the data was being loaded after the single quotation mark was parsed.
By the way, I tested this in Excel and I saw the same behavior. No surprise I guess.
So is this a surprise to anyone? Is this something that everyone that has ever imported text data into Excel or Power BI already knew about? I get it that it’s an edge scenario that you find a single quotation mark in English fixed width text but this showed up in raw data files from a US government agency so it does happen.
Tell me in the comments – should I have already known this?