Powershell: Export-Csv should have option to suppress quotes

Created on 13 Feb 2019  ·  14Comments  ·  Source: PowerShell/PowerShell

Export-Csv should have option to suppress quotes

By default Export-Csv puts quotes around all fields. This isn't always necessary and some programs even have problems reading the results. For example, Export-Csv will quote a numbers and dates:

…,"227","6/1/2018 12:00:00 AM"

Some programs then interpret the data as strings, which generally isn't the intent.

Also, quotes generally aren't required for strings (unless they contain the delimiter character).

Not using quotes (unless needed) would improve usability with other applications and reduce file output size.

Proposed technical implementation details

A couple options:

1) An option -UseQuotes which could be Never, Always, or (optionally) AsNeeded, StringsOnly.
2) An option to specify the quote character/string, then I could do -Quote "" to suppress quotes.
3) An option -QuoteFields to specify which fields to quote.

Any of these would solve my current problem (all of them would be a complete, powerful solution to most problems).

-UseQuotes options details:

  • Never - don't quote anything, ever.
  • Always - quote everything (current behavior).
  • AsNeeded - only quote fields that need it (they contain a delimiter character).
  • StringsOnly - quote all strings, but not numbers, dates, or other non-string values.

There's a good argument to be made that the default behavior should be AsNeeded or StringsOnly as they tend to be more compatible with other applications; however, it would be different and could break existing usage.

Area-Cmdlets-Utility Issue-Discussion Issue-Enhancement

Most helpful comment

My assumption is that StringOnly would be Always for just Strings. I can't think of a use case where I would want StringOnly+AsNeeded that wouldn't also be satisfied by AsNeeded.

All 14 comments

Imo, an AsNeeded should be the default, despite the fact that it would probably be a bit of a break in behaviour from what it was previously.

Yep. Now that I think about it, another option would be StringsOnly (so it would quote strings, but not numbers or dates). I'll add it to the proposal.

AsNeeded should be the default

It is a breaking change. It's unclear what area it is. I'd postpone the default changing to next "LTS" version.

Thanks for the fast action! FYI, a use case for StringsOnly is where numeric data needs to be treated as a string. Presumably the information in already stored in PowerShell as a string, so using StringOnly would force PowerShell to quote the string so the program reading the data knows it's a string and not numeric. (Unfortunately, Excel is dumb in this regard and treats quoted numerics as numerics, not strings; however, there are other programs that don't.)

An example would be ID#s which may have leading zeros (e.g. "0004738749837"), especially if the leading zeros are significant (e.g. the data contains both "001234" and "0000001234"). Such situations can arise when using variable length coding schemes.

Another case would be hex data. Some programs will look at 0087 and treat is as numeric, even though further down in the same column we might have 09DA, which is clearly intended to be hex. Forcing the data to be quoted would (ideally) treat the entire column as strings, not just those that have letters in them.

Consider the case of encoding a binary value into a string. "0012" might represent two bytes coded as a string, while "12" (leading zeros stripped) would represent a single byte.

@DavidBerg-MSFT I think what @iSazonov is concerned about is that when translating some types of objects to a string form, you can end up with their representation containing commas. In fact, some systems of recording _numbers_ will result in commas depending on the culture settings. I believe several European cultures actually notate a decimal point with a comma instead of a period. This would break the resulting file, if you attempted to have it create the file without quoting numbers as strings.

Ultimately, trying to determine ahead of time which types of objects can and cannot stringify well and not break the format is liable to be a _very_ large waste of time; the file created _is_ going to be a text file, by design of the CSV format. It is arguably much more effective to simply assume data is string and allow the program that intends to consume it to determine the format however is appropriate for its requirements.

@vexx32. Yes, in many countries, a number that we would write in the US as 1,234.56 would be represented as 1.234,56. In the US, default numeric formatting usually drops the comma (e.g. 1234.56); however, in the other countries it's the period that would be dropped (e.g. 1234,56), so we still have to deal with the comma.

In those cases (numbers that contain a comma) it would be necessary to either:

  1. Use Always (current default),
  2. Use AsNeeded, or
  3. Use a different delimiter (e.g. a tab or semi-colon). Then we can use Never or StringOnly (which would otherwise break).

That's a strong argument for making Always or AsNeeded the default, but I don't think it eliminates the value of Never and StringOnly (especially since we can change the delimiter, I particularly like tab as it almost never occurs in string values).

I've run into the same problem at times importing data where unquoted text contains commas. In those cases, if I can't quote the text, I usually try replacing the delimiter with a non-conflicting one.

Yes, ideally the target program will handle the data correctly; however, the more/better hints we can give it, the easier it is to work with the data.

I think I can agree with all of that! 😊

We should document those caveats with Never and/or StringOnly options to avoid confusion, though. I'm sure some users will still run into it, but as long as it's clearly documented I don't see an issue with it. It's essentially an opt-in behaviour to relax aggressiveness of quoting values based on criteria or just not use quotes at all, so I think it makes sense to allow it and document the potential pitfalls with suggestions of how to avoid them. 😄

@DavidBerg-MSFT Should StringOnly include AsNeeded behavior? Or maybe we need still StringOnlyAndAsNeeded option?

My assumption is that StringOnly would be Always for just Strings. I can't think of a use case where I would want StringOnly+AsNeeded that wouldn't also be satisfied by AsNeeded.

@DavidBerg-MSFT I am trying to add StringOnly and catch a problem. I can discover that original type of value is System.String. It is simple and it works. But there is Path type. Example: gps pwsh | ConvertTo-Csv -UseQuotes StringOnly. Path looks as string in output. I'd expect that Path should be quoted. This complicates the solution - it is not clear what types we should quote.

@isazonov since all values in a CSV are basically going to convert to string data in the file itself, maybe you could do a simple check on if (value.ToString().Contains(' ')) to determine if it needs quotes and is not itself a string value?

@vexx32 This is not necessary as long as there is no conflict with delimiter that already implemented as AsNeeded option. My question is that there are types that we usually see as strings. And I want to get confirmation that we ignore this and quote only values of string type.

Yes, when I was thinking of -StringOnly, I admit I was only thinking about fundamental data types (strings, numbers, dates, Booleans); however, any object which is not one of those fundamental data types is going to get converted to a string by calling it's ToString() method. Which brings up a question on how to treat non-fundamental types. I see a couple options:

1) Always treat non-fundamental types as non-strings. Easiest: we test for string type, anything else isn't quoted). And most literal (StringsOnly does exactly what it says). Fallback is to use AsNeeded.
2) Always treat non-fundamental types as strings (tougher, because then StringsOnly really becomes DontQuoteNumbersDatesAndBooleans…., requiring more testing, and becoming more confusing about what is or isn't considered a string).
3) Something like option 1, but also check for something like a ShouldOutputAsString field/property on objects... (too complex, I wouldn't do it just for this; however, I have built systems where data objects had metadata properties that provided hints for how to properly format them in different situations, it was quite nice).
4) Eliminate StringsOnly in favor of -QuoteFields , as that is a superset (allowing the caller to specify what they do/don't want quoted).
5) Eliminate StringsOnly as not being well enough thought out.

In terms of preference, I'd probably pick (in order) 4, 1, 5 (and not really consider 2 or 3). What do you think? @iSazonov?

Thanks for all the effort!

@DavidBerg-MSFT Thanks for feedback!

I'd prefer that we do "smart things". This means that users get "expected" results. Question is that is "smart" and "expected" in the context.
As @vexx32 pointed all is converted to strings before output to CSV. So user expectations can be based only on original types of values. We could implement exclude and include type lists with common defaults. In the case it is not clear benefits because excluding only fundamental types doesn't look smart enough. Also we'll have to add parameters to allow users modify the lists. Fallback to AsNeeded can confuse users too - it can be "over-smart". This leads us to believe that StringOnly parameter is not as useful as expected and QuoteFields is better alternative - does not require additional parameters and it is very fast.

Was this page helpful?
0 / 5 - 0 ratings