In Dart, if I create a multiline string like this:

```dart
String itemList = """
    1. Item A
    2. Item B
    3. Item C
    """;
```
I expect the output of `print(itemList)` to be:

```
1. Item A
2. Item B
3. Item C
```

But the actual output is:

```
    1. Item A
    2. Item B
    3. Item C
```
It would be helpful to introduce a method such as `itemList.trimLeadingIndents()` that produces the expected output by removing the indentation after every newline character.
I would like to contribute a fix for this issue by providing such a method, but before that, I'd like to know whether the community would welcome such a method in Dart.
Any help would be appreciated.
I would much rather change the language to make the initial example work.
Adding a trimming function is "easy", at least if we can agree on what it should do. It could treat CR, CR+LF and LF as line terminators, or only LF. It could remove any common prefix consisting of only spaces, or of any whitespace, from each line. It should not treat a trailing line terminator as introducing an empty line, and it should remove a final line containing only spaces. It might also take an optional `tabSize` parameter that expands tabs to spaces. Definitely doable.
It's just a very specific function. If the only use case for it is fixing literals, then I don't think it carries its own weight, at least not in the SDK. But it's fairly easy to add with an extension method if you want it.
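```dart
import "dart:convert";

/// Removes the leading whitespace (spaces and tabs) that all lines share.
String trimLeadingWhitespace(String text) {
  // Splits on LF, CR or CR+LF; toList() so the text is only split once.
  var lines = LineSplitter.split(text).toList();
  if (lines.isEmpty) return text;

  // Longest common prefix of [a] and [b] consisting only of spaces/tabs.
  String commonWhitespacePrefix(String a, String b) {
    int i = 0;
    for (; i < a.length && i < b.length; i++) {
      int ca = a.codeUnitAt(i);
      int cb = b.codeUnitAt(i);
      if (ca != cb) break;
      if (ca != 0x20 /* spc */ && ca != 0x09 /* tab */) break;
    }
    return a.substring(0, i);
  }

  // Folding from the first line makes the first step yield that line's own
  // leading whitespace, so a single-line text is handled correctly too.
  var prefix = lines.fold(lines.first, commonWhitespacePrefix);
  var prefixLength = prefix.length;
  return lines.map((s) => s.substring(prefixLength)).join("\n");
}

void main() {
  var x = trimLeadingWhitespace("""
    1.x
    2.y
    """);
  print(x);
}
```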
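The comment suggests packaging this as an extension method and lists rules for it. A helper following those rules could look roughly like the sketch below. `TrimIndent` and `trimIndent` are hypothetical names, not SDK API. Two behaviors are assumptions of this sketch rather than anything settled in the thread:

- blank lines are skipped when computing the shared prefix;
- a final whitespace-only line is emptied, while the line break before it is kept.

```dart
/// Hypothetical extension: neither `TrimIndent` nor `trimIndent` is part of
/// the Dart SDK.
extension TrimIndent on String {
  /// Removes the spaces/tabs prefix shared by all non-blank lines.
  ///
  /// CR, CR+LF and LF all count as line terminators; the result uses LF.
  /// A final whitespace-only line is emptied (the line break before it is
  /// kept). If [tabSize] is given, leading tabs are expanded to spaces first.
  String trimIndent({int? tabSize}) {
    var lines =
        this.replaceAll('\r\n', '\n').replaceAll('\r', '\n').split('\n');
    if (lines.last.trim().isEmpty) lines[lines.length - 1] = '';
    if (tabSize != null) {
      if (tabSize <= 0) {
        throw ArgumentError.value(tabSize, 'tabSize', 'must be positive');
      }
      lines = [for (final line in lines) _expandLeadingTabs(line, tabSize)];
    }
    String? prefix;
    for (final line in lines) {
      if (line.trim().isEmpty) continue; // blank lines don't shrink the prefix
      // On the first non-blank line this yields its own leading whitespace.
      prefix = _commonWhitespacePrefix(prefix ?? line, line);
    }
    final n = prefix?.length ?? 0;
    // Non-blank lines all start with the prefix; shorter blank lines become ''.
    return [for (final line in lines) line.length < n ? '' : line.substring(n)]
        .join('\n');
  }
}

/// Longest common prefix of [a] and [b] made only of spaces and tabs.
String _commonWhitespacePrefix(String a, String b) {
  var i = 0;
  while (i < a.length &&
      i < b.length &&
      a.codeUnitAt(i) == b.codeUnitAt(i) &&
      (a.codeUnitAt(i) == 0x20 || a.codeUnitAt(i) == 0x09)) {
    i++;
  }
  return a.substring(0, i);
}

/// Replaces the leading spaces/tabs of [line] with spaces; a tab advances to
/// the next multiple of [tabSize], a space advances by one column.
String _expandLeadingTabs(String line, int tabSize) {
  final out = StringBuffer();
  var column = 0, i = 0;
  for (; i < line.length && (line[i] == ' ' || line[i] == '\t'); i++) {
    final width = line[i] == '\t' ? tabSize - column % tabSize : 1;
    out.write(' ' * width);
    column += width;
  }
  out.write(line.substring(i));
  return out.toString();
}
```

For the string in the original question, `itemList.trimIndent()` returns `"1. Item A\n2. Item B\n3. Item C\n"`.

Skipping blank lines is a deliberate choice. Editors often strip trailing spaces from otherwise empty lines, so without the skip, a single empty line inside the literal would disable trimming entirely.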
The language change would be something like: if the final line of a multiline string (the text after the last line break, up to the closing `"""` or `'''` quote) contains only whitespace characters (syntactically, no escapes or interpolations), then that whitespace is removed from the start of every line of the string.

It does mean that tabs and spaces are different. Dart does not have a canonical way to convert between tabs and spaces, which is why I'd make it an error if the other lines do not match. That ensures that accidental mismatches are caught early.
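The sketch below makes the proposed rule concrete by applying it at runtime to a string's contents, rather than in the compiler. Its assumptions are:

- the name `applyProposedIndentRule` is hypothetical;
- a `FormatException` stands in for the compile-time error;
- line terminators are LF only;
- the "no escapes or interpolations" restriction is not modeled, because only the final string value is available.

Following "make it an error if the other lines do not match" strictly, even an empty line is rejected.

```dart
/// Non-empty run of spaces/tabs and nothing else.
final _whitespaceOnly = RegExp(r'^[ \t]+$');

/// Hypothetical runtime model of the proposed rule for multiline literals.
/// [content] is the text between the quotes, with LF line terminators.
String applyProposedIndentRule(String content) {
  final lines = content.split('\n');
  final indent = lines.last;
  // The rule only applies when the line holding the closing quote has some
  // indentation and nothing else.
  if (lines.length < 2 || !_whitespaceOnly.hasMatch(indent)) return content;
  final result = <String>[];
  for (var i = 0; i < lines.length; i++) {
    // Exact match required: a tab never matches spaces, so mixed
    // indentation is reported instead of being guessed at.
    if (!lines[i].startsWith(indent)) {
      throw FormatException(
          'Line ${i + 1} does not start with the closing indentation',
          content);
    }
    result.add(lines[i].substring(indent.length));
  }
  return result.join('\n');
}
```

The final whitespace-only line is itself stripped to an empty line, so the result keeps the line break that preceded the closing quote.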
@lrhn Can't we just use a RegExp pattern that detects the leading indentation after each newline and replaces the matches with ''?
If that's possible, I think it would be more efficient.
(I'm sure I could optimize that code to avoid splitting first and instead do all the work on the original string, which would make it more efficient as well.)
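One way such an optimization might look is sketched below, assuming LF-only line terminators. The name `trimLeadingWhitespaceInPlace` is hypothetical. A first pass over the line starts shrinks the candidate prefix, and a second pass copies each line without it, so no intermediate list of lines is built. Unlike the `LineSplitter` version, a trailing newline is kept as-is.

```dart
bool _isSpaceOrTab(int c) => c == 0x20 || c == 0x09;

/// Hypothetical LF-only variant of `trimLeadingWhitespace` that works on the
/// original string instead of splitting it into a list of lines first.
String trimLeadingWhitespaceInPlace(String text) {
  // Pass 1a: the first line's leading whitespace is the initial candidate.
  var prefixLength = 0;
  while (prefixLength < text.length &&
      _isSpaceOrTab(text.codeUnitAt(prefixLength))) {
    prefixLength++;
  }
  // Pass 1b: shrink the candidate to what every later line start shares.
  // A newline at the very end of the text does not start a new line.
  for (var nl = text.indexOf('\n');
      prefixLength > 0 && nl >= 0 && nl + 1 < text.length;
      nl = text.indexOf('\n', nl + 1)) {
    final start = nl + 1;
    var i = 0;
    while (i < prefixLength &&
        start + i < text.length &&
        text.codeUnitAt(start + i) == text.codeUnitAt(i)) {
      i++;
    }
    prefixLength = i;
  }
  if (prefixLength == 0) return text;
  // Pass 2: copy each line (including its '\n') minus the shared prefix.
  final out = StringBuffer();
  var lineStart = 0;
  while (lineStart < text.length) {
    final nl = text.indexOf('\n', lineStart);
    final lineEnd = nl < 0 ? text.length : nl + 1;
    out.write(text.substring(lineStart + prefixLength, lineEnd));
    lineStart = lineEnd;
  }
  return out.toString();
}
```

For example, `trimLeadingWhitespaceInPlace("    1.x\n    2.y\n    ")` returns `"1.x\n2.y\n"`.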
If you just use one RegExp to detect the leading whitespace, then it cannot check that all the lines have the same leading whitespace.
For example, if you write:

```dart
var something = """
    * foo.
    * bar
      - baz
    * qux
    """;
```

you don't want to remove the extra indent from `- baz`, only the indentation shared by all lines.
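The small sketch below shows the difference on this kind of input. It contrasts a single RegExp replace with a version that uses a RegExp only to find each line's indentation, then reduces those indentations to the shared prefix in plain Dart code. Both function names are hypothetical. As with the hand-written version, an empty line makes the shared prefix empty; here that also includes the empty last line after a trailing newline.

```dart
/// Matches the (possibly empty) run of spaces/tabs at the start of each line.
final _lineIndent = RegExp(r'^[ \t]*', multiLine: true);

/// Single-RegExp approach: strips *all* leading whitespace from every line,
/// which also flattens intentionally deeper lines.
String stripEveryIndent(String text) => text.replaceAll(_lineIndent, '');

/// Strips only the indentation that every line shares. The RegExp finds each
/// line's indent; plain Dart code combines them into the common prefix.
String stripSharedIndent(String text) {
  var shared = _lineIndent.firstMatch(text)![0]!;
  for (final match in _lineIndent.allMatches(text)) {
    final indent = match[0]!;
    var i = 0;
    while (i < shared.length && i < indent.length && shared[i] == indent[i]) {
      i++;
    }
    shared = shared.substring(0, i);
  }
  if (shared.isEmpty) return text;
  // The shared prefix contains only spaces/tabs, so it is safe in a pattern.
  return text.replaceAll(RegExp('^$shared', multiLine: true), '');
}
```

Consider a literal like the one above, where `- baz` is indented two spaces deeper than the other lines. `stripEveryIndent` flattens it to `- baz`, while `stripSharedIndent` keeps the two extra spaces.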
Let's try:

```dart
final RegExp _commonLeadingWhitespaceRE = RegExp(r"([ \t]+)(?![^]*^(?!\1))", multiLine: true);
String trimLeadingWhitespace(String text) {
  var commonWhitespace = _commonLeadingWhitespaceRE.matchAsPrefix(text);
  if (commonWhitespace != null) {
    return text.replaceAll(RegExp("^${commonWhitespace[1]}", multiLine: true), "");
  }
  return text;
}
```
This can obviously be simplified if we assume that all line terminators are LF characters. Then the final replace would just be:

```dart
// The first line has no preceding "\n", so its prefix is cut off separately.
return text.substring(commonWhitespace.end).replaceAll("\n${commonWhitespace[0]}", "\n");
```

and we wouldn't have to allocate a new RegExp each time (just a new string).
It has been tested very little, but the logic seems correct 😁.
It's still not particularly efficient, because the RegExp checks every possible length of the first line's leading whitespace against all later line starts. The hand-written algorithm above, in contrast, only carries the common prefix found so far into later checks.
Hmm, if we require the final line to contain only whitespace, and all other lines to start with that same whitespace, then we can change the RegExp to:

```dart
final RegExp _commonLeadingWhitespaceRE = RegExp(
    r"(?=[^]*^([ \t]+)$(?![^]))(?![^]*^(?!\1))", multiLine: true);
```
It would then start by finding the final, whitespace-only line, and then check that all lines start with that whitespace. That might be more efficient, but it is less general. It's still not massively efficient, though (I could probably implement that more efficiently in Dart code too).
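With this second pattern, both parts are lookaheads, so the whole match is empty and the indentation is only available as group 1. The sketch below wraps the pattern in a complete function, assuming LF line terminators. The name `trimToClosingIndent` is hypothetical.

```dart
// The second pattern from the comment: group 1 captures the final,
// whitespace-only line, and the match fails unless every line starts with it.
final RegExp _closingIndentRE = RegExp(
    r"(?=[^]*^([ \t]+)$(?![^]))(?![^]*^(?!\1))", multiLine: true);

/// Hypothetical helper; assumes LF line terminators.
String trimToClosingIndent(String text) {
  final match = _closingIndentRE.matchAsPrefix(text);
  if (match == null) return text;
  // match[0] is the empty string; the indentation lives in group 1.
  final indent = match[1]!;
  // Line 1 also starts with the indent but has no preceding '\n'.
  return text.substring(indent.length).replaceAll('\n$indent', '\n');
}
```

Some inputs are returned unchanged, because `matchAsPrefix` finds no match for them:

- text whose last line is not whitespace-only;
- text in which some line does not start with that final line's whitespace.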
Do you have a more efficient RegExp-based approach in mind?
(RegExps are not necessarily efficient just because they are compact, and they are hard to read.)
So, just for completeness, I've written a benchmark comparing the second RegExp above with hand-written code that does the same thing: https://dartpad.dartlang.org/701db852e0a0c001786d82f04c87357c
(A higher score is better.)
The hand-written code is ~30% faster in DartPad, and 150% faster when run on the VM.
(For good measure, I also added a version using a single RegExp replace. It's about two orders of magnitude slower than the other approaches, and it's also an incorrect implementation, because it allows initial lines with different leading whitespace.)
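To reproduce a comparison like this locally, a minimal `Stopwatch`-based harness is sketched below. It is not the code behind the DartPad link, and it reports no results of its own, since timings depend on the platform (DartPad vs. the VM). The names `trimToClosingIndent`, `trimHandWritten` and `callsPerMs` are hypothetical, and both implementations assume LF line terminators. The hand-written one is more general, so the two only agree on inputs whose last line is whitespace-only and shared by every line, like the generated test input.

```dart
// Second RegExp from the thread: group 1 is the final, whitespace-only line,
// and the match fails unless every line starts with that whitespace.
final RegExp _closingIndentRE = RegExp(
    r"(?=[^]*^([ \t]+)$(?![^]))(?![^]*^(?!\1))", multiLine: true);

/// RegExp-based variant (LF line terminators assumed).
String trimToClosingIndent(String text) {
  final match = _closingIndentRE.matchAsPrefix(text);
  if (match == null) return text;
  final indent = match[1]!;
  return text.substring(indent.length).replaceAll('\n$indent', '\n');
}

/// Hand-written variant (LF assumed): strips the spaces/tabs prefix shared
/// by all lines, shrinking the candidate prefix as it goes.
String trimHandWritten(String text) {
  final lines = text.split('\n');
  final first = lines.first;
  var prefixLength = first.length;
  for (final line in lines) {
    var i = 0;
    while (i < prefixLength &&
        i < line.length &&
        line.codeUnitAt(i) == first.codeUnitAt(i) &&
        (line.codeUnitAt(i) == 0x20 || line.codeUnitAt(i) == 0x09)) {
      i++;
    }
    prefixLength = i;
  }
  return [for (final line in lines) line.substring(prefixLength)].join('\n');
}

var _sink = 0; // results are accumulated here so the calls are not dead code

/// Calls [f] on [input] for about [duration]; returns calls per millisecond.
double callsPerMs(String Function(String) f, String input,
    {Duration duration = const Duration(milliseconds: 500)}) {
  for (var i = 0; i < 1000; i++) {
    _sink += f(input).length; // warm-up before timing starts
  }
  var calls = 0;
  final watch = Stopwatch()..start();
  while (watch.elapsed < duration) {
    _sink += f(input).length;
    calls++;
  }
  return calls / watch.elapsedMilliseconds;
}
```

For example, `print(callsPerMs(trimHandWritten, input))` and `print(callsPerMs(trimToClosingIndent, input))` each print calls per millisecond. As in the comment, a bigger number is better.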