Rust: Tracking issue for str escaping

Created on 13 Aug 2015  Â·  28Comments  Â·  Source: rust-lang/rust

This is a tracking issue for the unstable str_escape feature in the standard library. The open questions I know of here are:

  • Do we want to support these relatively arbitrary ways to escape strings? Note that these same methods for escaping characters are already stable.
  • Are the precise forms of escaping stable? (probably because they delegate to char implementations)

I suspect that because the implementations on char are stable that we'll also stabilize these.

Signatures:

String::escape_debug(&self) -> String;
String::escape_default(&self) -> String;
String::escape_unicode(&self) -> String;
B-unstable C-tracking-issue T-libs disposition-merge finished-final-comment-period

Most helpful comment

For most iterator adapters (including Map) that doesn't even work because many impls are conditional on what traits the wrapped iterator implements -- DoubleEndedIterator, ExactSizeIterator, Clone, etc.

I also suspect there's problems with variance but that's probably less of a problem in practice.

All 28 comments

@alexcrichton Mind if I take this issue?

@AnthonyBroadCrawford certainly! The workflow for this would probably look like:

  • Come up with a story for these methods, e.g. should they be stabilized as-is or tweaked?
  • Gain feedback about that state of affairs (e.g. IRC, reddit, forums, etc).
  • If large enough, write an RFC
  • Write a PR
  • Stabilize in the next cycle.

The reason given for why they're currently unstable says that they might be rewritten to return a char iterator instead?

Maybe we can consider the idea of them returning iterators, but those iterators implementing Display. That makes it easy to use them composably (without having to allocate a string) while also having the string available (the .to_string() method).

@bluss You can already do .collect() on a char iterator to turn it into a string.

True. With Display you can {}-format the escaped string without intermediate String or write-each-char loop.

:bell: This issue is now entering its final comment period for stabilization :bell:

For now this FCP will be for returning a String as returning an Iterator which implements Display isn't found much elsewhere in the standard library, but FCP can help duke this out!

Given that we have char::escape_{default,unicode} already stabilized, these seem like natural things to stabilize as well. I also agree with returning String, because that's probably what you'll use most of the time. If the extra allocation really matters, then expecting the programmer to build the iterator themselves with the underlying methods on char seems pretty reasonable IMO.

I do admit that I find it strange that these methods are in std, but I have found them occasionally very useful.

As previous, I favour using Display-based interfaces for composability.

For example write!(w, "{}:{}", s.escape_default(), t.escape_default()) does then not need to allocate intermediate strings, and it can be part of a much larger function, where the output and allocation is ultimately determined by the caller.

@BurntSushi The problem with going through char iterators is extra utf-8 en/decoding.

@aturon Good point. If others feel good about returning an iterator, then I'm happy with that!

The libs team discussed this in triage recently and the decision was to punt on this for now.

It was brought up that with the exact same name as char::escape_default it is reasonable to expect that the two methods work the same way, so returning a String may be surprising vs returning an iterator with char::escape_default. It was not clear that we had consensus for returning a structure that implements Display + Iterator, but the options we were considering were:

// signature
fn escape_default(&self) -> EscapeDefault;
  1. EscapeDefault just implements Display
  2. EscapeDefault just implements Iterator
  3. EscapeDefault implements both Display + Iterator

As pointed out, just implementing Iterator may not be the most performant because you have to decode and then re-encode, whereas a direct Display implementation would lend itself to perhaps less work.

It's also a possibility that we could return one trait implementation and then add the other in a backwards-compatible fashion in the future.

While we decide this, however, this is being removed from FCP.

Just wanted to note that if it's decided that these methods are best implemented as Iterators or Displayable types, that to_lowercase and to_uppercase should be planned to eventually get the same treatment. Right now, they're implemented as Strings, which isn't the best IMO especially when in many cases it'd be more efficient to just use a Display implementation.

(obviously that'd be a breaking change, but it's something definitely worth considering)

@clarcharr to_lowercase and to_uppercase exist both as methods of &str where they’re convenience methods that return String, and of char where they return iterators that doesn’t allocate. The theory is that the former are easy to build on top of the latter (e.g. if you need something other than String) but there is one exception: str::to_lowercase has additional code to handle the case of Σ in "end of word" positions (which maps to ς instead of σ). So… maybe there should still be an iterator that does this.

(obviously that'd be a breaking change, but it's something definitely worth considering)

Probably we’d deprecate the old methods but not remove them.

This feature is about three inherent methods of the str type: escape_default, escape_unicode, escape_debug. Three corresponding inherent methods of char are already stable.

The libs team discussed this and the consensus was to stabilize, after changing the return type (from String) to dedicated types that implement both Iterator<Item=char> and Display, similar to char methods. If Display is used after Iterator::next has been called once or more, only the "remaining" chars are displayed. (Again like with char methods.)

@rfcbot fcp merge

Team member @SimonSapin has proposed to merge this. The next step is review by the rest of the tagged teams:

  • [x] @Kimundi
  • [x] @SimonSapin
  • [x] @alexcrichton
  • [x] @aturon
  • [x] @dtolnay
  • [x] @sfackler
  • [x] @withoutboats

No concerns currently listed.

Once a majority of reviewers approve (and none object), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

:bell: This is now entering its final comment period, as per the review above. :bell:

The final comment period is now complete.

As the FCP period is completed, now the rest is stabilization, isn't it? Should I follow these steps to tackle this?

Sorry if I am getting things wrong!
Thanks!

Correct, the next step is a stabilization PR. But those steps are for language features implemented in the compiler and mostly not relevant to library features. In the standard library some #[unstable] attributes need to be replaced by #[stable] (look for existing examples for what parameters to include).

Note however that in this case we agreed to stabilized after making some changes to the API: https://github.com/rust-lang/rust/issues/27791#issuecomment-376864727. (This can be two commits in the same PR.)

If you want to submit a PR feel free to ping me on it for review.

This is an interesting case here, as it is possibly the first time since the 1.26.0 release where we are stabilizing a function that returns an iterator. So we have the chance to return impl Trait here, but note that like @BurntSushi notes, impl Trait right now has disadvantages.

The agreed API is not just an iterator, but also something that implements Display. Is -> impl Iterator<Item=char> + Display supported?

Given the noted disadvantages I’m inclined to go with "traditional" dedicated types here. In particular, not being able to name the return type seems significant.

Is -> impl Iterator + Display supported?

Yes.

Note that @mominul had been reaching out for help on #rust-internals, this is what spawned my comment above. You may want to read the scrollback for context. Someone suggested use of impl Trait, then @rkruppe pointed out that this wasn't what the libs team agreed on.

Personally, I think that it would be best to make concrete types for now, like the rest of the standard library. At some point, though, it would be nice to be able to soft-deprecate all of the concrete iterator types and replace them with impl Iterator in the docs.

and replace them with impl Iterator in the docs.

For many of the adapters, that will actually need to be quite longer. For iter::Map for example, I think it would need to be, impl Iterator<Item=B> + DoubleEndedIterator + Debug + ExactSizeIterator + FusedIterator + TrustedLen + Clone + Send + Sync.

Err, you're right. It seems that a few trait aliases might come in handy for that.

Not sure about Send and Sync though.

For most iterator adapters (including Map) that doesn't even work because many impls are conditional on what traits the wrapped iterator implements -- DoubleEndedIterator, ExactSizeIterator, Clone, etc.

I also suspect there's problems with variance but that's probably less of a problem in practice.

Stabilized in #58051.

Was this page helpful?
0 / 5 - 0 ratings