Runtime: GroupCollection should implement IReadOnlyDictionary interface to align with its Dictionary-Type Usage

Created on 15 Aug 2017  路  18Comments  路  Source: dotnet/runtime

GroupCollection acts like a dictionary, but doesn't provide all of the familiar methods found on Dictionary collections. This makes it harder to inspect contents to make informed decisions on its contents to drive logic. For instance, determining if a key was present in a named capture group before trying to perform logic with the result. This can be seen as a follow-on to issue dotnet/runtime#13933 (it was listed in the Open Questions section).

Rationale and Usage

This was something I stumbled upon when trying to make a RegEx expression in two forms, one with a lot of named parameters, but that were all optional. I wanted to check if the parameters existed before proceeding to perform logic with them but discovered that there was no method to check if a named capture group was actually captured. Currently, I can use an indexer to access a specific group name like

match.Groups["options"]

But unlike a normal dictionary, I can't call match.Groups.ContainsKey('options") to see if it exists first before retrieval if I have optional named capture groups. The current Contains method only checks on the object not the named key.

Currently, the group if not captured returns an empty string, but that is a bit misleading, as the group could have been captured but contained no data. So, there's an ambiguous state. If we had methods to check if the capture was performed, we could leave the group out and have KeyNotFoundExceptions or nulls to differentiate from existence vs. no data. This would let us then do:

if (match.Groups.ContainsKey("options")) {
    // do something with match.Groups["options"]
}

OR

Group options;
if (match.Groups.TryGetValue("options", out options)) {
    // do something with options group now.
}

Proposed API

public GroupCollection : IReadOnlyDictionary<string, Group>, ... {
    bool ContainsKey(string key);
    bool TryGetValue(string key, out Group group);
    ...
}

Details

  • As mentioned, this was an Open Question of dotnet/runtime#13933 that wasn't resolved. It should be a fairly straight-forward addition to the class.

Updates

  • 8/16/17 Updated request to align with review process format and changed to explicitly call out IReadOnlyDictionary interface.
Hackathon api-approved area-System.Text.RegularExpressions up-for-grabs

Most helpful comment

i'll work on this one during the hackaton

All 18 comments

@michael-hawker good suggestion, would you be willing to update to our standard format for review?

https://github.com/dotnet/corefx/blob/master/Documentation/project-docs/api-review-process.md
example
https://github.com/dotnet/corefx/issues/271

match.Groups.ContainsKey("options") and match.Groups["options"] != null are equivalent, so this would be API sugar.

The sugar I'd prefer would be bool TryGetValue(out Group group). Ever since out declarations, this has become one of my favorite patterns.

  • It doesn't result in a double lookup like if (match.Groups.ContainsKey("options")) { match.Groups["options"] } does.
  • You know what to expect with TryGetValue, but most dictionaries throw KeyNotFoundException rather than returning null from indexers so the intent is clearer when reading code using TryGetValue.

@danmosemsft will do.

@jnm2 I'll add that to my request as well though I think it'd be bool TryGetValue(string key, out Group group), eh?

I did notice that currently the non-existence groups are empty strings and not null (at least in UWP), so it's a little ambiguous currently as to the result (i.e. if the group captured but contained nothing vs. didn't exist at all).

@michael-hawker thanks for the suggestion! I will look into and will comment as soon as I fully reviewed it.

Potential hackathon candidate if we review the API addition in time.

cc @karelz

Video

Looks good.

  • We should also implement IDictionary as the BCL exposes the notion of read-onlyness as a mode (IsReadOnly returns true)
  • Given that we implemen IList<>, should we also implement IReadOnlyList<>?

Given that we implemen IList<>, should we also implement IReadOnlyList<>?

For clarity, he meant "Given that we implement IDictionary<>, should we also implement IReadOnlyDictionary<>?".

I also wanted to clarify:

I did notice that currently the non-existence groups are empty strings and not null (at least in UWP), so it's a little ambiguous currently as to the result (i.e. if the group captured but contained nothing vs. didn't exist at all).

Is this a separate bug and should be filed elsewhere or expected?

@michael-hawker that sounds like separate issue to me.

@karelz I didn't realize there was a Success property on the Group as well, so I think that's where I was misled, as I was only looking at the Value and expecting different results based on Match vs. No Match.

        Regex r = new Regex("\\<(?<Type>(?:Button)|(?:TextBlock))?\\>");

        var match = r.Match("<TextBlock>");

        Console.WriteLine("Match: " + match.Success);
        Console.WriteLine("Type: " + match.Groups["Type"]);
        Console.WriteLine(" TypeType: " + match.Groups["Type"].GetType());
        Console.WriteLine(" TypeMatch: " + match.Groups["Type"].Success);
        Console.WriteLine(" IsNull: " + (match.Groups["Type"] == null));
        Console.WriteLine(" IsEmptyString: " + (match.Groups["Type"]?.Value == string.Empty));

        Console.WriteLine();
        match = r.Match("<>");

        Console.WriteLine("Match: " + match.Success);
        Console.WriteLine("Type: " + match.Groups["Type"]);
        Console.WriteLine(" TypeType: " + match.Groups["Type"].GetType());
        Console.WriteLine(" TypeMatch: " + match.Groups["Type"].Success);
        Console.WriteLine(" IsNull: " + (match.Groups["Type"] == null));
        Console.WriteLine(" IsEmptyString: " + (match.Groups["Type"]?.Value == string.Empty));

        r = new Regex("\\<(?<Type>(?:Button)|(?:TextBlock)|\\s*)?\\>");

        Console.WriteLine();
        match = r.Match("<>");

        Console.WriteLine("Match: " + match.Success);
        Console.WriteLine("Type: " + match.Groups["Type"]);
        Console.WriteLine(" TypeType: " + match.Groups["Type"].GetType());
        Console.WriteLine(" TypeMatch: " + match.Groups["Type"].Success);
        Console.WriteLine(" IsNull: " + (match.Groups["Type"] == null));
        Console.WriteLine(" IsEmptyString: " + (match.Groups["Type"]?.Value == string.Empty));

Output:

Match: True
Type: TextBlock
 TypeType: System.Text.RegularExpressions.Group
 TypeMatch: True
 IsNull: False
 IsEmptyString: False

Match: True
Type:
 TypeType: System.Text.RegularExpressions.Group
 TypeMatch: False
 IsNull: False
 IsEmptyString: True

Match: True
Type:
 TypeType: System.Text.RegularExpressions.Group
 TypeMatch: True
 IsNull: False
 IsEmptyString: True

For example in the 2nd case above, I was expecting the Value to be null as the capture was unsuccessful rather than also having to check the Success property first. (vs. the third case where the capture could be successful but has no length and was an empty string.)

It feels like this proposal would want to factor the Success value into the result of TryGetValue too, no? Otherwise, it'd be the same 'trap'.

Thoughts?

I'm happy to open another issue to discuss this behavior, but it definitely seems related as if you use the TryGetValue paradigm and they only look at Value still they'd both return empty strings, but I wouldn't have expected for it to go into my if block in the 2nd scenario.

It feels like this proposal would want to factor the Success value into the result of TryGetValue too, no? Otherwise, it'd be the same 'trap'.

I don't think that TryGetValue should also consider the Success value. That's not the notion of the ContainsKey or TryGetValue pattern. I expect consumers first check if success and only then call TryGetValue.

i'll work on this one during the hackaton

For some reason I'm finding it unintuitive. If I see a TryGetValue method in intellisense, I'm going to think it's primarily about .Success, and I'd be wrong.

TryGetValue operates on collections that contain a key and a value, i.e. Hashtable and Dictionary. In our case the key is the group name and the value is the group object. If the key exists then the value should be returned without taking its internal state into consideration (in our case the Success property).

You will get the Group object back, same as you would do GroupCollection["group1"] today. You still want to check the Success property in both scenarios.

Oops, I was picturing TryGetValue on Group, lol! I'm sorry!

@peltco sent you the invite - see https://github.com/dotnet/corefx/wiki/Hackathon#how-can-i-grab-an-issue

Yeah, I think the crux of the issue is that groups will appear regardless of if they're captured (as long as they're defined in the RegEx string), so it's a bit of understanding the API.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

v0l picture v0l  路  3Comments

chunseoklee picture chunseoklee  路  3Comments

aggieben picture aggieben  路  3Comments

nalywa picture nalywa  路  3Comments

jkotas picture jkotas  路  3Comments