Docs: Description of Zero-width lookbehind/lookahead assertion is missing/wrong

Created on 9 Jan 2021  ·  7 Comments  ·  Source: dotnet/docs

In the documentation, the current example provided for the Zero-width negative lookahead assertion gives the results of a Zero-width negative lookbehind assertion, which is wrong.

Moreover, the Grouping Constructs section of the documentation does not describe these constructs, and the examples provided there do not make clear how _lookaround_ assertions actually behave.

So I propose revising the examples for these four grouping constructs to use one uniform regular expression pattern and one uniform probe string for all four, so that the differences between them become clear.

This is my proposal:

| Grouping construct | Description | Pattern | Matches |
| --- | --- | --- | --- |
| `(?=` _subexpression_ `)` | Zero-width positive lookahead assertion. | `\b\w+\b(?=.+and.+)` | `"cats"`, `"dogs"` in `"cats, dogs and some mice."` |
| `(?!` _subexpression_ `)` | Zero-width negative lookahead assertion. | `\b\w+\b(?!.+and.+)` | `"and"`, `"some"`, `"mice"` in `"cats, dogs and some mice."` |
| `(?<=` _subexpression_ `)` | Zero-width positive lookbehind assertion. | `\b\w+\b(?<=.+and.+)` | `"some"`, `"mice"` in `"cats, dogs and some mice."` |
| | | `\b\w+\b(?<=.+and.*)` | `"and"`, `"some"`, `"mice"` in `"cats, dogs and some mice."` |
| `(?<!` _subexpression_ `)` | Zero-width negative lookbehind assertion. | `\b\w+\b(?<!.+and.+)` | `"cats"`, `"dogs"`, `"and"` in `"cats, dogs and some mice."` |
| | | `\b\w+\b(?<!.+and.*)` | `"cats"`, `"dogs"` in `"cats, dogs and some mice."` |

Please note:

  • My proposal contains only the four rows that need improvement. If you decide to revise your table, the other rows from the original table need to be added as well.

    If you want me to provide a full Grouping Constructs table, please drop me a note.

  • GitHub seems to wrap table columns, even when they contain pre-formatted text.

    So, in order to keep the regular expressions in my proposed content from being accidentally wrapped, I added a number of non-breaking spaces below the first regular expression. So, the regular expressions column won't be wrapped now. To the other table rows I then added a line-break alone, so the vertical spacing appears the same across all table rows.

  • To provide two row-spanning columns I had to use the HTML table format instead of the Markdown format, because Markdown doesn't support rowspan.


Area - .NET Core Guide Pri2 doc-enhancement dotnet-fundamentalprod

All 7 comments

Greetings! I'll forward this to some of our area experts: @pgovind @eerhardt can you comment on this suggestion?

I had been meaning to look at this today, but couldn't. I'll look at this on Monday. The examples themselves look correct. I haven't had a chance to see our current documentation here yet though!

Hi @pgovind,

Perhaps this is of some help:

I used the following PowerShell script to test the _lookaround assertion_ results from my table proposal:

using namespace System.Text.RegularExpressions

$ErrorActionPreference = 'Stop'

$test = 'cats, dogs and some mice.'

Clear-Host

'\b\w+\b(?=.+and.+)',
'\b\w+\b(?!.+and.+)',
'\b\w+\b(?<=.+and.+)',
'\b\w+\b(?<=.+and.*)',
'\b\w+\b(?<!.+and.+)',
'\b\w+\b(?<!.+and.*)' | ForEach-Object {
    Write-Output ([Environment]::NewLine + ":: ${_} ::")
    [Regex]::Matches($test, $_, [RegexOptions]::CultureInvariant -bor [RegexOptions]::IgnoreCase -bor [RegexOptions]::Singleline).Groups |
        Select-Object -ExpandProperty Value
}
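
For readers who prefer C#, here is a rough equivalent of the script above. It is a sketch rather than anything taken from the documentation: the helper name `MatchedWords` is made up for illustration, and the program assumes C# 9 or later (top-level statements). The expected results in the comments are the ones listed in the proposed table.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

const string test = "cats, dogs and some mice.";
const RegexOptions options =
    RegexOptions.CultureInvariant | RegexOptions.IgnoreCase | RegexOptions.Singleline;

string[] patterns =
{
    @"\b\w+\b(?=.+and.+)",   // cats, dogs
    @"\b\w+\b(?!.+and.+)",   // and, some, mice
    @"\b\w+\b(?<=.+and.+)",  // some, mice
    @"\b\w+\b(?<=.+and.*)",  // and, some, mice
    @"\b\w+\b(?<!.+and.+)",  // cats, dogs, and
    @"\b\w+\b(?<!.+and.*)",  // cats, dogs
};

// Hypothetical helper: returns the matched words for one pattern, in order.
string[] MatchedWords(string input, string pattern, RegexOptions opts) =>
    Regex.Matches(input, pattern, opts).Cast<Match>().Select(m => m.Value).ToArray();

foreach (string pattern in patterns)
{
    Console.WriteLine($":: {pattern} :: {string.Join(", ", MatchedWords(test, pattern, options))}");
}
```

None of the three options changes the outcome for this particular probe string; they are kept only to mirror the PowerShell script.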

So the example at Zero-Width Negative Lookahead is:

      string pattern = @"\b(?!un)\w+\b";
      string input = "unite one unethical ethics use untie ultimate";

And matches are

//       one
//       ethics
//       use
//       ultimate

This looks correct to me. At each match, we're checking that the word does not start with `un` and is enclosed by word boundaries.
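
To make the check above reproducible, here is a small self-contained sketch that runs the documented pattern. The second pattern, which swaps the lookahead for a lookbehind at the same position, is not from the documentation; it is added only as a contrast. The helper name `Words` is hypothetical, and C# 9 top-level statements are assumed.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

const string input = "unite one unethical ethics use untie ultimate";

// Hypothetical helper: joins all matches of a pattern into one line.
string Words(string text, string pattern) =>
    string.Join(", ", Regex.Matches(text, pattern).Cast<Match>().Select(m => m.Value));

// Negative lookahead: reject a word if the text starting at the word boundary is "un".
Console.WriteLine(Words(input, @"\b(?!un)\w+\b"));
// one, ethics, use, ultimate

// Negative lookbehind at the same spot: looks at the text *before* the word
// boundary, which is never "un" here, so every word matches.
Console.WriteLine(Words(input, @"\b(?<!un)\w+\b"));
// unite, one, unethical, ethics, use, untie, ultimate
```

The contrast shows that the documented output really is lookahead behaviour: a lookbehind placed at the same position filters out nothing in this input.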

I do like using the same example with different lookaround patterns though. I think a table like the one you've posted is valuable when one wants to look at all the lookaround behaviors at a glance.

@adegeo : What do you think about having a section after Zero-Width Negative Lookbehind Assertions titled Grouping Constructs: At A Glance that lists the table that @SetTrend posted? Something like this (I gave up on fighting GH with the formatting):

**Grouping Constructs: At A Glance:**

| Lookaround | Name | Function |
| --- | --- | --- |
| `(?=check)` | Lookahead | Asserts that what immediately follows the current position in the string is "check" |
| `(?<=check)` | Lookbehind | Asserts that what immediately precedes the current position in the string is "check" |
| `(?!check)` | Negative Lookahead | Asserts that what immediately follows the current position in the string is not "check" |
| `(?<!check)` | Negative Lookbehind | Asserts that what immediately precedes the current position in the string is not "check" |
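
A quick way to see the four rows in action is to attach each assertion to a single digit. The probe string and the helper name `Digits` below are invented for this sketch, and C# 9 top-level statements are assumed.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Each digit has a different neighbourhood:
//   1 is followed by "check", 3 is preceded by "check", 2 and 4 are neither.
const string probe = "1check 2test check3 test4";

// Hypothetical helper: joins all matches of a pattern with spaces.
string Digits(string text, string pattern) =>
    string.Join(" ", Regex.Matches(text, pattern).Cast<Match>().Select(m => m.Value));

Console.WriteLine(Digits(probe, @"\d(?=check)"));   // 1      (followed by "check")
Console.WriteLine(Digits(probe, @"(?<=check)\d"));  // 3      (preceded by "check")
Console.WriteLine(Digits(probe, @"\d(?!check)"));   // 2 3 4  (not followed by "check")
Console.WriteLine(Digits(probe, @"(?<!check)\d"));  // 1 2 4  (not preceded by "check")
```

In every case the match itself is only the digit; the text inspected by the assertion is never consumed.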

Table that @SetTrend posted here

cc @eerhardt

@pgovind: 👍👍👍

Yes, I can see it now. The original statements are correct. I must have made a mistake when checking the original Regular Expressions in the documentation for the first time.

However, your suggestion and your amendment to my proposal seem very valuable to me.

Please let me add that in my proposal above I wanted to convey the gist of lookaround assertions in a simple, illustrative way:

When the regular expression engine hits a _lookaround_ expression, it takes the substring from the current text position to either the start _(lookbehind)_ or the end _(lookahead)_ of the string and, in effect, runs `Regex.IsMatch` on that substring, with the subexpression anchored at the current position. Whether that `IsMatch()` result counts as success then depends on whether the assertion is _positive_ or _negative_.

My simple "five words" example was meant to make this gist easy to grasp.
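
The mental model described above can be sketched in code. This is only an illustration, not how the .NET engine is implemented: the helpers `AheadHolds`, `BehindHolds` and `Emulate` are hypothetical, C# 9 top-level statements are assumed, and cutting the string into substrings would change the meaning of constructs such as `\b` or `^` at the cut, so the emulation should be trusted only for simple subexpressions like the ones used here. One detail the sketch makes explicit is that the subexpression has to match right at the current position, which is modelled with `\A` for lookahead and `\z` for lookbehind.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

const string text = "cats, dogs and some mice.";

// Lookahead model: the rest of the string must match, starting exactly at pos.
bool AheadHolds(string s, int pos, string sub) =>
    Regex.IsMatch(s.Substring(pos), @"\A(?:" + sub + ")");

// Lookbehind model: the text before pos must match, ending exactly at pos.
bool BehindHolds(string s, int pos, string sub) =>
    Regex.IsMatch(s.Substring(0, pos), "(?:" + sub + @")\z");

// Keeps the words whose end position passes (positive) or fails (negative) the check.
string Emulate(string s, string sub, bool behind, bool positive) =>
    string.Join(" ", Regex.Matches(s, @"\b\w+\b").Cast<Match>()
        .Where(m => (behind ? BehindHolds(s, m.Index + m.Length, sub)
                            : AheadHolds(s, m.Index + m.Length, sub)) == positive)
        .Select(m => m.Value));

Console.WriteLine(Emulate(text, ".+and.+", behind: false, positive: true));   // cats dogs
Console.WriteLine(Emulate(text, ".+and.+", behind: false, positive: false));  // and some mice
Console.WriteLine(Emulate(text, ".+and.+", behind: true, positive: true));    // some mice
Console.WriteLine(Emulate(text, ".+and.*", behind: true, positive: false));   // cats dogs

// Why the anchor matters: "and" occurs somewhere after "cats", but not directly after it.
Console.WriteLine(Regex.IsMatch(text.Substring(4), "and"));  // True  (unanchored search)
Console.WriteLine(AheadHolds(text, 4, "and"));               // False (anchored at position 4)
Console.WriteLine(Regex.IsMatch(text, "cats(?=and)"));       // False (real lookahead agrees)
```

The first four lines reproduce rows of the proposed table; the last three show a case where a plain, unanchored `IsMatch` on the substring would give a different answer than the real assertion.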

Thank you @pgovind for taking a look at the issue.

@SetTrend this isn't a high-priority area for us right now, so this will go on the backlog and I'll leave it open. However, if you feel up to the challenge, feel free to open a PR editing the article. Tag me and @pgovind for review.

Cheers!
