PowerShell: This line with type casting, regex, and parentheses cannot be run in pwsh (7.0.2, 7.1.0-preview.4)

Created on 26 Jun 2020 · 11 Comments · Source: PowerShell/PowerShell

Steps to reproduce

If this line is pasted into a PowerShell 7 terminal, it will not run (at least on Windows).

PS> [datetime]$('6/26/2020 (today)' -replace ' \([^)]+\)$','')

You might think this is a bad regex (my first thought), but it does actually work:

PS> '6/26/2020 (today)' -replace ' \([^)]+\)$',''
6/26/2020

If I replace the ')' inside the regex character class with another character (here '.'), the line will run:

PS> [datetime]$('6/26/2020 (today)' -replace ' \([^.]+\)$','')

Or even removing the type casting will allow it to run:

PS> $('6/26/2020 (today)' -replace ' \([^)]+\)$','')

Expected behavior

I would expect it to cut out the ' (today)' part of the string and cast the remaining string to [datetime].

PS> [datetime]$('6/26/2020 (today)' -replace ' \([^)]+\)$','')

Friday, June 26, 2020 12:00:00 AM

PS>

Actual behavior

Pasting in the command and hitting enter just activates a new line:

PS> [datetime]$('6/26/2020 (today)' -replace ' \([^)]+\)$','')
>>

PSReadLine

The current version of PSReadLine on my system is 2.0.0; updating to 2.1.0-beta2 made no difference. Even removing the module (Remove-Module PSReadLine) made no difference.

Environment data

7.1.0-preview.4:

Name                           Value
----                           -----
PSVersion                      7.1.0-preview.4
PSEdition                      Core
GitCommitId                    7.1.0-preview.4
OS                             Microsoft Windows 10.0.18362
Platform                       Win32NT
PSCompatibleVersions           {1.0, 2.0, 3.0, 4.0…}
PSRemotingProtocolVersion      2.3
SerializationVersion           1.1.0.1
WSManStackVersion              3.0

And also tested in 7.0.2:

Name                           Value
----                           -----
PSVersion                      7.0.2
PSEdition                      Core
GitCommitId                    7.0.2
OS                             Microsoft Windows 10.0.18362
Platform                       Win32NT
PSCompatibleVersions           {1.0, 2.0, 3.0, 4.0…}
PSRemotingProtocolVersion      2.3
SerializationVersion           1.1.0.1
WSManStackVersion              3.0
Issue-Bug WG-Engine

All 11 comments

I also just tried writing that to a file and running the file and I actually get an error here:

PS> $str = @'
[datetime]$('6/26/2020 (today)' -replace ' \([^)]+\)$','')
'@
PS> $str | Out-File .\test.ps1
PS> .\test.ps1
ParserError: C:\Users\admahowell\test.ps1:1
Line |
   1 |  [datetime]$('6/26/2020 (today)' -replace ' \([^)]+\)$','')
     |                                                       ~~~~~
     | The string is missing the terminator: '.

Using the same edits as I shared on the original post, I am able to get the line to run.

Yeah, that's a parse error alright. Not sure exactly how it's missing the string terminator though... I'm seeing the same on macOS Catalina, so it doesn't seem to be OS-specific.

/cc @daxian-dbw @rjmholt this one... looks a bit troublesome, maybe one of you two would like to take a look?

Ha, already had a comment typed out here. I'll fire up the debugger and see why the tokeniser is doing this

If it helps: the problem goes away if you insert a space between [datetime] and $.

@ThePoShWolf, as an aside (we're clearly dealing with a bug here): To use an expression or a _single_ command as part of a larger expression, (...) is sufficient (and also makes the problem go away, even without a space) - no need for $(...), whose use can have side effects - see this SO answer.
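
For reference, the workarounds mentioned above can be written out roughly like this. It is only a sketch: the variable names are made up for illustration, and the third variant (casting in a separate statement) is an assumption about one more way to sidestep the problem, not something confirmed in this thread.

```powershell
# Workaround 1: put a space between the cast and $(...)
$viaSpace = [datetime] $('6/26/2020 (today)' -replace ' \([^)]+\)$', '')

# Workaround 2: use plain grouping parentheses instead of $(...)
$viaParens = [datetime]('6/26/2020 (today)' -replace ' \([^)]+\)$', '')

# Workaround 3 (illustrative assumption): replace first, then cast separately
$cleaned = '6/26/2020 (today)' -replace ' \([^)]+\)$', ''
$viaTwoSteps = [datetime]$cleaned
```

All three should end up holding the same [datetime] value for June 26, 2020.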

This happens here:

https://github.com/PowerShell/PowerShell/blob/1de5e59e03911f57d11310d685fcc36759f4e8a3/src/System.Management.Automation/engine/parser/tokenizer.cs#L3453-L3459

For some reason, when $ is seen as part of a generic token, we try to reuse the expandable string logic. This ignores the opening string character and sees the closing paren inside that string.

Here's a minimal repro:

[x]$(')')

This is seen as:

[x]$(')  ')
-------  --
    |      \ incomplete string literal
    |
generic token
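
If you want to look at this yourself without the debugger, something like the following should work. It is a rough sketch that feeds the minimal repro to the public Parser.ParseInput API and dumps the tokens and parse errors it reports; the variable names are just for illustration, and what exactly gets printed will depend on the PowerShell version you run it on.

```powershell
# The minimal repro as a string; doubled single quotes escape the quotes inside the literal
$source = '[x]$('')'')'

$tokens = $null
$errors = $null
$ast = [System.Management.Automation.Language.Parser]::ParseInput($source, [ref]$tokens, [ref]$errors)

# Show how the tokeniser split the input
$tokens | Select-Object Kind, Text

# On affected versions this is expected to list the parse error(s)
$errors | Select-Object Message
```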

I'm somewhat conflicted on how to fix this. Personally I think we were bound to hit an issue like this in the current "generic token" lexing implementation, since the tokeniser is required to recognise a context-free grammar here. That's the parser's job, and the scanner should be generating smaller and more appropriate tokens for it.

So we could theoretically make the scanning logic here more complex to try and avoid this, but we'd effectively be duplicating logic from the parser.

Instead, the right solution would be to emit more tokens, representing a more digested input. In fact we can't even just emit two tokens, since just recognising an attribute requires a stack. Instead I would expect to see tokens like this:

  • LBracket
  • Identifier
  • RBracket
  • DollarParen
  • SingleQuotedString
  • RParen
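
For comparison, here is a small sketch that lists the token kinds the tokeniser produces today for the spaced variant of the repro, which reportedly avoids the problem. The variable names are illustrative. As far as I know the built-in TokenKind for a single-quoted string is StringLiteral, so SingleQuotedString in the list above should be read as a descriptive name rather than an existing enum member.

```powershell
# Spaced variant of the minimal repro: [x] $(')')
$source = '[x] $('')'')'

$tokens = $null
$errors = $null
$null = [System.Management.Automation.Language.Parser]::ParseInput($source, [ref]$tokens, [ref]$errors)

# Collect the kind of each token as a string and print them on one line
$kinds = $tokens | ForEach-Object { $_.Kind.ToString() }
$kinds -join ', '
```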

However, I worry about what other states bring us to this point in the tokeniser and what such a change could possibly break. (Needing to worry about that is exactly why the parser should be the single point of responsibility of such things)

My usual stance on stuff like this is "let's try fixing it and see if that breaks anything" tbqh. 😂

It's incredibly difficult to reason about that kind of thing without just having it munch on some code and seeing how that goes. I would guess the other places we'd hit this are weird function names (something like my-fun$(ction')') or function test$(functio'n)')).

Thanks for the response! I don't think I tried it with the space, but I did work around it by doing the regex replace first, then piping the result as a string and type casting it elsewhere. It definitely isn't show-stopping for me. I just figured I'd report the bug and let someone with more knowledge of the PS internals decide what to do with that information.

Yes, thanks for opening this issue @ThePoShWolf! Valuable for us to know about cases like this one

So while there's no documenting comment for ScanGenericToken(), it seems to be important for command mode parsing. My guess is that a change here would likely affect bareword arguments with things like variables in them.
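
To make "bareword arguments with variables in them" concrete, here is a small sketch of the kind of input that comment is presumably talking about. The variable names and values are made up; whether these exact cases go through ScanGenericToken() is the guess from the comment above, not something verified here.

```powershell
$name = 'report'

# Unquoted argument with an embedded variable; it expands like a double-quoted string
$withVariable = Write-Output file-$name.txt       # file-report.txt

# Unquoted argument with an embedded subexpression
$withSubexpression = Write-Output pre$(1 + 1)post  # pre2post
```

Any change to how these tokens are scanned would need to keep this expansion behaviour intact.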

@SeeminglyScience might be able to point out some of the lesser known cases, but yeah generic tokens are used for.... well, probably _way_ too much 😂
