named regexp, named capture groups
Currently named capture groups are a bit of a pain in TypeScript:
.groups even when that is the only possibility.
I propose making RegExp higher order on its named capture groups so that .groups is well typed.
// Would have type: RegExp<{ year: string, month: string }>
const date = /(?<year>[0-9]{4})-(?<month>[0-9]{2})/
const match = someString.match(date)
if (match) {
  // match.groups type would be { year: string, month: string }
  // currently is undefined | { [key: string]: string }
}
// Would have type RegExp<{ year: string, month?: string }>
const optionalMonth = /(?<year>[0-9]{4})(-(?<month>[0-9]{2}))?/
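For comparison, here is a minimal sketch of what reading .groups looks like today, assuming the current lib typings where `groups` is optional and typed with a string index signature. The `DateGroups` type and the `parseDate` helper are hypothetical names used only for illustration.

```typescript
// Hypothetical shape we expect from the pattern; today the compiler cannot verify it.
type DateGroups = { year: string; month: string }

const date = /(?<year>[0-9]{4})-(?<month>[0-9]{2})/

function parseDate(input: string): DateGroups | null {
  const match = input.match(date)
  if (!match) return null
  // `match.groups` is typed as possibly undefined, so a cast (or `!`) is needed
  // even though this pattern always produces both groups when it matches.
  return match.groups as DateGroups
}

const parsed = parseDate("2020-05")
console.log(parsed?.year, parsed?.month)
```

The cast is exactly the part the proposal would make unnecessary: nothing stops `DateGroups` from drifting out of sync with the pattern.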
I could have sworn there was already another ticket discussing this exact issue at length (there were a lot of tradeoffs etc.), but I can't find it now.
There may have been one for group arity?
It would be kinda neat with this to make types like RegExp</(?<year>[0-9]{4})-(?<month>[0-9]{2})/, { year: /[0-9]{4}/, month: /[0-9]{2}/ }>.
I propose making RegExp higher order on its named capture groups so that .groups is well typed.
This sounds like dependent typing to me, which is a rather large can of worms to open.
This sounds like dependent typing to me, which is a rather large can of worms to open.
I don't think so. I think it would be as simple as something like:
interface Match<G extends { [key: string]: string } | undefined> {
  // Note that G is only undefined when the RegExp has no named capture groups
  groups: G,
}
interface RegExp<G extends { [key: string]: string } | undefined = undefined> {
  exec(s: string): null | Match<G>,
}
// and so on for String .match/.matchAll/etc
Note that all of the type information is already encoded in the regular expression literal: /(?<foo>[0-9]+)/ implies that .groups is { foo: string } if the match actually exists.
EDIT: Fixed code.
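The interfaces above can be approximated in userland today. The following is only a sketch under stated assumptions: `TypedMatch`, `TypedRegExp` and `typed` are hypothetical names, and the caller supplies `G` by hand, whereas the proposal would have the compiler derive it from the literal.

```typescript
// Hypothetical userland stand-ins for the proposed generic RegExp; the real
// proposal would put the type parameter on the built-in RegExp instead.
interface TypedMatch<G extends { [key: string]: string } | undefined> {
  readonly match: string
  readonly groups: G
}

interface TypedRegExp<G extends { [key: string]: string } | undefined = undefined> {
  exec(s: string): TypedMatch<G> | null
}

// The caller states G by hand; nothing here checks it against the pattern.
function typed<G extends { [key: string]: string } | undefined = undefined>(
  re: RegExp
): TypedRegExp<G> {
  return {
    exec(s) {
      const result = re.exec(s)
      if (result === null) return null
      return { match: result[0], groups: result.groups as G }
    }
  }
}

const foo = typed<{ foo: string }>(/(?<foo>[0-9]+)/)
const m = foo.exec("abc 123")
if (m) {
  const n: string = m.groups.foo // no non-null assertion needed
  console.log(n)
}
```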
Might be nice to add the same stronger typing for numbered groups at the same time, so that
"foo".match(/(f)(oo)/)
would only have valid indexers [0], [1], and [2].
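Until the compiler can count groups itself, a fixed-length tuple gives a rough approximation of that behaviour. This is a sketch, not the proposed feature: `TwoGroups` and `matchTwo` are hypothetical names, and the group count is stated by the developer rather than inferred from the pattern.

```typescript
// Hypothetical tuple for a pattern with exactly two capture groups.
type TwoGroups = [whole: string, first: string, second: string]

function matchTwo(input: string, re: RegExp): TwoGroups | null {
  const m = re.exec(input)
  if (m === null || m.length !== 3) return null
  const [whole, first, second] = m
  // Groups that did not participate in the match are undefined at runtime.
  if (whole === undefined || first === undefined || second === undefined) return null
  return [whole, first, second]
}

const r = matchTwo("foo", /(f)(oo)/)
if (r) {
  console.log(r[1], r[2])
  // r[3] would be a compile-time error: the tuple has no element at index 3.
}
```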
There is an ESLint rule that enforces the use of named capture groups to avoid bugs and improve readability: https://eslint.org/docs/rules/prefer-named-capture-group
Paired with this feature, it would be amazing.
These helper methods kinda smell "any-ish" because of how their generic is used but they should be safer and easier to use than trying to directly read from .groups or having to write your own argument handling for String#replace. I'm not sure TypeScript itself will ever be able to have a proper type for String#replace because of how... extremely variadic it is; it's probably not impossible with slice types, but it'd require two generics on every RegExp to be able to know how many capturing groups there are in total (both named and unnamed).
/**
 * Wrapper for functions to be given to `String.prototype.replace`, to make working
 * with named captures easier and more type-safe.
 *
 * @template T the capturing groups expected from the regexp. `string` keys are named,
 * `number` keys are ordered captures. Note that named captures occupy their place
 * in the capture order.
 * @param replacer The function to be wrapped. The first argument will have the
 * shape of `T`, and its result will be forwarded to `String.prototype.replace`.
 */
export function named<T extends Partial<Record<string | number, string>> = {}>(
  replacer: (
    captures: { 0: string } & T,
    index: number,
    original: string
  ) => string
) {
  const namedCapturesWrapper: (match: string, ...rest: any[]) => string = (
    ...args
  ) => {
    const { length } = args
    const named: string | Partial<Record<string, string>> = args[length - 1]
    const captures: { 0: string } & T = Object.create(null)
    if (typeof named === "string") {
      // the regexp used does not use named captures at all
      args.slice(0, -2).forEach((value, index) => {
        Object.defineProperty(captures, index, {
          configurable: true,
          writable: true,
          value
        })
      })
      return replacer(captures, args[length - 2], named)
    }
    // the regexp has named captures; copy named own properties to captures,
    // then copy the numeric matches.
    Object.assign(captures, named)
    args.slice(0, -3).forEach((value, index) => {
      if (index in captures) {
        throw new RangeError(
          `Numeric name ${index} used as a regexp capture name`
        )
      }
      Object.defineProperty(captures, index, {
        configurable: true,
        writable: true,
        value
      })
    })
    return replacer(captures, args[length - 3], args[length - 2])
  }
  return namedCapturesWrapper
}
// the first overload is here to preserve refinements if `null` was already
// checked for and excluded from the type of exec/match result.
/**
 * Helper to extract the named capturing groups from the result of
 * `RegExp.prototype.exec` or `String.prototype.match`.
 *
 * @template T type definition for the available capturing groups
 * @param result the result of `RegExp.prototype.exec` or `String.prototype.match`
 * @returns the contents of the `.groups` property but typed as `T`
 * @throws if `.groups` is `undefined`; this only happens on regexps without named captures
 */
export function groups<T extends Partial<Record<string, string>> = {}>(
  result: RegExpMatchArray | RegExpExecArray
): T
/**
 * Helper to extract the named capturing groups from the result of
 * `RegExp.prototype.exec` or `String.prototype.match`.
 *
 * @template T type definition for the available capturing groups
 * @param result the result of `RegExp.prototype.exec` or `String.prototype.match`
 * @returns the contents of the `.groups` property but typed as `T`, or `null` if
 * there was no match
 * @throws if `.groups` is `undefined`; this only happens on regexps without named captures
 */
export function groups<T extends Partial<Record<string, string>> = {}>(
  result: RegExpMatchArray | RegExpExecArray | null
): T | null
/**
 * Helper to extract the named capturing groups from the result of
 * `RegExp.prototype.exec` or `String.prototype.match`.
 *
 * @template T type definition for the available capturing groups
 * @param result the result of `RegExp.prototype.exec` or `String.prototype.match`
 * @returns the contents of the `.groups` property but typed as `T`, or `null` if
 * there was no match
 * @throws if `.groups` is `undefined`; this only happens on regexps without named captures
 */
export function groups<T extends Partial<Record<string, string>> = {}>(
  result: RegExpMatchArray | RegExpExecArray | null
): T | null {
  if (result === null) {
    return null
  }
  if (result.groups === undefined) {
    throw new RangeError(
      "Attempted to read the named captures of a Regexp without named captures"
    )
  }
  return result.groups as T
}


There might be no need to copy the numeric captures, though; I just made them be copied because it seemed to make sense to put the matched substring in 0 instead of moving to a separate argument.
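The wrapper relies on the argument order that `String.prototype.replace` uses for callbacks: the match, each capture, the offset, the input string, and, only when the pattern has named groups, a final groups object. A small sketch that records those arguments makes the two layouts visible (the `seen` array exists only for illustration):

```typescript
// Collects the raw argument lists passed to the replace callbacks below.
const seen: unknown[][] = []

// Pattern with one named and one unnamed group.
"2020-05".replace(/(?<year>[0-9]{4})-([0-9]{2})/, (...args: unknown[]) => {
  seen.push(args)
  return ""
})

// Same pattern without any named group.
"2020-05".replace(/([0-9]{4})-([0-9]{2})/, (...args: unknown[]) => {
  seen.push(args)
  return ""
})

// With a named group: [match, p1, p2, offset, input, groups]
console.log(seen[0])
// Without named groups: [match, p1, p2, offset, input]
console.log(seen[1])
```

This is why the wrapper checks whether the last argument is a string: that is the only way to tell the two layouts apart from inside the callback.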
I have an overall problem with the RegExp definition and with the definitions of objects and arrays. From my point of view, allowing something like:
const x: {[x: string]: string} = {}
const y = x['foo'] // <= y is a string here
console.log(y.length)
> Uncaught TypeError: Cannot read property 'length' of undefined
Same for arrays, but, well, this one is very surprising:
const a: string[] = []
const b = a[0] // <= string - why why why?
const c = a.pop() // <= string | undefined
// and other way:
const a: [string] = ['foo']
const b = a[0] // <= string
const c = a.pop() // <= string | undefined - why why why? TS can infer from `if`, but not here?
is a big misconception for the sake of convenience. It reduces one of the greatest type systems to absurdity. But I'm sure the core team has another opinion on that, unfortunately.
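One way to get the stricter behaviour for a single object today is to put `undefined` into the index signature explicitly. A minimal sketch, assuming `strictNullChecks` is enabled (TypeScript 4.1 and later also offer the `noUncheckedIndexedAccess` compiler option, which applies the same idea to every index signature and array in a project):

```typescript
// Declaring that a lookup may miss forces the caller to handle undefined.
const x: { [key: string]: string | undefined } = {}
const y = x["foo"] // string | undefined
// `y.length` would now be rejected at compile time instead of throwing at runtime.
const len = y?.length ?? 0
console.log(len)
```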
Based on the above, the definitions of RegExpMatch* aren't helpful:
interface RegExpMatchArray {
  groups?: {
    [key: string]: string
  }
}
interface RegExpExecArray {
  groups?: {
    [key: string]: string
  }
}
Inferring types from a regular expression is possible (from my point of view), but very complex. Instead, I would like to see more developer support to make it type-safe (pseudocode):
type RegExpMatch = {
  [key: number]: string | undefined,
  groups?: {
    [key: string]: string | undefined
  }
}
interface RegExp<T extends RegExpMatch> {
  exec(string: string): T | null;
}
To make it more type safe:
const regexp = new RegExp<{0: string, groups: {foo: string}}>('^/(?<foo>[^/]+)$')
const result = regexp.exec('/bar')
if (result !== null) {
  // now you get the typings here
  result[0] // <= string
  result[1] // <= string | undefined (or maybe never?)
  result.groups.foo // <= string
  result.groups.test // <= string | undefined (or maybe never?)
}
If the developer makes a mistake in the typings, well, that's OK. It is still better than allowing everything.
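The developer-declared approach can be tried without changing the built-in RegExp. The sketch below is an assumption-laden illustration: `withGroups` is a hypothetical helper, the group names are passed as a runtime array so they can be checked against the pattern source, and that check is a crude substring test rather than a real regex parser.

```typescript
// Hypothetical helper: names are declared by the developer and checked against
// the pattern source when the helper is called, not by the compiler.
function withGroups<K extends string>(re: RegExp, names: readonly K[]) {
  for (const name of names) {
    // Crude check: escaped parentheses or character classes can fool it.
    if (!re.source.includes(`(?<${name}>`)) {
      throw new RangeError(`Pattern has no named group "${name}"`)
    }
  }
  return {
    exec(input: string): { 0: string; groups: Record<K, string | undefined> } | null {
      const result = re.exec(input)
      if (result === null) return null
      const groups = {} as Record<K, string | undefined>
      for (const name of names) {
        groups[name] = result.groups?.[name]
      }
      return { 0: result[0], groups }
    }
  }
}

const route = withGroups(/^\/(?<foo>[^/]+)$/, ["foo"])
const result = route.exec("/bar")
if (result !== null) {
  console.log(result[0])
  console.log(result.groups.foo)
  // result.groups.test is a compile-time error: "test" was not declared.
}
```

A typo in the declared names fails when `withGroups` is called rather than silently producing `undefined` later.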
A little bit related: https://github.com/Microsoft/TypeScript/issues/6579
I think it'd be great to implement this alongside #38671, so that generic regexes keep their current typing, but regex literals have strongly typed capturing groups.
const re1 = /(?<year>[0-9]{4})-(?<month>[0-9]{2})/;
type Groups1 = NonNullable<ReturnType<typeof re1.exec>>['groups']; // Remains Record<string, string> | undefined
const re2 = /(?<year>[0-9]{4})-(?<month>[0-9]{2})/ as const;
type Groups2 = NonNullable<ReturnType<typeof re2.exec>>['groups']; // Would be { year: string, month: string }
And generalize them so that:
type hasYearAndMonth<T extends Regex> = T extends Regex<'year'|'month'> ? true : false;
const re1 = /(?<year>[0-9]{4})/ as const;
const re2 = /(?<year>[0-9]{4})-(?<month>[0-9]{2})/ as const;
type T1 = hasYearAndMonth<typeof re1>; // false
type T2 = hasYearAndMonth<typeof re2>; // true
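Something close to this can be sketched today when the pattern is written as a string literal, using template literal types (TypeScript 4.1 or later). Everything named here (`GroupNames`, `HasYearAndMonth`, `regex`) is hypothetical, and the extraction is naive: it treats lookbehinds such as `(?<=` as group openers and does not model optional groups.

```typescript
// Collect every NAME that appears as "(?<NAME>" in a string literal type.
type GroupNames<S extends string> =
  S extends `${string}(?<${infer Name}>${infer Rest}`
    ? Name | GroupNames<Rest>
    : never

// True only when both "year" and "month" are among the extracted names.
type HasYearAndMonth<S extends string> =
  "year" | "month" extends GroupNames<S> ? true : false

// These assignments only type-check if the conditional types evaluate as commented.
const t1: HasYearAndMonth<"(?<year>[0-9]{4})"> = false
const t2: HasYearAndMonth<"(?<year>[0-9]{4})-(?<month>[0-9]{2})"> = true

// Hypothetical constructor that keeps the pattern's literal type.
function regex<S extends string>(source: S) {
  const re = new RegExp(source)
  return {
    exec(input: string): Record<GroupNames<S>, string> | null {
      const result = re.exec(input)
      if (result === null) return null
      const groups: unknown = result.groups ?? {}
      return groups as Record<GroupNames<S>, string>
    }
  }
}

const re = regex("(?<year>[0-9]{4})-(?<month>[0-9]{2})")
const g = re.exec("2020-05")
if (g) {
  console.log(g.year, g.month)
  // g.day would be a compile-time error: no such group in the pattern.
}
```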