TypeScript: Suggestion: Regex-validated string type

Created on 22 Jan 2016  ·  146 Comments  ·  Source: microsoft/TypeScript

There are cases where a property cannot be just any string (or one of a set of strings) but needs to match a pattern.

let fontStyle: 'normal' | 'italic' = 'normal'; // already available in master
let fontColor: /^#([0-9a-f]{3}|[0-9a-f]{6})$/i = '#000'; // my suggestion

It's common practice in JavaScript to store color values in CSS notation, such as in the CSS style reflection of DOM nodes or in various third-party libraries.

What do you think?

Literal Types Needs Proposal Suggestion

Most helpful comment

Design Proposal

There are many cases where developers need a value more specific than just a string but can't enumerate the possibilities as a union of simple string literals, e.g. CSS colors, emails, phone numbers, zip codes, Swagger extensions, etc. Even the JSON Schema specification, which is commonly used for describing the schema of a JSON object, has pattern and patternProperties, which in terms of the TS type system could be called a regex-validated string type and a regex-validated string type of index.

Goals

Provide developers with a type system that is one step closer to JSON Schema, which they commonly use, and also prevent them from forgetting string validation checks when needed.

Syntactic overview

Implementation of this feature consists of 4 parts:

Regex-validated type

type CssColor = /^#([0-9a-f]{3}|[0-9a-f]{6})$/i;
type Email = /^[-a-z0-9~!$%^&*_=+}{\'?]+(\.[-a-z0-9~!$%^&*_=+}{\'?]+)*@([a-z0-9_][-a-z0-9_]*(\.[-a-z0-9_]+[a-z][a-z])|([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}))(:[0-9]{1,5})?$/i;
type Gmail = /^[-a-z0-9~!$%^&*_=+}{\'?]+(\.[-a-z0-9~!$%^&*_=+}{\'?]+)*@gmail\.com$/i;

Regex-validated variable type

let fontColor: /^#([0-9a-f]{3}|[0-9a-f]{6})$/i;

and the same, but more readable

let fontColor: CssColor;

Regex-validated variable type of index

interface UsersCollection {
    [email: /^[-a-z0-9~!$%^&*_=+}{\'?]+(\.[-a-z0-9~!$%^&*_=+}{\'?]+)*@([a-z0-9_][-a-z0-9_]*(\.[-a-z0-9_]+[a-z][a-z])|([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}))(:[0-9]{1,5})?$/i]: User;
}

and the same, but more readable

interface UsersCollection {
    [email: Email]: User;
}

Type guard for variable type

function setFontColorFromString(color: string) {
    fontColor = color;// compile time error
    if (/^#([0-9a-f]{3}|[0-9a-f]{6})$/i.test(color)) {
        fontColor = color;// correct
    }
}

and the same with an early return

function setFontColorFromString(color: string) {
    fontColor = color;// compile time error
    if (!(/^#([0-9a-f]{3}|[0-9a-f]{6})$/i.test(color))) return;
    fontColor = color;// correct
}

and using defined type for better readability

function setFontColorFromString(color: string) {
    fontColor = color;// compile time error
    if (CssColor.test(color)) {
        fontColor = color;// correct
    }
}

same as

function setFontColorFromString(color: string) {
    fontColor = color;// compile time error
    if (!(CssColor.test(color))) return;
    fontColor = color;// correct
}
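
The proposed syntax is not valid TypeScript as released, but the narrowing behavior described above can be approximated with a branded string type and a user-defined type guard. The following is only a minimal sketch under that assumption; the `__brand` property and the helpers `isCssColor` and `currentFontColor` are hypothetical names that are not part of the proposal.

```typescript
// Branded string: structurally a string, but a plain string is not assignable to it.
type CssColor = string & { readonly __brand: "CssColor" };

const cssColorPattern = /^#([0-9a-f]{3}|[0-9a-f]{6})$/i;

// User-defined type guard: narrows `string` to `CssColor` when the regex matches.
function isCssColor(value: string): value is CssColor {
    return cssColorPattern.test(value);
}

let fontColor: CssColor | undefined;

function setFontColorFromString(color: string): boolean {
    // fontColor = color; // would be a compile-time error: string is not CssColor
    if (!isCssColor(color)) return false;
    fontColor = color; // accepted: `color` is narrowed to CssColor here
    return true;
}

// Read access for callers that only need the plain string value.
function currentFontColor(): string | undefined {
    return fontColor;
}
```

Unlike the proposal, the regex and the type are declared separately here, so nothing ties the brand to the pattern except the guard function.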

Type guard for index type

let collection: UsersCollection;
function getUserByEmail(email: string) {
    collection[email];// type is any
    if (/^[-a-z0-9~!$%^&*_=+}{\'?]+(\.[-a-z0-9~!$%^&*_=+}{\'?]+)*@([a-z0-9_][-a-z0-9_]*(\.[-a-z0-9_]+[a-z][a-z])|([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}))(:[0-9]{1,5})?$/i.test(email)) {
        collection[email];// type is User
    }
}

same as

let collection: UsersCollection;
function getUserByEmail(email: string) {
    collection[email];// type is any
    if (!(/^[-a-z0-9~!$%^&*_=+}{\'?]+(\.[-a-z0-9~!$%^&*_=+}{\'?]+)*@([a-z0-9_][-a-z0-9_]*(\.[-a-z0-9_]+[a-z][a-z])|([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}))(:[0-9]{1,5})?$/i.test(email))) return;
    collection[email];// type is User
}

and using defined type for better readability

let collection: UsersCollection;
function getUserByEmail(email: string) {
    collection[email];// type is any
    if (Email.test(email)) {
        collection[email];// type is User
    }
}

same as

let collection: UsersCollection;
function getUserByEmail(email: string) {
    collection[email];// type is any
    if (!(Email.test(email))) return;
    collection[email];// type is User
}
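
A rough present-day analogue of the index guard, assuming a `Map` keyed by a branded string stands in for the proposed regex index signature. The `User` shape, the `addUser` helper, and the deliberately simplified `emailPattern` are assumptions made for illustration; the pattern is not the proposal's full Email regex.

```typescript
type Email = string & { readonly __brand: "Email" };

interface User {
    name: string;
}

// Simplified pattern for illustration only.
const emailPattern = /^[^@\s]+@[^@\s]+\.[a-z]{2,}$/i;

function isEmail(value: string): value is Email {
    return emailPattern.test(value);
}

// Keys can only be inserted or looked up after passing the guard.
const collection = new Map<Email, User>();

function addUser(email: string, user: User): boolean {
    if (!isEmail(email)) return false;
    collection.set(email, user);
    return true;
}

function getUserByEmail(email: string): User | undefined {
    // collection.get(email); // compile-time error: string is not Email
    if (!isEmail(email)) return undefined;
    return collection.get(email); // `email` is narrowed to Email here
}
```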

Semantic overview

Assignments

let email: Email;
let gmail: Gmail;
email = '[email protected]';// correct
email = '[email protected]';// correct
gmail = '[email protected]';// compile time error
gmail = '[email protected]';// correct
gmail = email;// obviously compile time error
email = gmail;// unfortunately compile time error too

Unfortunately, we can't check whether one regex is a subtype of another without a serious performance impact, according to this article. So such assignments should be restricted. But there are the following workarounds:

// explicit cast
gmail = <Gmail>email;// correct
// type guard
if (Gmail.test(email)) {
    gmail = email;// correct
}
// another regex subtype declaration
type Gmail = Email & /^[-a-z0-9~!$%^&*_=+}{\'?]+(\.[-a-z0-9~!$%^&*_=+}{\'?]+)*@gmail\.com$/i;
gmail = email;// correct

Unfortunately, assigning a string variable to a regex-validated variable should also be restricted, because there is no guarantee at compile time that it will match the regex.

let someEmail = '[email protected]';
let someGmail = '[email protected]';
email = someEmail;// compile time error
gmail = someGmail;// compile time error

But we are able to use an explicit cast or type guards as shown here. The second is recommended.
Luckily, this is not the case for string literals, because with them we ARE able to check that the value matches the regex.

let someEmail: '[email protected]' = '[email protected]';
let someGmail: '[email protected]' = '[email protected]';
email = someEmail;// correct
gmail = someGmail;// correct

Type narrowing for indexes

For simple cases of a regex-validated type of index, see Type guard for index type.
But there could be more complicated cases:

type Email = /^[-a-z0-9~!$%^&*_=+}{\'?]+(\.[-a-z0-9~!$%^&*_=+}{\'?]+)*@([a-z0-9_][-a-z0-9_]*(\.[-a-z0-9_]+[a-z][a-z])|([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}))(:[0-9]{1,5})?$/i;
type Gmail = /^[-a-z0-9~!$%^&*_=+}{\'?]+(\.[-a-z0-9~!$%^&*_=+}{\'?]+)*@gmail\.com$/i;
interface UsersCollection {
    [email: Email]: User;
    [gmail: Gmail]: GmailUser;
}
let collection: UsersCollection;
let someEmail = '[email protected]';
let someGmail = '[email protected]';
collection['[email protected]'];// type is User
collection['[email protected]'];// type is User & GmailUser
collection[someEmail];// unfortunately type is any
collection[someGmail];// unfortunately type is any
// explicit cast is still an unsafe workaround
collection[<Email> someEmail];// type is User
collection[<Gmail> someGmail];// type is GmailUser
collection[<Email & Gmail> someGmail];// type is User & GmailUser

Literals don't have this problem:

let collection: UsersCollection;
let someEmail: '[email protected]' = '[email protected]';
let someGmail: '[email protected]' = '[email protected]';
collection[someEmail];// type is User
collection[someGmail];// type is User & GmailUser

But for variables the best option is to use type guards, as in the following more realistic example:

function getUserByEmail(email: string) {
    collection[email];// type is any
    if (Email.test(email)) {
        collection[email];// type is User
        if (Gmail.test(email)) {
            collection[email];// type is User & GmailUser
        }
    }
    if (Gmail.test(email)) {
        collection[email];// type is GmailUser
    }
}

But if we use a better definition for the Gmail type, the narrowing would be different:

type Gmail = Email & /^[-a-z0-9~!$%^&*_=+}{\'?]+(\.[-a-z0-9~!$%^&*_=+}{\'?]+)*@gmail\.com$/i;
function getUserByEmail(email: string) {
    collection[email];// type is any
    if (Email.test(email)) {
        collection[email];// type is User
        if (Gmail.test(email)) {
            collection[email];// type is User & GmailUser
        }
    }
    if (Gmail.test(email)) {
        collection[email];// type is User & GmailUser
    }
}

Unions and intersections

Common types and regex-validated types are really different, so we need rules for how to correctly handle their unions and intersections.

type Regex_1 = / ... /;
type Regex_2 = / ... /;
type NonRegex = { ... };
type test_1 = Regex_1 | Regex_2;// correct
type test_2 = Regex_1 & Regex_2;// correct
type test_3 = Regex_1 | NonRegex;// correct
type test_4 = Regex_1 & NonRegex;// compile time error
if (test_1.test(something)) {
    something;// type is test_1
    // something matches Regex_1 OR Regex_2
}
if (test_2.test(something)) {
    something;// type is test_2
    // something matches Regex_1 AND Regex_2
}
if (test_3.test(something)) {
    something;// type is Regex_1
} else {
    something;// type is NonRegex
}
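
The OR/AND semantics of unions and intersections can be sketched with ordinary type-guard combinators. This is an approximation under stated assumptions: `Guard`, `fromRegex`, `either`, and `both` are hypothetical helpers, and the two example brands use different property names because two brands sharing one property with different literal values would reduce their intersection to `never`.

```typescript
// A guard narrows a plain string to some more specific string type T.
type Guard<T extends string> = (value: string) => value is T;

function fromRegex<T extends string>(pattern: RegExp): Guard<T> {
    return (value: string): value is T => pattern.test(value);
}

// Union: the value matches the first pattern OR the second.
function either<A extends string, B extends string>(a: Guard<A>, b: Guard<B>): Guard<A | B> {
    return (value: string): value is A | B => a(value) || b(value);
}

// Intersection: the value matches the first pattern AND the second.
function both<A extends string, B extends string>(a: Guard<A>, b: Guard<B>): Guard<A & B> {
    return (value: string): value is A & B => a(value) && b(value);
}

type Hex = string & { readonly __hex: true };
type Short = string & { readonly __short: true };

const isHex = fromRegex<Hex>(/^[0-9a-f]+$/i);
const isShort = fromRegex<Short>(/^.{1,4}$/);

const isHexOrShort = either(isHex, isShort); // Guard<Hex | Short>
const isShortHex = both(isHex, isShort);     // Guard<Hex & Short>
```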

Generics

There are no special cases for generics, so a regex-validated type could be used with generics in the same way as usual types.
For generics with constraints like the one below, a regex-validated type behaves like string:

class Something<T extends String> { ... }
let something = new Something<Email>();// correct

Emit overview

Unlike usual types, regex-validated types have some impact on emit:

type Regex_1 = / ... /;
type Regex_2 = / ... /;
type NonRegex = { ... };
type test_1 = Regex_1 | Regex_2;
type test_2 = Regex_1 & Regex_2;
type test_3 = Regex_1 | NonRegex;
type test_4 = Regex_1 & NonRegex;
if (test_1.test(something)) {
    /* ... */
}
if (test_2.test(something)) {
    /* ... */
}
if (test_3.test(something)) {
    /* ... */
} else {
    /* ... */
}

will compile to:

var Regex_1 = / ... /;
var Regex_2 = / ... /;
if (Regex_1.test(something) || Regex_2.test(something)) {
    /* ... */
}
if (Regex_1.test(something) && Regex_2.test(something)) {
    /* ... */
}
if (Regex_1.test(something)) {
    /* ... */
} else {
    /* ... */
}

Compatibility overview

This feature has no compatibility issues, because there is only one case that could break, and it stems from the fact that a regex-validated type has emit impact, unlike a usual type. This is valid TS code:

type someType = { ... };
var someType = { ... };

while the code below is not:

type someRegex = / ... /;
var someRegex = { ... };

But the second WAS already invalid, though for another reason (the type declaration was wrong).
So now we have to disallow declaring a variable with the same name as a type when that type is regex-validated.

P.S.

Feel free to point out things that I have probably missed. If you like this proposal, I could try to create tests that cover it and add them as a PR.

All 146 comments

Yeah, I've seen this while combing through DefinitelyTyped. We could even use something like this with ScriptElementKind in the services layer, where we'd ideally be able to describe these as a comma-separated list of specific strings.

The main problems are:

  • It's not clear how to compose these well. If I want a comma-separated list of "cat", "dog", and "fish", then I need to write something like /^(dog|cat|fish)(,(dog|cat|fish))*$/.

    • If I already have types describing string literal types for "cat", "dog", and "fish", how do I integrate them into this regex?

    • Clearly there's repetition here, which is undesirable. Perhaps fixing the previous issue would make this easier.

  • Non-standard extensions make this sort of iffy.
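
To make the composition concern concrete, here is a runtime-only sketch that builds the comma-separated-list pattern from an existing list of literals instead of repeating them by hand. `commaSeparatedListOf` and `escapeRegExp` are hypothetical helpers. This does nothing at the type level, which is exactly the gap the bullets above describe.

```typescript
const pets = ["cat", "dog", "fish"] as const;

// Escape characters that have special meaning in a regular expression.
function escapeRegExp(text: string): string {
    return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Builds ^(?:a|b|c)(?:,(?:a|b|c))*$ from the given items.
function commaSeparatedListOf(items: readonly string[]): RegExp {
    const alternation = "(?:" + items.map(escapeRegExp).join("|") + ")";
    return new RegExp("^" + alternation + "(?:," + alternation + ")*$");
}

const petList = commaSeparatedListOf(pets);
```

The alternation has to be grouped and the whole pattern anchored; without the group, `|` would split the entire expression rather than just the three names.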

Huge +1 on this, ZipCode, SSN, ONet, many other use cases for this.

I faced the same problem, and I see that it is not implemented yet, maybe this workaround will be helpful:
http://stackoverflow.com/questions/37144672/guid-uuid-type-in-typescript

As @mhegazy suggested, I will put my suggestion (#8665) here. What about allowing simple validation functions in type declarations? Something like this:

type Integer(n:number) => String(n).match(/^[0-9]+$/)
let x:Integer = 3 //OK
let y:Integer = 3.6 //wrong

type ColorLevel(n:number) => n>=0 && n<= 255
type RGB = {red:ColorLevel, green:ColorLevel, blue:ColorLevel};
let redColor:RGB = {red:255, green:0, blue:0}   //OK
let wrongColor:RGB = {red:255, green:900, blue:0} //wrong

type Hex(n:string) => n.match(/^([0-9]|[A-F])+$/)
let hexValue:Hex = "F6A5" //OK
let wrongHexValue:Hex = "F6AZ5" //wrong

The value that the type can accept would be determined by the function parameter type and by the function evaluation itself. That would solve #7982 also.
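
The suggested syntax does not exist in TypeScript, but the idea of a type defined by a validation function can be sketched with branded types and a generic guard factory. `Refined` and `refine` are hypothetical names, and the checks run at runtime rather than at compile time, so this is only an approximation of the suggestion. The color range here is taken as 0 to 255 inclusive so that a channel value of 0 is accepted.

```typescript
// A base type tagged with the name of the refinement it has passed.
type Refined<Base, Tag extends string> = Base & { readonly __refinement: Tag };

// Turns a boolean predicate into a type guard for the tagged type.
function refine<Base, Tag extends string>(
    predicate: (value: Base) => boolean
): (value: Base) => value is Refined<Base, Tag> {
    return (value: Base): value is Refined<Base, Tag> => predicate(value);
}

type Integer = Refined<number, "Integer">;
type ColorLevel = Refined<number, "ColorLevel">;
type Hex = Refined<string, "Hex">;

const isInteger = refine<number, "Integer">(n => /^[0-9]+$/.test(String(n)));
const isColorLevel = refine<number, "ColorLevel">(n => n >= 0 && n <= 255);
const isHex = refine<string, "Hex">(s => /^[0-9A-F]+$/.test(s));

type RGB = { red: ColorLevel; green: ColorLevel; blue: ColorLevel };

// Returns an RGB value only if every channel passes the ColorLevel check.
function rgb(red: number, green: number, blue: number): RGB | undefined {
    if (isColorLevel(red) && isColorLevel(green) && isColorLevel(blue)) {
        return { red, green, blue };
    }
    return undefined;
}
```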

@rylphs +1 this would make TypeScript extremely powerful

How does subtyping work with _regex-validated string types_?

let a: RegExType_1
let b: RegExType_2

a = b // Is this allowed? Is RegExType_2 subtype of RegExType_1?
b = a // Is this allowed? Is RegExType_1 subtype of RegExType_2?

where RegExType_1 and RegExType_2 are _regex-validated string types_.

Edit: It looks like this problem is solvable in polynomial time (see The Inclusion Problem for Regular Expressions).

Would also help with TypeStyle : https://github.com/typestyle/typestyle/issues/5 :rose:

In JSX, @RyanCavanaugh and I have seen people add aria- (and potentially data-) attributes. Someone actually added a string index signature in DefinitelyTyped as a catch-all. A new index signature for this would have been helpful.

interface IntrinsicElements {
    // ....
    [attributeName: /aria-\w+/]: number | string | boolean;
}
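
For comparison, TypeScript 4.4 and later accept template string patterns in index signatures, which covers this particular prefix case without regex types. The sketch below assumes such a compiler version. The `AriaAttributes` name is made up for illustration, and the pattern is looser than /aria-\w+/ because `${string}` also matches the empty string and non-word characters.

```typescript
interface AriaAttributes {
    // Any property whose name starts with "aria-" gets this value type.
    [attributeName: `aria-${string}`]: number | string | boolean;
}

const attrs: AriaAttributes = {
    "aria-label": "Close",
    "aria-hidden": true,
};

// const bad: AriaAttributes = { "arai-label": "Close" };
// rejected: the key matches no index signature, so it is an excess property
```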

I've forgotten to point to some cases for intersections and unions of regex-validated types, but I've described them in latest test case. Should I update Design proposal to reflect that minor change?

@Igmat, question about your design proposal: Could you elaborate on the emit overview? Why would regex-validated types need to be emitted? As far as I can tell, other types don't support runtime checks... am I missing something?

@alexanderbird, yes, no other type has any impact on emit. At first, I thought that regex-validated types wouldn't either, so I started creating the proposal and playing with the proposed syntax.
The first approach was like this:

let fontColor: /^#([0-9a-f]{3}|[0-9a-f]{6})$/i;
fontColor = "#000";

and this:

type CssColor = /^#([0-9a-f]{3}|[0-9a-f]{6})$/i;
let fontColor: CssColor;
fontColor = "#000";

It's ok and has no need for emit changes, because "#000" could be checked in compile time.
But we also have to handle narrowing from string to regex-validated type in order to make it useful. So I've thought about this for both previous setups:

let someString: string;
if (/^#([0-9a-f]{3}|[0-9a-f]{6})$/i.test(someString)) {
    fontColor = someString; // Ok
}
fontColor = someString; // compile time error

So it also has no impact on emit and looks OK, except that the regex isn't very readable and has to be copied everywhere, so the user could easily make a mistake. But in this particular case it still seems better than changing how type works.
But then I realized that this stuff:

let someString: string;
let email: /^[-a-z0-9~!$%^&*_=+}{\'?]+(\.[-a-z0-9~!$%^&*_=+}{\'?]+)*@([a-z0-9_][-a-z0-9_]*(\.[-a-z0-9_]+[a-z][a-z])|([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}))(:[0-9]{1,5})?$/i;
if (/^[-a-z0-9~!$%^&*_=+}{\'?]+(\.[-a-z0-9~!$%^&*_=+}{\'?]+)*@([a-z0-9_][-a-z0-9_]*(\.[-a-z0-9_]+[a-z][a-z])|([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}))(:[0-9]{1,5})?$/i.test(someString)) {
    email = someString; // Ok
}
email = someString; // compile time error

is a nightmare. And that's even without intersections and unions. So to avoid things like this, we have to slightly change type emit as shown in the proposal.

@DanielRosenwasser, could you please provide some feedback on this proposal? And also on the tests referenced here, if possible?
I really want to help with implementing this feature, but it requires a lot of time (tsc is a really complicated project and I still have to work on understanding how it works inside), and I don't know whether this proposal is ready to implement or whether you would reject the feature implemented this way because of a different language design vision or for any other reason.

Hey @Igmat, I think there are a few things I should have initially asked about.

To start, I still don't understand why you need any sort of change to emit, and I don't think any sort of emit based on types would be acceptable. Check out our non-goals here.

Another issue I should have brought up is the problem of regular expressions that use backreferences. My understanding (and experience) is that backreferences in a regular expression can force a test to run in time exponential to its input. Is this a corner case? Perhaps, but it's something I'd prefer to avoid in general. This is especially important given that in editor scenarios, a type-check at a location should take a minimal amount of time.

Another issue is that we'd need to either rely on the engine that the TypeScript compiler runs on, or build a custom regular expression engine to execute these things. For instance, TC39 is moving to include a new s flag so that . can match newlines. There would be a discrepancy between ESXXXX and older runtimes that support this.

@igmat - there is no question in my mind that having regexes emitted at runtime would be useful. However, I don't think they're necessary for this feature to be useful (and from the sounds of what @DanielRosenwasser has said, it probably wouldn't get approved anyway). You said

But we also have to handle narrowing from string to regex-validated type in order to make it useful

I think this is only the case if we are to narrow from a dynamic string to a regex-validated type. This gets very complicated. Even in this simple case:

function foo(bar: number) {
    let baz: /prefix:\d+/ = 'prefix:' + bar;
}

We can't be sure that the types will match - what if the number is negative? And as the regexes get more complicated, it just gets messier and messier. If we really wanted this, maybe we could allow "type interpolation": type Baz = /prefix:{number}/... but I don't know if it's worth going there.

Instead, we could get partway to the goal if we only allowed string literals to be assigned to regex-validated types.

Consider the following:

type Color = /^#([0-9a-f]{3}|[0-9a-f]{6})$/i;
let foo: Color = '#000000';
let bar: Color = '#0000'; // Error - string literal '#0000' is not assignable to type 'Color'; '#0000' does not match /^#([0-9a-f]{3}|[0-9a-f]{6})$/i
let baz: Color = '#' + config.userColorChoice; // Error - type 'string' is not assignable to type 'regex-validated-string'

Do you think that's a workable alternative?

@DanielRosenwasser, I've read the Design Goals carefully and, if I understand you correctly, the problem is a violation of Non-goals#5.
But to me it doesn't seem to be a violation, but rather a syntax improvement. For example, previously we had:

const emailRegex = /.../;
/**
 * assign it only with values tested to emailRegex 
 */
let email: string;
let userInput: string;
// somehow get user input
if (emailRegex.test(userInput)) {
    email = userInput;
} else {
    console.log('User provided invalid email. Showing validation error');
    // Some code for validation error
}

With this proposal implemented it would look like:

type Email = /.../;
let email: Email;
let userInput: string;
// somehow get user input
if (Email.test(userInput)) {
    email = userInput;
} else {
    console.log('User provided invalid email. Showing validation error');
    // Some code for validation error
}

As you see, the code is almost the same - it's a common, simple usage of a regex. But the second case is much more expressive and will prevent accidental mistakes, like forgetting to check a string before assigning it to a variable that is meant to be regex-validated.
The second thing is that without such type narrowing we won't be able to use regex-validated types in indexes normally, because in most cases such index fields work with a variable that can't be checked at compile time the way literals can.

@alexanderbird, I don't suggest making this code valid or adding hidden checks at either runtime or compile time.

function foo(bar: number) {
    let baz: /prefix:\d+/ = 'prefix:' + bar;
}

This code has to produce an error under my proposal. But this:

function foo(bar: number) {
    let baz: /prefix:\d+/ = ('prefix:' + bar) as /prefix:\d+/;
}

or this:

function foo(bar: number) {
    let baz: /prefix:\d+/;
    let possibleBaz: string = 'prefix:' + bar;
    if (/prefix:\d+/.test(possibleBaz)) {
        baz = possibleBaz;
    }
}

would be correct, and would even have no impact on the emitted code.

And as I showed in the previous comment, literals would definitely not be enough even for common use cases, because we often have to work with strings from user input or other sources. Without this emit impact, users would have to work with this type in the following way:

export type Email = /.../;
export const Email = /.../;
let email: Email;
let userInput: string;
// somehow get user input
if (Email.test(userInput)) {
    email = <Email>userInput;
} else {
    console.log('User provided invalid email. Showing validation error');
    // Some code for validation error
}

or for intersections:

export type Email = /email-regex/;
export const Email = /email-regex/;
export type Gmail = Email & /gmail-regex/;
export const Gmail = {
    test: (input: string) => Email.test(input) && /gmail-regex/.test(input)
};
let gmail: Gmail;
let userInput: string;
// somehow get user input
if (Gmail.test(userInput)) {
    gmail = <Gmail>userInput;
} else {
    console.log('User provided invalid gmail. Showing validation error');
    // Some code for validation error
}

I don't think that forcing users to duplicate code and use explicit casts, when it could easily be handled by the compiler, is a good way to go. The emit impact is really very small and predictable. I'm sure that it won't surprise users or lead to misunderstandings of the feature or hard-to-locate bugs, while implementing this feature without emit changes definitely WILL.

In conclusion, I want to say that, in simple terms, a regex-validated type is both a scoped variable and a compiler type.
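
The "both a variable and a type" idea can be approximated in current TypeScript, because a type and a value may share one name. The sketch below assumes a small hypothetical helper, `regexType`, whose `test` method doubles as a type guard; the `Branded` alias, the `register` function, and the simplified email pattern are likewise illustrative only.

```typescript
type Branded<Tag extends string> = string & { readonly __brand: Tag };

// A runtime value whose `test` method narrows strings to the branded type.
interface RegexType<Tag extends string> {
    readonly pattern: RegExp;
    test(value: string): value is Branded<Tag>;
}

function regexType<Tag extends string>(pattern: RegExp): RegexType<Tag> {
    return {
        pattern,
        test(value: string): value is Branded<Tag> {
            return pattern.test(value);
        },
    };
}

// The regex is written once; `Email` names both the value and the type.
const Email = regexType<"Email">(/^[^@\s]+@[^@\s]+\.[a-z]{2,}$/i);
type Email = Branded<"Email">;

function register(userInput: string): Email | undefined {
    if (Email.test(userInput)) {
        return userInput; // narrowed to Email, no cast needed
    }
    return undefined;
}
```

The tag string still appears twice, so this removes the cast and the duplicated regex but not all of the repetition.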

@DanielRosenwasser and @alexanderbird ok, I have one more idea for that. What about syntax like this:

const type Email = /email-regex/;

In this case the user has to explicitly state that they want this as both a type and a const, so the actual type system has no emit changes unless it is used with such a modifier. But when it is used, we are still able to avoid a lot of mistakes, casts, and code duplication by adding the same emit as for:

const Email = /email-regex/;

This seems to be even bigger than just an improvement for this proposal, because it could probably allow something like this (the example is from a project using Redux):

export type SOME_ACTION = 'SOME_ACTION';
export const SOME_ACTION = 'SOME_ACTION' as SOME_ACTION;

being converted to

export const type SOME_ACTION = 'SOME_ACTION';

I tried to find a similar suggestion but wasn't successful. If this could work as a workaround and you like the idea, I can prepare a Design Proposal and tests for it.
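For comparison, here is a minimal sketch of how close the Redux example gets without a `const type` modifier, assuming a TypeScript version that infers literal types for `const` declarations. The helper `isSomeAction` is hypothetical and only illustrates that the value and the type stay in sync from a single string literal.

```typescript
// The runtime value is written once; the type alias is derived from it,
// so the string literal is not duplicated.
const SOME_ACTION = 'SOME_ACTION';
type SOME_ACTION = typeof SOME_ACTION;

// Hypothetical helper, not part of Redux: narrows an action by its type tag.
function isSomeAction(action: { type: string }): action is { type: SOME_ACTION } {
    return action.type === SOME_ACTION;
}
```

This still needs two declarations, which is the duplication the `const type` idea aims to remove.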

@DanielRosenwasser, about your second issue: I don't think it would ever happen, because in my suggestion the compiler runs the regex only for literals, and it doesn't seem likely that someone will do something like this:

let something: /some-regex-with-backreferences/ = `
long enough string to make regex.test significantly affect performance
`

Anyway, we could test how long a literal must be to affect real-time performance and create a heuristic that warns the user when we are unable to check it in editor scenarios, while still checking it when the project is compiled. Or there could be some other workarounds.
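The heuristic described above could look roughly like the sketch below. Everything here is an assumption for illustration: the function name `checkLiteral`, the `inEditor` switch, and the threshold value, which would have to come from real measurements.

```typescript
type LiteralCheck = 'match' | 'mismatch' | 'deferred';

// Hypothetical threshold; a real value would have to be measured.
const MAX_EDITOR_LITERAL_LENGTH = 10000;

// Decide whether a string literal is checked right away or deferred to a full compile.
function checkLiteral(pattern: RegExp, literal: string, inEditor: boolean): LiteralCheck {
    if (inEditor && literal.length > MAX_EDITOR_LITERAL_LENGTH) {
        // Too long for real-time feedback: warn in the editor, verify on compile.
        return 'deferred';
    }
    return pattern.test(literal) ? 'match' : 'mismatch';
}
```

Literal length is only a rough proxy for cost; a short literal can still be slow against a heavily backtracking pattern.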

About the third question, I'm not sure that I understand everything correctly, but it seems that the regex engine should be selected depending on the target from tsconfig, if targets have different implementations. This needs some more investigation.

@DanielRosenwasser are there any thoughts? 😄 About the initial proposal and about the last one. Maybe I should make a more detailed overview of the second one?

@Igmat Your proposal limits the validation to only be useful with string types. What are your thoughts on @rylphs proposal? This would allow a more generic validation for all primitive types:

type ColorLevel = (n:number) => n>=0 && n<= 255
type RGB = {red:ColorLevel, green:ColorLevel, blue:ColorLevel};
let redColor:RGB = {red:255, green:0, blue:0}   //OK
let wrongColor:RGB = {red:255, green:900, blue:0} //wrong

I suspect however that extending this mechanism beyond primitives to non-primitive types would be too much.
One point, the issue that @DanielRosenwasser raised -- about varying regex engine implementations -- would be magnified: depending on the Javascript engine under which the Typescript compiler is running, the validation function might work differently.
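Since predicate-based types like the one above do not exist in TypeScript, the closest runtime equivalent is to keep the predicates as ordinary functions and run them explicitly. The sketch below assumes the range is meant to include 0, as the `green:0` example suggests; the names `colorLevel`, `rgbChecks` and `invalidChannels` are made up for illustration.

```typescript
type NumberCheck = (n: number) => boolean;

// Runtime counterpart of the proposed ColorLevel predicate type.
const colorLevel: NumberCheck = n => n >= 0 && n <= 255;

interface RGB { red: number; green: number; blue: number }

// One predicate per channel, keyed by the same property names as RGB.
const rgbChecks: { [K in keyof RGB]: NumberCheck } = {
    red: colorLevel,
    green: colorLevel,
    blue: colorLevel
};

// Returns the names of the channels whose values fail their predicate.
function invalidChannels(color: RGB): (keyof RGB)[] {
    const channels: (keyof RGB)[] = ['red', 'green', 'blue'];
    return channels.filter(channel => !rgbChecks[channel](color[channel]));
}
```

With the values from the example, `{red:255, green:0, blue:0}` yields no invalid channels, while `{red:255, green:900, blue:0}` reports `green`.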

@zspitz it looks promising, but in my opinion it could affect compiler performance too much, because function isn't limited by any rules and it will force TS to evaluate expressions that are too complicated or that even depend on resources that aren't available at compile time.

@Igmat

because function isn't limited by any rules

Do you have some specific examples in mind? Perhaps it's possible to limit the validation syntax to a "safe"/compile-time-known subset of Typescript.

How about letting user-defined type guards define a new type?

// type guard that introduces new nominal type int
function isInt(value: number): value is type int { return /^\d+$/.test(value.toString()); }
// -------------------------------------^^^^ add type keyword here
function printNum(value: number) { console.log(value); }
function printInt(value: int) { console.log(value); }
const num = 123;
printNum(num); // ok
printInt(num); // error
if (isInt(num)) {
    printNum(num); // ok
    printInt(num); // ok
}

@disjukr looks nice, but what about type extending?

Do they absolutely need to be extendable? Is there a TypeScript design principle which demands it? If not, I would rather have the nominal types @disjukr suggests than nothing, even though they're not extendable.

We'd need something pretty creative to get extendability IMHO - we can't determine whether one arbitrary type guard (function) is a subset of another arbitrary type guard.

We could get rudimentary "extendability" using a type assertion mentality (I'm not saying this is something "pretty creative" - I'm saying here's a stop-gap until someone comes up with something pretty creative):

function isInt(value: number): value is type int { return /^\d+$/.test(value.toString()); }
// assert that biggerInt extends int. No compiler or runtime check that it actually does extend.
function isBiggerInt(value: number): value is type biggerInt extends int { return /^\d{6,}$/.test(value.toString()); }
// -----------------------------------------------------------^^^^ type extension assertion
function printNum(value: number) { console.log(value); }
function printInt(value: int) { console.log(value); }
function printBiggerInt(value: biggerInt) {console.log(value); }

const num = 123;
printNum(num); // ok
printInt(num); // error
printBiggerInt(num); // error
if (isInt(num)) {
    printNum(num); // ok
    printInt(num); // ok
    printBiggerInt(num); // error
}
if (isBiggerInt(num)) {
    printNum(num); // ok
    printInt(num); // ok
    printBiggerInt(num); // ok
}

Which might be useful even though it's not sound. But as I said at the start, do we require that it is extendable, or can we implement it as suggested by @disjukr? (If the latter, I suggest we implement it in @disjukr's non-extendable way.)
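The `value is type int` syntax is hypothetical, but the behaviour in the examples above can be approximated today with intersection "brand" types. This is a sketch under that assumption; the brand property names and the `classify` helper are invented for illustration, and, as in the proposal, nothing verifies that the `BiggerInt` guard really accepts a subset of what the `Int` guard accepts.

```typescript
// Brands exist only at the type level; no such properties exist at runtime.
type Int = number & { readonly __int: true };
// "Extension" is asserted by intersecting with Int, not proven by the compiler.
type BiggerInt = Int & { readonly __biggerInt: true };

function isInt(value: number): value is Int {
    return /^\d+$/.test(value.toString());
}

function isBiggerInt(value: number): value is BiggerInt {
    return /^\d{6,}$/.test(value.toString());
}

function printInt(value: Int): string { return 'int ' + value; }
function printBiggerInt(value: BiggerInt): string { return 'bigger int ' + value; }

// Only narrowed values can be passed to the branded parameters.
function classify(value: number): string {
    if (isBiggerInt(value)) return printBiggerInt(value);
    if (isInt(value)) return printInt(value);
    return 'number ' + value;
}
```

Calling `printInt(123)` directly is a compile error here, which matches the `// error` lines in the examples.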

Kind of off-topic, a reply to @DanielRosenwasser's first comment:
For a comma-separated list you have to use the ^ and $ anchors (this is relevant in most cases when you want to validate a string). Anchors also help to avoid repetition; for your example the regexp would be /^((dog|cat|fish)(,|$))+$/

Allow string types to be regular expressions /#[0-9]{6}/ and allow nesting types into regular expressions ${TColor}:

type TColor = 'red' | 'blue' | /#[0-9]{6}/;
type TBorderValue = /[0-9]+px (solid|dashed) ${TColor}/

Result:

let border1: TBorderValue = '1px solid red'; // OK
let border2: TBorderValue = '1px solid yellow'; // TSError: .....
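At runtime, the nesting sketched above can be imitated by building one anchored pattern from the `source` of another. This is only an illustration of the composition idea, not of the proposed type syntax; `color`, `borderValue` and `isBorderValue` are made-up names, and the string literals `'red'` and `'blue'` are folded into the colour pattern as alternatives.

```typescript
// Un-anchored fragment, playing the role of TColor.
const color = /red|blue|#[0-9]{6}/;

// Playing the role of TBorderValue: the fragment is spliced in via RegExp.source
// and wrapped in a non-capturing group so the alternation stays contained.
const borderValue = new RegExp('^[0-9]+px (?:solid|dashed) (?:' + color.source + ')$');

function isBorderValue(input: string): boolean {
    return borderValue.test(input);
}
```

Note that `#[0-9]{6}` as written in the proposal only accepts decimal digits, so a value such as `#00ff00` would be rejected.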

Use case: there is a library dedicated to writing "type-safe" CSS styles in TypeScript, typestyle. The functionality proposed above would help greatly: today the library has to expose methods to be used at runtime, whereas the proposed string regex types could type-check code at compile time and give developers great IntelliSense.

@DanielRosenwasser @alexanderbird @Igmat: IMO this proposal would be game-changing for TypeScript and web development. What is currently stopping it from being implemented?

I agree that extendability and type emission should not get in the way of the rest of the feature. If there's not a clear path on those aspects, implement them later when there is.

I arrived here as I am looking to have a UUID type and not a string, hence having a regex defining the string would be awesome in this case + a way to check validity of the type (Email.test example) would be also helpful.

@skbergam I'm trying to implement it myself once more. But the TS project is really huge and I also have a job, so there is nearly no progress (I've only managed to create tests for this new feature). If somebody has more experience with extending TS, any help would be greatly appreciated...

Interesting that this effectively creates a nominal type, since we'd be unable to establish any subtype/assignability relationships between any two non-identical regexps

@RyanCavanaugh Earlier @maiermic commented

Edit: It looks like this problem is solvable in polynomial time (see The Inclusion Problem for Regular Expressions).

But that might not be good enough? One certainly hopes there are not too many regexp relationships, but you never know.

Regarding type checks, if we don't like duplicating regexps, and typeof a const isn't good enough (e.g. .d.ts files), how does TS feel about valueof e, which emits the literal value of e iff e is a literal, otherwise an error (and emits something like undefined)?

@maxlk Also off-topic but I took your regex and improved it to not match trailing commas on otherwise valid input: /^((dog|cat|fish)(,(?=\b)|$))+$/ with test https://regex101.com/r/AuyP3g/1. This uses a positive lookahead for a word character after the comma, forcing the prior to revalidate in a DRY way.
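A quick way to see the difference between the two patterns from this thread is to run both against the same inputs. The variable names below are made up; the patterns are copied from the comments.

```typescript
const original = /^((dog|cat|fish)(,|$))+$/;
const stricter = /^((dog|cat|fish)(,(?=\b)|$))+$/;

const samples = ['dog', 'dog,cat', 'dog,cat,', 'dog,bird'];

// One row per sample, recording what each pattern says about it.
const results = samples.map(input => ({
    input: input,
    original: original.test(input),
    stricter: stricter.test(input)
}));

// 'dog'      -> original: true,  stricter: true
// 'dog,cat'  -> original: true,  stricter: true
// 'dog,cat,' -> original: true,  stricter: false (trailing comma)
// 'dog,bird' -> original: false, stricter: false
```

Only the trailing-comma input separates the two patterns.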

Hi!
What's the status of this?
Will you add this feature in the near future? Can't find anything about this in the roadmap.

@lgmat How about limiting the syntax to single-line arrow functions, using only definitions available in lib.d.ts?

Are those awesome improvements available? Maybe in alpha release at least?

regex validated types are great for writing tests, validating hardcoded inputs would be great.

+1. Our use case is very common, we need to have a string date format like 'dd/mm/YYYY'.

although as proposed it would be an extremely cool feature, it lacks the potential:

  • the output is always a string (although compile time checked), there is no way to get a structured object
  • grammar is limited to what regexps can do, and they cannot do much; the problem of regular expressions is that they are regular, their result is a list, not a tree, and they cannot express arbitrarily deep nesting
  • the parser has to be expressed in terms of typescript grammar, which is limited and not extensible

a better way would be to outsource parsing and emitting to a plugin, as proposed in #21861; this way none of the above is a problem, at the price of a steeper learning curve, but hey! the regexp checking can be implemented on top of that, so the original proposal still stands, backed by more advanced machinery

so as i said, a more general way would be custom syntax providers for whatever literals: #21861

examples:

const uri: via URIParserAndEmitter = http://google.com; 
console.log(uri); // --> { protocol: 'http', host: 'google.com', path: undefined, query: undefined, hash: undefined }

const a: via PositiveNumberParser = 10; // --> 10
const b: via PositiveNumberParser = -10; // --> error

const date: via DateParser = 1/1/2019; // --> new Date(2019, 0, 1)


@lgmat How about limiting the syntax to single-line arrow functions, using only definitions available in lib.d.ts?

@zspitz that would make a lot of people unhappy, as they would see that it is possible but forbidden for them, basically for their safety.

Are those awesome improvements available? Maybe in alpha release at least?

As far as I know this still needs a proposal. @gtamas, @AndrewEastwood

Also I think #11152 would be affecting this.

@Igmat Your proposal limits the validation to only be useful with string types. What are your thoughts on @rylphs proposal? This would allow a more generic validation for all primitive types:

type ColorLevel = (n:number) => n>=0 && n<= 255
type RGB = {red:ColorLevel, green:ColorLevel, blue:ColorLevel};
let redColor:RGB = {red:255, green:0, blue:0}   //OK
let wrongColor:RGB = {red:255, green:900, blue:0} //wrong

I suspect however that extending this mechanism beyond primitives to non-primitive types would be too much.

The main problem I see with this is security concerns, imagine some malicious code, that would use buffers to grab the user's memory while checking for type. We would have to implement a lot of sandboxing around this. I would rather see 2 different solutions, one for strings and one for numbers.

RegExp is immune to that to some extent, as the only way to use it maliciously is to write a heavily backtracking expression. That being said, some users might do it unintentionally; therefore, there should be some kind of protection. I think the best way to do it would be a timer.

One point, the issue that @DanielRosenwasser raised -- about varying regex engine implementations -- would be magnified: depending on the Javascript engine under which the Typescript compiler is running, the validation function might work differently.

That is true, and it is bad, but we can solve it by specifying which "modern" parts of RegExp we need for our codebase. It would default to normal (is it ES3?) regexp, which works in every node, with an option to enable new flags and lookbehind assertions.

const unicodeMatcher = /\u{1d306}/u;
let value: typeof unicodeMatcher;
function assign(input: string) {
  value = input;  // Invalid
  if (input.match(unicodeMatcher)) {
    value = input;  // OK
  }
}

If the user has disabled the option for advanced RegExp flags:

let value: typeof unicodeMatcher = '𝌆';  // Warning, string literal isn't checked, because `variable` is of type `/\u{1d306}/u`.

TypeScript would not evaluate advanced RegExp if not told to. But I would suggest that it should give a warning explaining what is happening and how to enable advanced RegExp checking.

If the user has enabled the option for advanced RegExp flags and their node supports it:

let value: typeof unicodeMatcher = '𝌆';  // OK

If the user has enabled the option for advanced RegExp flags but their node does not support it:

let value: typeof unicodeMatcher = '𝌆';  
// Error, NodeJS does not support advanced RegExp, upgrade NodeJS to version X.Y.Z, or disable advanced RegExp checking.

I think this is a reasonable way to go about it.
Teams of programmers usually have the same version of NodeJS, or are easily able to upgrade, since all their codebase is working for someone with a newer version.
Solo programmers can adapt easily on the fly.
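The three outcomes described above (check, warn, or ask for an engine upgrade) can be sketched as a small decision function. This is an assumption-heavy illustration: `planLiteralCheck`, the `advancedEnabled` option and the rule "a `u` flag means advanced" are all hypothetical. The only real API used is the `RegExp` constructor, which throws a `SyntaxError` for patterns or flags the running engine does not support.

```typescript
// Feature detection: constructing the RegExp throws if the engine rejects it.
function supportsRegExp(source: string, flags: string): boolean {
    try {
        new RegExp(source, flags);
        return true;
    } catch (e) {
        return false;
    }
}

type Plan = 'check' | 'warn-unchecked' | 'error-upgrade-engine';

// Hypothetical policy: only the `u` flag is treated as "advanced" here.
function planLiteralCheck(source: string, flags: string, advancedEnabled: boolean): Plan {
    const needsAdvanced = flags.indexOf('u') !== -1;
    if (!needsAdvanced) return 'check';
    if (!advancedEnabled) return 'warn-unchecked';
    return supportsRegExp(source, flags) ? 'check' : 'error-upgrade-engine';
}
```

A fuller version would also have to detect syntax-level features such as lookbehind, which no flag reveals.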

What's the current status of this issue? It's really a pity to see that TypeScript has such huge potential and dozens of awesome proposals, but they don't get much attention from the developers…

AFAIK the original proposal was good apart from the Emit overview which is a no-go and not really needed, so it shouldn't be blocking the proposal.

The issue it's trying to address could be solved by the introduction of regex literals (which shouldn't be hard, as they're effectively equivalent to string and number literals) and a type operator patternof (similar to typeof and keyof), which would take a regex literal type and return a _validated string_ type. This is how it could be used:

type letterExpression = /[a-zA-Z]/;
let exp: letterExpression;
exp = /[a-zA-Z]/; // works
exp = /[A-Za-z]/; // error, the expressions do not match

type letter = patternof letterExpression;
type letter = patternof /[a-zA-Z]/; // this is equivalent

let a: letter;
a = 'f'; // works
a = '0'; // error
const email = /some-long-email-regex/;
type email = patternof typeof email;

declare let str: string;
if (str.match(email)) {
  str // typeof str === email
} else {
  str // typeof str === string
}

@m93a I didn't think about such a solution with an additional type operator when I was working on the initial proposal.

I like this approach of removing emit impact caused by types, even though this seems to be more verbose.

This led me to an idea for extending this proposal so that it both avoids adding a new keyword (as you suggest; IMO we already have a pretty large number of them) and has no emit impact from the type system (as in my proposal).

It'll take 4 steps:

  1. add regexp-validated string literal type:
    type Email = /some-long-email-regex/;
  2. Let's change RegExp interface in core lib to generic:
    interface RegExp<T extends string = string> { test(stringToTest: string): stringToTest is T; }
  3. Change type infer for regex literals in actual code:
    const Email = /some-long-email-regex/; // infers to `RegExp</some-long-email-regex/>`
  4. Add type helper using conditional types feature, like InstanceType:
    type ValidatedStringType<T extends RegExp> = T extends RegExp<infer V> ? V : string;

Usage example:

const Email = /some-long-email-regex/;
type Email = ValidatedStringType<typeof Email>;

const email: Email = `[email protected]`; // correct
const email2: Email = `emexample.com`; // compile time error

let userInput: string;
if (Email.test(userInput)) {
    // `userInput` here IS of type `Email`
} else {
    // and here it is just `string`
}
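Steps 2 and 4 can be approximated in current TypeScript without compiler changes by wrapping a regex in an object whose `test` method is a type guard. The sketch below is an assumption-laden stand-in, not the proposal itself: `TypedRegExp`, `Brand` and `typed` are invented names, the email pattern is deliberately simplistic, and the validated type is a nominal brand rather than a real regex type, so string literals are not checked at compile time.

```typescript
// Stand-in for the proposed generic RegExp<T>: `test` narrows its argument.
interface TypedRegExp<Pattern extends string> {
    test(input: string): input is Pattern;
}

// Stand-in for step 4: recover the validated string type from the wrapper.
type ValidatedStringType<R> = R extends TypedRegExp<infer V> ? V : string;

// Type-level label only; no such property exists at runtime.
type Brand<Name extends string> = string & { readonly __pattern: Name };

// Hypothetical factory pairing a label with an ordinary RegExp.
function typed<Name extends string>(name: Name, regex: RegExp): TypedRegExp<Brand<Name>> {
    return { test: (input: string): input is Brand<Name> => regex.test(input) };
}

const Email = typed('Email', /^[^@\s]+@[^@\s]+$/);
type Email = ValidatedStringType<typeof Email>;

function send(to: Email): string { return 'sent to ' + to; }

function trySend(userInput: string): string {
    // Inside the true branch `userInput` is narrowed to `Email`, so no cast is needed.
    return Email.test(userInput) ? send(userInput) : 'invalid email';
}
```

The narrowing in `trySend` mirrors the `if (Email.test(userInput))` usage example; what a library-level sketch cannot provide is the compile-time check of literals.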

@Igmat Cool. Your proposal feels more natural for TypeScript and requires fewer changes to the compiler, which is probably a good thing. The only advantage of my proposal was that regex literals would feel the same as string and number literals; without that, the following could be confusing for some:

let a: 'foo' = 'foo'; // works
let b: 42 = 42; // works
let c: /x/ = /x/; // error

But I think that the simplicity of your proposal outweighs that one disadvantage.

Edit: I don't really like the length of ValidatedStringType<R>. If we decided to call validated strings patterns, we could use PatternOf<R> after all. I'm not saying that your type takes longer to type; most people would just type the first three letters and hit tab. It just has a larger code spaghettification impact.

@Igmat Your solution is excellent from the development point of view, but as far as readability goes, it would be much better to have the possibility that @m93a proposed. I think it could be represented internally in much the same way, but it should be presented to the user as simply as possible.

@Akxe I don't think that the devs would fancy adding another keyword that only has one very specific use case.

@RyanCavanaugh Could you please tell us your opinion on this? (Specifically the original proposal and the four last comments (excluding this one).) Thank you! :+1:

how about having generic argument for string, that defaults to .*?

let a: 'foo' = 'foo'; // works
let b: 42 = 42; // works
let c: /x/ = /x/; // works
let d: string<x.> = 'xa'; // works

string literal 'foo' can be considered a sugar for string<foo>

I don't really like the length of ValidatedStringType<R>. If we decided to call validated strings _patterns_, we could use PatternOf<R> after all.

@m93a, IMO, in this case it would be better to call them PatternType<R> to be consistent with already existing InstanceType and ReturnType helpers.

@amir-arad, interesting. How will interface RegExp look like in this case?

@RyanCavanaugh I could rewrite the original proposal using the newly found approach if it would help. Should I?

@amir-arad Your proposed syntax is in conflict with the rest of TypeScript. Now you can only pass types as a generic argument, not an arbitrary expression. Your proposed syntax would be extremely confusing.

Think of generic types like they're functions that take a type and return a type. The two following pieces of code are very close in both meaning and syntax:

function foo(str: string) {
  return str === 'bar' ? true : false
}

type foo<T extends string> = T extends 'bar' ? true : false;

Your new syntax is like proposing that regex in JavaScript should be written let all = String(.*) which would be an ugly abuse of the function call syntax. Therefore I don't think your proposal makes much sense.

@m93a my suggestion was for type annotations, not javascript.

@Igmat from the top of my head, how about:

interface RegExp {
    test(stringToTest: string): stringToTest is string<this>;
}

@amir-arad, sorry, I can't add more valuable details to your suggestion, but at first glance it looks like very significant change to whole TS compiler, because string is very basic primitive.

Even though I don't see any obvious problems, I think such proposal should be much more detailed and cover a lot of existing scenarios plus proper justification of its purpose.
Your proposal adds one type and changes one primitive type, while mine only adds one type.

Unfortunately, I'm not ready to dedicate a lot of time to creating a proposal for such a feature (also, you may have noticed that not every proposal in TS has been implemented without significant delay), but if you work on this, I'll be glad to provide you with my feedback if needed.

If these regexp-types were real regular expressions (not Perl-like regular expressions that are not regular) we could translate them to deterministic FSM and use cartesian product construction on those to get all the conjunctions and disjunctions. Regular expressions are closed under boolean operations.
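The cartesian product construction mentioned above can be sketched in a few lines. This is a toy illustration under stated assumptions: the `Dfa` representation, the two-letter alphabet and the example automata are all made up, both automata are complete over that alphabet, and inputs contain only alphabet symbols.

```typescript
// A complete DFA; states are numbered 0..n-1.
interface Dfa {
    start: number;
    accepting: boolean[];                 // accepting[state]
    step: { [symbol: string]: number }[]; // step[state][symbol] -> next state
}

const ALPHABET = ['a', 'b'];

function accepts(dfa: Dfa, input: string): boolean {
    let state = dfa.start;
    for (let i = 0; i < input.length; i++) {
        state = dfa.step[state][input.charAt(i)];
    }
    return dfa.accepting[state];
}

// Product construction: the pair (p, q) is encoded as p * width + q.
// `combine` selects the boolean operation (AND for intersection, OR for union).
function product(left: Dfa, right: Dfa, combine: (l: boolean, r: boolean) => boolean): Dfa {
    const width = right.accepting.length;
    const accepting: boolean[] = [];
    const step: { [symbol: string]: number }[] = [];
    for (let p = 0; p < left.accepting.length; p++) {
        for (let q = 0; q < width; q++) {
            const row: { [symbol: string]: number } = {};
            for (const symbol of ALPHABET) {
                row[symbol] = left.step[p][symbol] * width + right.step[q][symbol];
            }
            accepting.push(combine(left.accepting[p], right.accepting[q]));
            step.push(row);
        }
    }
    return { start: left.start * width + right.start, accepting: accepting, step: step };
}

// Example automata: an even number of 'a's, and strings ending in 'b'.
const evenAs: Dfa = { start: 0, accepting: [true, false], step: [{ a: 1, b: 0 }, { a: 0, b: 1 }] };
const endsWithB: Dfa = { start: 0, accepting: [false, true], step: [{ a: 0, b: 1 }, { a: 0, b: 1 }] };

const both = product(evenAs, endsWithB, (l, r) => l && r);
const either = product(evenAs, endsWithB, (l, r) => l || r);
```

The product has as many states as the two inputs multiplied together, which is why repeated intersections can grow quickly even though each step is simple.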

Also if string literal types were not atomic, but represented as compile-time character lists, it would allow to implement all the operators in libraries. That would only worsen performance a bit.

Edit: Fix a mistake.

Dropping in to note that Mithril could really use these, and being type-safe in the general case is nearly impossible without it. This is the case both with hyperscript and JSX syntax. (We support both.)

  • Our lifecycle hooks, oninit, oncreate, onbeforeupdate, onupdate, onbeforeremove, and onremove, have their own special prototypes.
  • Event handlers on DOM vnodes are literally anything else that starts with on, and we support both event listener functions and event listener objects (with handleEvent methods), aligning with addEventListener and removeEventListener.
  • We support keys and refs as appropriate.
  • Everything else is treated as an attribute or property, depending on their existence on the backing DOM node itself.

So with a regex-validated string type + type negation, we could do the following for DOM vnodes:

interface BaseAttributes {
    // Lifecycle attributes
    oninit(vnode: Vnode<this, Vnode<Attributes, []>>): void;
    oncreate(vnode: Vnode<this, Vnode<Attributes, []>>): void;
    onbeforeupdate(
        vnode: Vnode<this, Vnode<Attributes, []>>,
        old: Vnode<this, Vnode<Attributes, []>>
    ): void;
    onupdate(vnode: Vnode<this, Vnode<Attributes, []>>): void;
    onbeforeremove(vnode: Vnode<this, Vnode<Attributes, []>>): void | Promise<void>;
    onremove(vnode: Vnode<this, Vnode<Attributes, []>>): void;

    // Control attributes
    key: PropertyKey;
}

interface DOMAttributes extends BaseAttributes {
    // Event handlers
    [key: /^on/ & not keyof BaseAttributes]: (
        ((this: Element, ev: Event) => void | boolean) |
        {handleEvent(ev: Event): void}
    );

    // Other attributes
    [key: keyof HTMLElement & not keyof BaseAttributes & not /^on/]: any;
    [key: string & not keyof BaseAttributes & not /^on/]: string;
}

interface ComponentAttributes extends BaseAttributes {
    // Nothing else interesting unless components define them.
}

(It'd also be nice to be able to extract groups from such regexes, but I'm not going to hold my breath on that.)
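For reference, the four categories listed above map onto a simple runtime classification. The sketch below is not Mithril's code: `classifyAttribute` and `AttrKind` are invented names, and only the `key` control attribute from the interface above is covered.

```typescript
type AttrKind = 'lifecycle' | 'control' | 'event' | 'other';

// The lifecycle hook names listed in the comment above.
const LIFECYCLE = [
    'oninit', 'oncreate', 'onbeforeupdate',
    'onupdate', 'onbeforeremove', 'onremove'
];

// Mirrors the order of the index signatures: lifecycle hooks are carved out
// before the generic /^on/ rule, which is what `not keyof BaseAttributes` expresses.
function classifyAttribute(key: string): AttrKind {
    if (LIFECYCLE.indexOf(key) !== -1) return 'lifecycle';
    if (key === 'key') return 'control';
    if (/^on/.test(key)) return 'event';
    return 'other';
}
```

The type-level version needs both a regex-validated key type and type negation precisely because the `event` rule is defined by exclusion.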

Edit: Clarify a few critical details in the proposal.
Edit 2: Correct the technical bit to actually be mathematically accurate.
Edit 3: Add support for generic starring of single-character unions

Here's a concrete proposal to attempt to solve this much more feasibly: template literal types.

Also, I feel full regexps are probably not a good idea, because it should be reasonably easy to merge with other types. Maybe this might be better: template literal types.

  • `value` - This is literally equivalent to "value"
  • `value${"a" | "b"}` - This is literally equivalent to "valuea" | "valueb"
  • `value${string}` - This is functionally equivalent to the regexp /^value/, but "value", "valuea", and "valueakjsfbf aflksfief fskdf d" are all assignable to it.
  • `foo${string}bar` - This is functionally equivalent to the regexp /^foo.*bar$/, but is a little easier to normalize.
  • There can, of course, be multiple interpolations. `foo${string}bar${string}baz` is a valid template literal type.
  • Interpolations must extend string, and it must not be recursive. (The second condition is for technical reasons.)
  • A template literal type A is assignable to a template literal type B if and only if the set of strings assignable to A is a subset of the set of strings assignable to B.

In addition to the above, a special starof T type would exist, where T must consist of only single-character string literal types. string would exist as a type alias of starof (...), where ... is the union of all single UCS-2 character string literals from U+0000 to U+FFFF, including lone surrogates. This lets you define the full grammar for ES base-10 numeric literals, for instance:

type DecimalDigit = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9";
type Decimal = `${DecimalDigit}${starof DecimalDigit}`

type Numeric = `${(
    | Decimal
    | `${Decimal}.${starof DecimalDigit}`
    | `.${Decimal}`
)}${"" | (
    | `E${Decimal}` | `E+${Decimal}` | `E-${Decimal}`
    | `e${Decimal}` | `e+${Decimal}` | `e-${Decimal}`
)}`

And likewise, certain built-in methods can be adjusted to return such types:

  • Number.prototype.toString(base?) - This can return the above Numeric type or some variant of it for statically-known bases.
  • +x, x | 0, parseInt(x), and similar - When x is known to be a Numeric as defined above, the resulting type can be inferred appropriately as a literal number type.

And finally, you can extract matched groups like so: Key extends `on${infer EventName}` ? EventTypeMap[TagName][EventName] : never. Template extraction assumes it's always working with full names, so you have to explicitly use ${string} interpolations to search for arbitrary inclusion. This is non-greedy, so "foo.bar.baz" extends `${infer T}.${infer U}` ? [T, U] : never returns ["foo", "bar.baz"], not ["foo.bar", "baz"].
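TypeScript 4.1 and later ship a feature with the same name, template literal types, that supports this kind of extraction; `starof` is not part of it, and no claim is made here that the shipped feature matches this proposal in every detail. The sketch below assumes such a compiler version, and the alias names are made up. The annotated constants only type-check if the inferred types are exactly the expected ones.

```typescript
// Split on the first dot: the first placeholder matches as little as possible.
type SplitFirst<S extends string> = S extends `${infer T}.${infer U}` ? [T, U] : never;

// Compiles only if the result is ['foo', 'bar.baz'] rather than ['foo.bar', 'baz'].
const parts: SplitFirst<'foo.bar.baz'> = ['foo', 'bar.baz'];

// Strip the `on` prefix from a handler key.
type HandlerEvent<K extends string> = K extends `on${infer E}` ? E : never;

// Compiles only if HandlerEvent<'onclick'> is the literal type 'click'.
const clickEvent: HandlerEvent<'onclick'> = 'click';
```

Keys without the prefix resolve to `never`, so `HandlerEvent<'class'>` cannot be assigned any value.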


From a technical standpoint, this is a lot more feasible to implement than raw regexps. JS regexps aren't even regular - they become context-sensitive with back-references, and they involve a lot of complexity in the form of As long as you block recursion with these, template literal types generate a single regular language each, one that aligns very closely with the underlying theory (but supports only a subset of it).

  • Empty language: ""
  • Union: "a" | "b"
  • Concatenation: `${a}${b}`
  • Kleene star (partial): starof T (T can only contain single characters and unions.)

This may make string subtyping checking a subset of the subgraph isomorphism problem worst case scenario, but there are a few big redeeming factors here:

  1. The common case by far is unions of small finite strings, something you can model with trees. This is relatively obvious to work with. (I don't recommend trying to join them as ropes, since that will complicate the above matching algorithm, but it's perfectly fine to normalize single-character unions and similar into a single split + join.)

  2. You can model the entire unified type as a directed graph, where:

    1. Starred unions of such characters are subgraphs where the parent node has edges both to each character and each child nodes of the subgraph, and each character has edges to both all other characters and all child nodes of the subgraph.
    2. The rest of the graph holds a directed tree-like structure representing all other possibilities.

    According to this Math.SE chat I was briefly in (starting approximately here), I found that this resulting graph would have both a bounded genus (i.e. with a finite number of jumps over other edges*) and, absent any starof types, a bounded degree. This means type equality reduces that to a polynomial-time problem and assuming you normalize unions, it's also not super slow as it's only somewhat faster than tree equality. I strongly suspect the general case for this entire proposal (a subset of the subgraph isomorphism problem) is also polynomial-time with reasonable coefficients. (The Wikipedia article linked above has some examples in the "Algorithms" and references sections where special casing might apply.)


  3. None of these keys are likely to be large, so most of the actual runtime cost here is amortized in practice by other things. As long as it's fast for small keys, it's good enough.


  4. All subgraphs that would be compared share at least one node: the root node. (This represents the start of the string.) So this would dramatically reduce the problem space just on its own and guarantee a polynomial time check.


And of course, intersection between such types is non-trivial, but I feel similar redeeming factors exist simply due to the above restrictions. In particular, the last restriction makes it obviously polynomial-time to do.

* Mathematically, genus is defined a bit counterintuitively for us programmers (the minimum number of holes you need to poke in a surface to draw the graph without any jumps), but a bounded genus (limited number of holes) implies a limited number of jumps.

Using this concrete proposal, here's how my example from this comment translates:

// This would work as a *full* type implementation mod implementations of `HTMLTypeMap` +
// `HTMLEventMap`
type BaseAttributes = {
    // Lifecycle attributes
    oninit(vnode: Vnode<this, Vnode<Attributes, []>>): void;
    oncreate(vnode: Vnode<this, Vnode<Attributes, []>>): void;
    onbeforeupdate(
        vnode: Vnode<this, Vnode<Attributes, []>>,
        old: Vnode<this, Vnode<Attributes, []>>
    ): void;
    onupdate(vnode: Vnode<this, Vnode<Attributes, []>>): void;
    onbeforeremove(vnode: Vnode<this, Vnode<Attributes, []>>): void | Promise<void>;
    onremove(vnode: Vnode<this, Vnode<Attributes, []>>): void;

    // Control attributes
    key: PropertyKey;
}

interface HTMLTypeMap {
    // ...
}

interface HTMLEventMap {
    // ...
}

// Just asserting a simple constraint
type _Assert<T extends true> = never;
type _Test0 = _Assert<
    keyof HTMLTypeMap[keyof HTMLTypeMap] extends `on${string}` ? false : true
>;

type EventHandler<Event> =
    ((this: Element, ev: Event) => void | boolean) |
    {handleEvent(ev: Event): void};

type Optional<T> = {[P in keyof T]?: T[P] | null | undefined | void}

type DOMAttributes<T extends keyof HTMLAttributeMap> = Optional<(
    & BaseAttributes
    & {[K in `on${keyof HTMLEventMap[T]}` & not keyof BaseAttributes]: EventHandler<(
        K extends `on${infer E}` ? HTMLEventMap[E] : never
    )>}
    & Record<
        keyof `on${string & not keyof HTMLEventMap}` & not keyof BaseAttributes,
        EventHandler<Event>
    >
    & Pick<HTMLTypeMap[T], (
        & keyof HTMLTypeMap[T]
        & not `on${string}`
        & not keyof BaseAttributes
    )>
    & Record<(
        & string
        & not keyof HTMLTypeMap[T]
        & not keyof BaseAttributes
        & not `on${string}`
    ), string | boolean>
)>;

Edit: This would also enable properly typing 90% of Lodash's _.get method and related methods using its property shorthand, like its _.property(path) method and its _.map(coll, path) shorthand. There's probably several others I'm not thinking of, too, but that's probably the biggest one I can think of. (I'm going to leave the implementation of that type as an exercise to the reader, but I can assure you it's possible with a combination of that and the usual trick of conditional types with an immediately-indexed record, something like {0: ..., 1: ...}[Path extends "" ? 0 : 1], to process the static path string.)

My recommendation is that we focus our efforts on implementing type providers, which could be used to implement regex types.

Why type providers instead of directly implementing regex types? Because

  1. It’s a more generic solution that adds many new possibilities to TypeScript making it easier to get support from a wider group of developers beyond those who see the value in regex string types.
  2. The typescript repo owners seem to be open to this idea, and are waiting for the right proposal. See #3136

F# has an open source regex type provider.

Some info on type providers: https://link.medium.com/0wS7vgaDQV

One could imagine that once type providers are implemented and the regex type provider is implemented as an open source library, one would use it like so:

type PhoneNumber = RegexProvider</^\d{3}-\d{3}-\d{4}$/>
const acceptableNumber: PhoneNumber = "123-456-7890"; //  no compiler error
const unacceptableNumber: PhoneNumber = "hello world"; // compiler error

@AlexLeung I'm not convinced that's the correct way to go, at least not for this request.

  • TypeScript is structurally typed, not nominally typed, and for string literal manipulation, I want to retain that structural spirit. Type providers like that would create a nominal string subtype where RegexProvider</^foo$/> would not be treated as equivalent to "foo", but a nominal subtype of it. Furthermore, RegexProvider</^foo$/> and RegexProvider</^fo{2}$/> would be treated as two distinct types, and that's something I'm not a fan of. My proposal instead directly integrates with strings at their core, directly informed by the theory of formal language recognition to ensure it fits in naturally.
  • With mine, you can not only concatenate strings, but extract parts of strings via Key extends `on${infer K}` ? K : never or even Key extends `${Prefix}${infer Rest}` ? Rest : never. Type providers do not offer this functionality, and there's no clear way how it should if such functionality were to be added.
  • Mine is considerably simpler at the conceptual level: I'm just suggesting we add string concatenation types and, for the RHS of conditional types, the ability to extract its inverse. I also propose that it integrate with string itself to take the place of a regexp /.*/. It requires no API changes, and aside from the two theoretically complex parts that are mostly decoupled from the rest of the code base, calculating whether a template literal type is assignable to another and extracting a slice from a string, is similar, if not simpler, to implement.

BTW, my proposal could still type that PhoneNumber example, too. It's a bit more verbose, but I'm trying to model data that's already in TS land, not data that exists elsewhere (what F#'s type providers are most useful for). (It's worth noting this would technically expand to the full list of possible phone numbers here.)

type D = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9";
type PhoneNumber = `${D}${D}${D}-${D}${D}${D}-${D}${D}${D}${D}`;

RegexProvider</^foo$/> and RegexProvider</^fo{2}$/> would be treated as two distinct types

Type providers could require the implementation of some equals or compare method, so that the type provider author of a regex type provider could define that both cases above are equivalent types. The type provider author could implement structural or nominal typing as they please.

Perhaps it would be possible to implement your string literal type as a type provider as well. I don't think the syntax could be the same, but you could get close with a type provider which takes in a variable number of arguments.

type D = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9";
type PhoneNumber = StringTemplateMatcherProvider<D, D, D, "-", D, D, D, "-", D, D, D, D>;

@AlexLeung But is the type "123-456-7890" assignable to your type? (If so, that'll complicate implementation and slow down the checker a lot.)

Semi-related to the discussion at hand, what if the type isn't of a fixed length (like a phone number)? One situation where I would've liked to use this recently is for storing a room name, of the format thread_{number}.

The regex to match such a value is thread_[1-9]\d*. With what is being proposed, it doesn't seem feasible (or even possible) to match such a format. The numerical part of the value could be _any_ length greater than zero in this situation.

@jhpratt I revised my proposal to accommodate that, in the form of starof ("0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"), which is equivalent to /^\d*$/, since it only required a small change to it. It optimizes the same way string optimizes as /^[\u0000-\uFFFF]*$/, so I decided to go ahead and generalize that.

I don't want to extend starof further than that, like accepting arbitrary non-recursive unions, due to computational complexity concerns: verifying if two arbitrary regular expressions* are equivalent can be done in polynomial space or polynomial time (convert both to minimal DFA and compare, the usual way), but both ways are very slow in practice and AFAICT you can't have it both ways. Add support for squaring (like a{2}), and it's basically infeasible (exponential complexity). This is only for equivalence, and checking if a regexp matches a subset of the strings another regexp matches, required for checking assignability, is obviously going to be even more complicated.

* Regular expressions in the math sense: I'm only including single characters, (), (ab), (a|b), and (a*), where a and b are each (potentially different) members of this list.
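To make the subset check mentioned above more concrete, here is a minimal sketch of language inclusion between two automata that are already deterministic. It walks the product automaton and looks for a reachable state pair where the first automaton accepts and the second does not. The names Dfa, stepDfa and isLanguageSubset are hypothetical, and the step of converting a regular expression into a DFA (which is where the cost discussed above comes from) is deliberately left out.

```typescript
// A deterministic finite automaton over single-character symbols.
// Missing transitions are treated as moves to an implicit rejecting dead state.
interface Dfa {
  start: string;
  accepting: Set<string>;
  transitions: Record<string, Record<string, string>>;
}

// Hypothetical name for the implicit dead state; assumed not to be a real state name.
const DEAD = "\u0000dead";

function stepDfa(dfa: Dfa, state: string, symbol: string): string {
  if (state === DEAD) return DEAD;
  const row = dfa.transitions[state];
  const next = row ? row[symbol] : undefined;
  return next === undefined ? DEAD : next;
}

// Returns true when every string accepted by `sub` is also accepted by `sup`.
// Breadth-first search over pairs of states (the product automaton).
function isLanguageSubset(sub: Dfa, sup: Dfa, alphabet: string[]): boolean {
  const seen = new Set<string>();
  const queue: Array<[string, string]> = [[sub.start, sup.start]];
  while (queue.length > 0) {
    const [a, b] = queue.shift() as [string, string];
    const key = a + "\u0001" + b;
    if (seen.has(key)) continue;
    seen.add(key);
    // Counterexample found: `sub` accepts here but `sup` does not.
    if (sub.accepting.has(a) && !sup.accepting.has(b)) return false;
    // Once `sub` is dead it can never accept, so there is nothing left to check.
    if (a === DEAD) continue;
    for (const symbol of alphabet) {
      queue.push([stepDfa(sub, a, symbol), stepDfa(sup, b, symbol)]);
    }
  }
  return true;
}

// Toy alphabet and automata: one or more digits versus one or more of any symbol.
const sketchAlphabet = ["0", "1", "a"];
const digitsPlus: Dfa = {
  start: "s",
  accepting: new Set(["d"]),
  transitions: { s: { "0": "d", "1": "d" }, d: { "0": "d", "1": "d" } },
};
const anyPlus: Dfa = {
  start: "s",
  accepting: new Set(["t"]),
  transitions: {
    s: { "0": "t", "1": "t", a: "t" },
    t: { "0": "t", "1": "t", a: "t" },
  },
};

const digitsInAny = isLanguageSubset(digitsPlus, anyPlus, sketchAlphabet); // true
const anyInDigits = isLanguageSubset(anyPlus, digitsPlus, sketchAlphabet); // false
```

The search itself is linear in the number of reachable state pairs; the sketch says nothing about how large the automata become for a given pattern.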

This is probably a dumb question, but... why isn't it fairly easy, if adequately limited, to support a validation function (either lambda or named)?

For example, suppose we use ":" to indicate that the next element is a validator (substitute whatever you want for ":" if you have an opinion on this):

type email = string : (s) => { return !!s.match(...) }
type phone_number = string : (n) => { return !!String(n).match(...) }
type excel_worksheet_name = string : (s) => { return (s != "History") && s.length <= 31 && ... }

As an initial start, typescript could only accept validation functions that:

  • have a single argument, which is required/assumed to be of the "base" type
  • only reference variables that are defined in the validator function
  • return a value (which will be coerced to bool in the validation process)

The above conditions seem easy for the typescript compiler to verify, and once those conditions are assumed, much of the implementation complexity would go away.

In addition, if necessary to restrict initial scope to a manageable size:

  • validation functions can only be added to a subset of native types (string, number)

I don't think this last restriction would be all that necessary, but if there is any question as to whether it would be, I also don't think it would be worth spending much time debating it, because a solution with the above limitation would still solve a huge range of real-world use cases. In addition, I see little downside of the above limitations because relaxing them later would be a simple and natural extension that would require no change in the basic syntax and would merely expand the breadth of language support by the compiler.

@mewalig That would mean that something that looks like a runtime function would actually not execute at runtime, but at compile time (and every time you want to check assignability). These functions couldn't access anything from the runtime (variables, functions) which would feel pretty awkward.

Plus you generally don't want the compiler to run anything you throw at it, especially badly optimized functions or outright malicious while(true){}. If you want meta-programming, you have to design it smartly. Just randomly allowing runtime code to run at compile time would be the "PHP way" to do it.

Finally, the syntax you propose switches the usual pattern

let runtime: types = runtime;

(ie. types after colon) inside out, effectively being

type types = types: runtime;

which is horrible. So thank you for your proposal, but it's definitely a bad idea.

These functions couldn't access anything from the runtime (variables, functions) which would feel pretty awkward.

Of course they could, if the compiler has an ECMAScript runtime available to it (tsc does, BTW!). You obviously have an ambiguity issue with the compile-time semantics of e.g. fetch() vs. runtime semantics, but that's what iteration is about.

Just randomly allowing runtime code to run at compile time would be the "PHP way" to do it.

It's pretty similar to C++ constexpr functions, which are fine. The solution there is to say that constexpr can only use constexpr, but everything can use constexpr. Then you could have constexpr-equivalent versions of the filesystem for the compile-time filesystem which could be quite powerful.

The syntax also looks roughly fine to me: the LHS is a type, of course the RHS is a type of some sort too. My issue is more about how you would compose types past the "base" type, but that's all solvable too.

So thank you for your proposal, but it's definitely a bad idea.

It may end up being a bad idea, but for now I'm just seeing a very underspecified idea that will likely require straying too far from the goals of typescript. It doesn't mean that there might not be a good idea that is similar to it!

The discussion about this feature seems to have stopped for now (the PR is closed and, according to the design notes, the team _don't want to commit to this until we have nominal types and generalized index signatures, and we should know what those look like._).

Anyway, I want to propose another hypothetical extension to current PR that would support regex pattern extraction (@isiahmeadows presented his own proposal, but to be honest I cannot wrap my head around it now...).

I really like current PR and would base my proposal on that. I would like to propose the syntax based on generic type arguments inference that we have for functions (and conditional types with infer keyword). Simply because people already have some intuition that in generic function you can "extract" types from passed literal objects.

For example we have this type.

type Prop1 = /(\w)\.(\w)/

and we can use this type to test literal types

const goodLiteral = "foo.bar";
const badLiteral = "foo";
const regexTestGood: Prop1 = goodLiteral; //no error
const regexTestBad: Prop1 = badLiteral; //compiler error

function funProp1(prop: Prop1) { } 

funProp1(goodLiteral); //no error
funProp1(badLiteral); //error

However, when we use Regex type in function parameter we can use angle brackets syntax to mean that we want to infer matched strings. For example

type Prop1 = /(\w)\.(\w)/
const Prop1 = /(\w)\.(\w)/

const goodLiteral = "foo.bar";
const badLiteral = "foo";

function funProp1<M1 extends string, M2 extends string>(prop: Prop1<M1, M2>) : [M1, M2] 
{
    const m = prop.match(Prop1);
    return [m[1], m[2]];
} 

const res1 = funProp1(goodLiteral); //no error. Function signature inferred to be funProp1<"foo", "bar">(prop: Prop1<"foo", "bar">) : ["foo", "bar"]
const res2 = funProp1(badLiteral); //compiler error

notice that inferred type of res1 is ["foo", "bar"]

Is it any useful?

  1. Ember.js/lodash get function

You could implement type-safe "string path" getter so this code would work:

const deep = get(objNested, "nested.very.deep")

But probably it would require to solve this if we want to avoid many overloads for fixed maximum number of possible get's "depth".

  2. Use extracted parameters in mapped types.

For example, if we were able to do something like https://github.com/Microsoft/TypeScript/issues/12754, we could define the reverse function (strip some prefix/suffix from all properties of a given type). This one would probably need to introduce some more generalized form of mapped type syntax to choose a new key for a property (for example syntax like { [ StripAsyncSuffix<P> for P in K ] : T[P] }, someone already proposed something like that)

There would probably be other use cases too. But I guess most would fit into these two categories (1. figuring out the proper type based on a provided string literal, 2. transforming property names of an input type into new property names of a newly defined type)
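For the first category, the type-safe "string path" getter can be sketched with template literal types and infer, which TypeScript later shipped in version 4.1. This is only an illustration under assumptions: PathValue and getPath are hypothetical names, only dot-separated paths are handled (no array indices such as a[0]), and this is not how Ember.js or Lodash type their own get.

```typescript
// Resolve the type found at a dot-separated path, one segment at a time.
type PathValue<T, P extends string> =
  P extends `${infer Head}.${infer Rest}`
    ? Head extends keyof T ? PathValue<T[Head], Rest> : undefined
    : P extends keyof T ? T[P] : undefined;

// Hypothetical getter: the runtime walk mirrors the type-level recursion above.
function getPath<T, P extends string>(obj: T, path: P): PathValue<T, P> {
  let current: unknown = obj;
  for (const key of path.split(".")) {
    if (current === null || typeof current !== "object") {
      // The path goes through something that is not an object: give up.
      current = undefined;
      break;
    }
    current = (current as Record<string, unknown>)[key];
  }
  return current as PathValue<T, P>;
}

const objNested = { nested: { very: { deep: 42 } } };

// The literal path is inferred as P, so the result type is number.
const deep = getPath(objNested, "nested.very.deep");

// An unknown segment resolves to the type undefined instead of failing silently as any.
const missing = getPath(objNested, "nested.nope.deep");
```

Because the recursion is driven by the path string itself, no overloads per "depth" are needed in this sketch.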

This is something we could do with.

I am currently building custom lint rules in order to be able to validate urls - though, this would be much easier if we could define the optional params - which requires a regex in order to be able to validate our ids

In general, this would provide us with much more power to assert the validity of props across our code base

Is there any movement on the type providers, template string literals, or other suggestions? This would be such a great tool.

My workaround for this currently is to use a marker interface like this.

interface TickerSymbol extends String {}

The only problem is that when I want to use it as an index key, I have to cast it to string.

interface TickerSymbol extends String {}
var symbol: TickerSymbol = 'MSFT';
// declare var tickers: {[symbol: TickerSymbol]: any}; // Error: index key must be string or number
declare var tickers: {[symbol: string]: any};
// tickers[symbol]; // Type 'TickerSymbol' cannot be used as an index type
tickers[symbol as string]; // OK

However, JavaScript seems to be fine with index type of String (with capital S).

var obj = { one: 1 }
var key = new String('one');
obj[key]; // TypeScript Error: Type 'String' cannot be used as an index type.
// but JS gives expected output:
// 1
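One possible alternative to the marker interface, sketched here under assumptions, is to intersect string with a phantom property. The value stays a primitive string, so it can be used as an index key without a cast, while a plain string is still rejected where the branded type is expected. TickerSymbolBrand, asTickerSymbol and the __brand property are hypothetical names; the property exists only in the type system.

```typescript
// Hypothetical brand: `__brand` is never present at runtime.
type TickerSymbolBrand = string & { readonly __brand: "TickerSymbol" };

// The only place where a plain string is turned into a branded one.
function asTickerSymbol(raw: string): TickerSymbolBrand {
  return raw as TickerSymbolBrand;
}

function describeTicker(ticker: TickerSymbolBrand): string {
  return `ticker ${ticker}`;
}

const tickerPrices: { [symbol: string]: number } = { MSFT: 100 };
const msft = asTickerSymbol("MSFT");

// The branded value is still assignable to string, so indexing needs no cast.
const msftPrice = tickerPrices[msft];

// describeTicker("MSFT"); // would not compile: a plain string lacks the brand
const description = describeTicker(msft);
```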

@DanielRosenwasser I have a proposal here, and a separate proposal was created in late 2016, so could the labels for this be updated?

We've reviewed the above proposals and have some questions and comments.

Problematic Aspects of Proposals so far

Types Creating Emit

We're committed to keeping the type system fully-erased, so proposals that require type aliases to produce emitted code are out of scope. I'll highlight some examples in this thread where this has happened perhaps in a way that isn't obvious:

https://github.com/microsoft/TypeScript/issues/6579#issuecomment-220180091 - creates a function and a type at the same time

type Integer(n:number) => String(n).match(/^[0-9]+$/)

https://github.com/microsoft/TypeScript/issues/6579#issuecomment-261519733 - also does this

type CssColor = /^#([0-9a-f]{3}|[0-9a-f]{6})$/i;
// ... later
setFontColorFromString(color: string) {
    fontColor = color;// compile time error
    if (CssColor.test(color)) {
    //  ^^^^^^^^ no value declaration of 'CssColor' !
        fontColor = color;// correct
    }
}

I'll reiterate: this is a non-starter. Types in TypeScript are composable and emitting JS from types is not possible in this world. The longest proposal to date has extensive emit-from-types; this isn't workable. For example, this would require extensive type-directed emit:

type Matcher<T extends number | boolean> = T extends number ? /\d+/ : /true|false/;
function fn<T extends number | boolean>(arg: T, s: Matcher<T>) {
  type R = Matcher<T>
  if (R.test(arg)) {
      // ...
  }
}
fn(10, "10");
fn(false, "false");

Bans on Intersections

Actually common types and regex-validated types are really different, so we need rules for how to correctly handle their unions and intersections.

type Regex_1 = / ... /;
type Regex_2 = / ... /;
type NonRegex = { ... };
type test_4 = Regex_1 & NonRegex;// compile time error

TypeScript can't error on instantiations of intersections, so this wouldn't be part of any final design.

Ergonomics

Overall our most salient takeaway is that we want something where you're not writing the same RegExp twice (once in value space, once in type space).

Given the above concerns about type emit, the most realistic solution is that you would write the expression in value space:

// Probably put this in lib.d.ts
type PatternOf<T extends RegExp> = T extends { test(s: unknown): s is infer P } ? P : never;

const ZipCode = /^\d\d\d\d\d$/;
function map(z: PatternOf<typeof ZipCode>) {
}

map('98052'); // OK
map('Redmond'); // Error

You could still write the RegExp in type space, of course, but there'd be no runtime validation available and any nonliteral use would require a re-testing or assertion:

function map(z: /^\d\d\d\d\d$/) { }
map('98052'); // OK
map('Redmond'); // Error

function fn(s: string) {
    map(s); // Error
    // typo
    if (/^\d\d\d\d$/.test(s)) {
        // Error, /^\d\d\d\d$/ is not assignable to /^\d\d\d\d\d$/
        map(s);
    }

    if (/^\d\d\d\d\d$/.test(s)) {
        // OK
        map(s);
    }
}

Collection and Clarification of Use Cases

For a new kind of type, we'd ideally like to see several examples where:

  • The problem being solved has no better alternative (including plausible alternatives which aren't yet in the language)
  • The problem occurs with meaningful frequency in real codebases
  • The proposed solution solves that problem well

Compile-Time Validation of Literals

This thread implies a wide variety of use cases; concrete examples have been more rare. Troublingly, many of these examples don't seem to be complete - they use a RegExp that would reject valid inputs.

  • Font color - AFAIK anything that accepts hex colors also accepts e.g. "white" or "skyblue". This also incorrectly rejects rgb(255, 0, 0) syntax.
  • SSN, Zip, etc - OK, but why are there literal SSNs or Zip Codes in your code? Is this actually a need for nominal types? What happens if you have a subclass of strings that can't be accurately described by a RegExp? See "Competing proposals"
  • Integer - incorrectly rejects "3e5"
  • Email - This is usually considered a bad idea. Again though, there are email address string literals in your code?
  • CSS Border specs - I could believe that a standalone library could provide an accurate RegEx to describe the DSL it itself supports
  • Writing tests - this is where hardcoded inputs make some sense, though this is almost a counterpoint because your test code should probably be providing lots of invalid inputs
  • Date formats - how/why? Date has a constructor for this; if the input comes from outside the runtime then you just want a nominal type
  • URI - you could imagine that fetch would specify host to not begin with http(s?):

TODO: Please help us by identifying real library functions that could benefit from RegExp types, and the actual expression you'd use.

One concern is "precisionitis" - what happens when someone helpfully shows up to DefinitelyTyped and adds RegExp types to every function in a library, thus breaking every nonliteral invocation? Worse, the definition file authors will have to agree exactly with the consumers of it what the "right spelling" of a validation RegExp is.
It seems like this quickly puts us on the road to a Tower of Babel situation where every library has their own version of what qualifies as a URL, what qualifies as a host name, what qualifies as an email, etc, and anyone connecting two libraries has to insert type assertions or copy regexes around to satisfy the compiler.

Enforcement of Runtime Checks

There has been some discussion of checks where we want to ensure that a function's arguments have been validated by a prior regex, like fn in the earlier Ergonomics section. This seems straightforward and valuable, if the RegEx that needs testing against is well-known. That's a big "if", however -- in my recollection, I can't remember a single library that provides validation regexes. It may provide validation functions - but this implies that the feature to be provided is nominal or tagged types, not regex types.

Counter-evidence to this assessment is welcomed.

Property Keys / Regex String Indexers

Some libraries treat objects according to the property names. For example, in React we want to apply types to any prop whose name starts with aria-:

interface IntrinsicElements {
    // ....
    [attributeName: /aria-\w+/]: number | string | boolean;
}

This is effectively an orthogonal concept (we could add Regex types without adding Regex property keys, and vice versa).

TODO (me or anyone): Open a separate issue for this.

Competing Proposals

Nominal or Tagged types

Let's say we had nominal/tagged types of some sort:

type ZipCode = make_unique_type string;

You could then write a function

function asZipCode(s: string): ZipCode | undefined {
    return /^\d\d\d\d\d$/.test(s) ? (s as ZipCode) : undefined;
}

At this point, would you really even need RegExp types? Refer to "compile-time" checking section for more thoughts.
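Since make_unique_type is hypothetical syntax, here is a sketch of how the same idea can be approximated in current TypeScript with an intersection "brand" and a type predicate. ZipCodeSketch, isZipCode, shipTo and handleInput are assumed names for illustration; the regular expression is the one from the example above.

```typescript
// Stand-in for `make_unique_type string`: the brand exists only at the type level.
type ZipCodeSketch = string & { readonly __brand: "ZipCode" };

// Type predicate: a successful runtime check narrows string to ZipCodeSketch.
function isZipCode(s: string): s is ZipCodeSketch {
  return /^\d\d\d\d\d$/.test(s);
}

// Same shape as the asZipCode function above, without an explicit assertion.
function asZipCode(s: string): ZipCodeSketch | undefined {
  return isZipCode(s) ? s : undefined;
}

// Only validated strings are accepted here.
function shipTo(zip: ZipCodeSketch): string {
  return `shipping to ${zip}`;
}

function handleInput(input: string): string {
  // shipTo(input); // would not compile: string is not assignable to ZipCodeSketch
  return isZipCode(input) ? shipTo(input) : "invalid zip code";
}
```

The compiler enforces that the validation function ran, but it knows nothing about what the regular expression matches; that is the trade-off against RegExp types discussed here.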

Conversely, let's say we had RegExp types and not nominal types. It becomes pretty tempting to start (ab)using them for non-validation scenarios:

type Password = /(IsPassword)?.*/;
type UnescapedString = /(Unescaped)?.*/;
declare function hash(p: Password): string;

const p: Password = "My security is g00d"; // OK
const e: UnescapedString = "<div>''</div>"; // OK
hash(p); // OK
hash(e); // Error
hash("correct horse battery staple"); // OK

A common thing in the thread is that these regexes would help validate test code, because even though in production scenarios the code would be running against runtime-provided strings rather than hardcoded literals, you'd still want some validation that your test strings were "correct". This would seem to be an argument for nominal/tagged/branded strings instead, though, since you'd be writing the validation function either way, and the benefit of tests is that you know they run exhaustively (thus any errors in test inputs would be flagged early in the development cycle).

Non-Issues

We discussed the following aspects and consider them to not be blockers

Host Capabilities

Newer runtimes support more RegExp syntax than older runtimes. Depending on where the TypeScript compiler runs, certain code might be valid or invalid according to the runtime's capabilities of parsing newer RegExp features. In practice, most of the new RegExp features are fairly esoteric or relate to group matching, which don't seem to align with most of the use cases here.

Performance

RegExes can do an unbounded amount of work and matching against a large string can do an arbitrarily large amount of work. Users can already DOS themselves through other means, and are unlikely to write a maliciously inefficient RegExp.

Subtyping (/\d*/ -> /.*/ ?), Union, Intersection, and Uninhabitability

In theory /\d+/ is a knowable subtype of /.+/. Supposedly algorithms exist to determine if one RegExp matches a pure subset of another one (under certain constraints), but obviously would require parsing the expression. In practice we're 100% OK with RegExpes not forming implicit subtype relationships based on what they match; this is probably even preferable.

Union and Intersection operations would work "out of the box" as long as the assignability relationships were defined correctly.

In TypeScript, when two primitive types "collide" in an intersection, they reduce to never. When two RegExpes are intersected, we'd just keep that as /a/ & /b/ rather than try to produce a new RegExp matching the intersection of the two expressions. There wouldn't be any reduction to never, since we'd need an algorithm to prove that no string could satisfy both sides (this is a parallel problem to the one described earlier re: subtyping).

Next Steps

To summarize, the next steps are:

  • File a separate issue for Regex-named property keys AKA regex string indexers
  • Get concrete and plausible use cases for compile-time validation of string literals

    • Example: Identify functions in DefinitelyTyped or other libraries that would highly benefit from this

  • Understand if nominal/tagged/branded types are a more flexible and broadly-applicable solution for non-literal validation
  • Identify libraries that are providing validation RegExes already

Use case: Hyperscript (https://github.com/hyperhype/hyperscript) like functions
A hyperscript function usually is called like h('div#some-id')
A regex-ish pattern matcher would allow to determine the return type of h which would be HTMLDivElement in the example case.

If the type system would be able to add string literals, then basically any CSS property could be type-safe

declare let width: number;
declare let element: HTMLElement;

element.style.height = `${width}px`;
// ...or
element.style.height = `${width}%`;

CSS selectors could be validated too (element.class#id - valid, div#.name - invalid)

If capturing groups would work (somehow) then Lodash's get method could be type-safe

var object = { 'a': [{ 'b': { 'c': 3 } }] };

_.get(object, 'a[0].b.c');

This could be a thing too:

interface IOnEvents {
  [key: PatternOf</on[a-z]+/>]: (event: Event) => void;
}

interface IObservablesEndsOn$ {
  [key: PatternOf</\$$/>]: Observable<any>;
}

Use case: Hyperscript (hyperhype/hyperscript) like functions

What would that regex look like, or what validation would it provide? Is this for regex-based function overloading?

FWIW The library accepts namespaced tag names and also functions on arbitrary tag names

> require("hyperscript")("qjz").outerHTML
'<qjz></qjz>'

It also accepts an unbounded mixing of class and id values

> require("hyperscript")("baz.foo#bar.qua").outerHTML
'<baz class="foo qua" id="bar"></baz>'
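To illustrate what such a string encodes, here is a rough sketch of splitting a selector into tag, id and classes. It is not hyperscript's actual implementation: parseSelectorSketch and ParsedSelector are hypothetical names, and falling back to "div" when no tag is given is an assumption of this sketch.

```typescript
interface ParsedSelector {
  tag: string;
  id: string | undefined;
  classes: string[];
}

// Splits strings like "baz.foo#bar.qua" into their parts.
function parseSelectorSketch(selector: string): ParsedSelector {
  const result: ParsedSelector = { tag: "div", id: undefined, classes: [] };
  // Each token is an optional "." or "#" prefix followed by a run of other characters.
  const token = /([.#]?)([^.#]+)/g;
  let match: RegExpExecArray | null;
  while ((match = token.exec(selector)) !== null) {
    const prefix = match[1];
    const value = match[2];
    if (prefix === "#") {
      result.id = value;
    } else if (prefix === ".") {
      result.classes.push(value);
    } else {
      result.tag = value;
    }
  }
  return result;
}

const parsedMixed = parseSelectorSketch("baz.foo#bar.qua");
// parsedMixed is { tag: "baz", id: "bar", classes: ["foo", "qua"] }
```

Classes and ids may appear in any order and any number, which is what makes a single fixed-shape pattern per return type awkward.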

CSS selectors could be validated too

CSS selectors cannot be validated by a regular expression

What would that regex look like, or what validation would it provide? Is this for regex-based function overloading?

Not the OP, but I presume, yes, something like the HTMLDocument#createElement() overloads, e.g.:

// ...
export declare function h(query: /^canvas([\.#]\w+)*$/): HTMLCanvasElement;
// ...
export declare function h(query: /^div([\.#]\w+)*$/): HTMLDivElement;
// ...

I'm sure the REs are incomplete. Note that this is a special case of validating CSS selectors, which are used in many contexts in a regular way. For example, it's perfectly OK for HTMLDocument.querySelector() to return HTMLElement as a fallback if you're using a complex selector.

I am curious if there are non-overloading examples that are both feasible and useful, though.

TODO: Please help us by identifying real library functions that could benefit from RegExp types, and the actual expression you'd use.

My use case is the one I explained in this comment in the CCXT library where I have strings that represent TickerSymbols. I don't really care if they are checked for a regex pattern, but I want them to be treated as sub-types of string so I get more strict assignments, parameter type checking, etc. I found it to be very useful when I'm doing functional programming, with that I can easily track TickerSymbols, Currencies, Assets, etc at compile-time where at run-time they are just normal strings.

@omidkrad This sounds like you need nominal types, not regex-validated types.

@m93a In my case I will be fine with nominal types, but for the same use case you could use regex-validated types for stricter type checking and self-documenting the string types.

CSS selectors could be validated too

CSS selectors cannot be validated by a regular expression

Well, if the regexp would enable us to stitch them together we could copy CSS regexes..., right?

The (draft) CSS Typed Object Model

https://drafts.css-houdini.org/css-typed-om/

https://developers.google.com/web/updates/2018/03/cssom

Potentially alleviates the desire to use the stringly-typed CSS model.

el.attributeStyleMap.set('padding', CSS.px(42));
const padding = el.attributeStyleMap.get('padding');
console.log(padding.value, padding.unit); // 42, 'px'

@RyanCavanaugh For Mithril in particular, the tag name is extracted via the capture group in ^([^#\.\[\]]+) (defaults to "div"), but matching ^(${htmlTagNames.join("|")}) would be sufficient for our purposes. And so using my proposal, this would be sufficient for my purposes:

type SelectorAttrs = "" | `#${string}` | `.${string}`;

type GetTagName<T extends string> =
    T extends SelectorAttrs ? "div" :
    T extends `${keyof HTMLElementTagNameMap & (infer Tag)}${SelectorAttrs}` ? T :
    string;

As for events and attributes, we could switch to this once negated types land:

type EventsForElement<T extends Element> =
    T extends {addEventListener(name: infer N, ...args: any[]): any} ? N : never;

type MithrilEvent<E extends string> =
    (E extends EventsForElement<T> ? HTMLElementEventMap[E] : Event) &
    {redraw?: boolean};

type Attributes<T extends Element> =
    LifecycleAttrs<T> &
    {[K in `on${string}` & not LifecycleAttrs<T>](
        ev: K extends `on${infer E}` ? MithrilEvent<E> : never
    ): void | boolean} &
    {[K in keyof T & not `on${string}`]: T[K]} &
    {[K in string & not keyof T & not `on${string}`]: string};

BTW, this seamless integration and avoidance of complexity is why I still prefer my proposal over literal regexps.


I know of no way to do this with pure regexp types, though. I do want to point that out.

TODO: Please help us by identifying real library functions that could benefit from RegExp types, and the actual expression you'd use.

bent has a different return type based on what is given as a string that describes the expected response type, e.g.

bent('json')('https://google.com') // => Promise<JSON>
bent('buffer')('https://google.com') // => Promise<Buffer | ArrayBuffer>
bent('string')('https://google.com') // => Promise<String>

It also accepts some other arguments, such as method and url as strings, but these can appear in any position, so if we try to use unions to describe all the return types ('json' | 'buffer' | 'string'), this would instead dumb down to just string when combined with the url and method types in the union, meaning we can't automatically infer the return type based on the type given in first call.

@Ovyerus how would regex types help you there? What would you expect to write? You can model something similar to bent's behavior with overloads or conditional types.

type BentResponse<Encoding> = Promise<
    Encoding extends "json" ? MyJsonType :
    Encoding extends "buffer" ? Buffer | ArrayBuffer :
    Encoding extends "string" ? string :
    Response
>;

declare function bent<T extends string>(urlOrEncoding: T): (url: string) => BentResponse<T>;

http://www.typescriptlang.org/play/index.html#code/C4TwDgpgBAQhB2wBKEDOYD29UQDwFF4BjDAEwEt4BzAPigF4oAFAJwwFtydcBYAKCiCohEhWpQIAD2AJSqKACIAVqiwKoAfigBZEAClV8ACrhoALn5DhxMpSoTps+QoBGAVwBmHiC3VaYnt4sUAA+UACCLCwAhiABXj5QFgJCIrbiUjLwcoqowCx2flB5BeLJVijoWDj8NADc-PykEEQANtEs0B5uxMDkWFAuCMC4Rg5ZOSV2NAAUbiytAPIsaWJUZlBGAJQbcwsbU9RbDHRwiJWY2HhG9Y18lDIsHtFE0PFBUADeUAD67gksDbReAgKAAX34Dx8z1eOn0hhMkC+vxUWCBIPBdxm0VQIGIUBmx3odE+liErQgwCgkg2ugMWER0EY0QA7tFyFShogZspDAotjyABbAYBgVBmAD0Eqk0XYYApADoSOx+Q0+GCBVsgA

Oh I was unclear sorry, I believe my issue was more along the lines of matching http(s): at the start of a string to detect base URL.

Bent's signature is more along the lines of

type HttpMethods = 'GET' | 'PATCH' | ...
type StatusCode = number;
type BaseUrl = string; // This is where I would ideally need to see if a string matches http(s):
type Headers = { [x: string]: any; };

type Options = HttpMethods | StatusCode | BaseUrl | Headers;

function bent(...args: Options[]): RequestFunction<RawResponse>
function bent(...args: (Options | 'json')[]): RequestFunction<JSON>
// and so on

However having BaseUrl as a string absorbs the HttpMethods and return type unions, which ends up as just string. Having it just as a string also "improperly" matches how bent works, as it does check for the presence of ^http: or ^https: in order to determine what it should use as the base url.

If we had regex types, I could define BaseUrl as type BaseUrl = /^https?:/, and this ideally would properly verify strings that aren't a HTTP method or response encoding, as well as not absorbing them into the string type.
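For comparison, the specific prefix check /^https?:/ can be approximated with pattern template literal types, which TypeScript later added in version 4.1. The sketch below is an assumption-laden illustration, not bent's real typings: all names ending in Sketch are hypothetical, and the method list is an arbitrary subset chosen for the example.

```typescript
// Illustrative subset only; the real list of methods is longer.
type HttpMethodSketch = "GET" | "POST" | "PATCH";
type EncodingSketch = "json" | "buffer" | "string";

// Any string starting with "http:" or "https:"; this does not collapse the union to string.
type BaseUrlSketch = `http:${string}` | `https:${string}`;

type BentOptionSketch = HttpMethodSketch | EncodingSketch | BaseUrlSketch;

// Runtime counterpart of the type-level prefix pattern.
function isBaseUrlSketch(arg: string): arg is BaseUrlSketch {
  return /^https?:/.test(arg);
}

function classifyOptionSketch(
  arg: BentOptionSketch
): "baseUrl" | "encoding" | "method" {
  if (isBaseUrlSketch(arg)) return "baseUrl";
  if (arg === "json" || arg === "buffer" || arg === "string") return "encoding";
  return "method";
}

const urlKind = classifyOptionSketch("https://example.com");
const methodKind = classifyOptionSketch("GET");
const encodingKind = classifyOptionSketch("json");
// classifyOptionSketch("ftp://example.com"); // would not compile: matches no member of the union
```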

Exactly, I am the same.

The thought I had of a use case was to detect parameter types to a function.

Basically I have a well defined regex format of a string representing an identifier. I could use decorators, but an extended string type would let me use a type to represent the identifier passed to the function.

To reiterate, we need examples of JavaScript code you want to write in a typed way - otherwise we can only guess what you're trying to model (and whether there's already a way to model it).

@DanielRosenwasser Below's an example of code we would like to enforce typing about. http://www.typescriptlang.org/play/index.html#code/C4TwDgpgBAqjCSARKBeKBnYAnAlgOwHMBYAKFIGMAbAQ3XVnQiygG9SoOoxcA3a4aJn45yULBGoATAPZ5KIKAFtq+GIywAuWAmRoA5NQCcEAAwBGQwCMArAFoAZtYBMAdlsAWcvYActw2ZM7JzNJF28TCCcANgtLPXZOAHpErl5+QWBhUXEpWXklFTw1Ji04JFQoMycAZihkqAA5AHkAFSgAQQAZTqaAdQBRRASOeu4cPgEMTOARMQkZOQVlVXVSnQq9PGlgW2pbAFd9nEk9OpSAZQAJJphO5Ga2gCF+ju6+wah++BbL-oAlYZQciyTBYfbkYDSLAACkBnC4+0slFmxzWSAANHDOGBEcjRJYsNQ8JItKD8ARMSR4fCcUjZuocNRKFo8PtFJYmJTqdjcbNyDkBJJHiA0boGEwAHTLIrqACUrFICQAvqQVSQgA

@yannickglt it seems like you want a nominal type, not a RegExp type? You're not expecting callers to show up with random site-validated invocations like this:

// OK
someFunc('a9e019b5-f527-4cf8-9105-21d780e2619b');
// Also OK, but probably really bad
someFunc('a9e019b5-f527-4cf8-9106-21d780e2619b');
// Error
someFunc('bfe91246-8371-b3fa-3m83-82032713adef');

IOW the fact that you are able to describe a UUID with a regular expression is an artifact of the format of the string itself, whereas what you are trying to express is that UUIDs are a special kind of type whose backing format happens to be a string.

So the combination of 3.7's Assertion Functions and the nominal Feature can do this (?)

nominal UUID = string

function someFunc(uuid: any): asserts uuid is UUID {
  if (!UUID_REGEX.test(uuid)) {
    throw new AssertionError("Not UUID!")
  }
}

class User {
  private static readonly mainUser: UUID = someFunc('a9e019b5-f527-4cf8-9105-21d780e2619b')
  // private static readonly mainUser: UUID = someFunc(123) // assertion fails
  // private static readonly mainUser: UUID = someFunc('not-a-uuid') // assertion fails
  constructor(
    public id: UUID,
    public brand: string,
    public serial: number,
    public createdBy: UUID = User.mainUser) {

  }
}

Will this fail also?

new User('invalid-uuid', 'brand', 1) // should fail
new User('invalid-uuid' as UUID, 'brand', 1) // 🤔 

After thinking for a while, I see a problem with my proposed solution 🤔
The asserts only trigger an error at runtime -> 👎
The Regex-Validation could trigger a compile-time error -> 👍
Otherwise, this proposal makes no sense

Edit:
Another issue: someFunc(uuid: any): asserts uuid is UUID doesn't return a UUID; it either throws or returns nothing, and only narrows uuid to UUID for the caller.
So I can't use this function to assign a UUID to mainUser this way.
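
For what it's worth, the "doesn't return a UUID" part can be worked around in current TypeScript (3.7+) by pairing the assertion function with a small factory. This is only a sketch: a branded string stands in for the hypothetical nominal keyword, and UUID_REGEX, assertUUID and toUUID are names made up for illustration. The check still happens at run time only, so it does not address the compile-time point above.

```typescript
// Branded string standing in for the hypothetical `nominal UUID = string`.
type UUID = string & { readonly __brand: "UUID" };

// Assumed format check; adjust to whichever UUID variants you accept.
const UUID_REGEX = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

// Assertion function: narrows `value` in the caller's scope and returns nothing.
function assertUUID(value: unknown): asserts value is UUID {
  if (typeof value !== "string" || !UUID_REGEX.test(value)) {
    throw new Error(`Not a UUID: ${String(value)}`);
  }
}

// Factory: validates, then hands the narrowed value back, so it can be
// used in an initializer such as `mainUser`.
function toUUID(value: unknown): UUID {
  assertUUID(value);
  return value;
}

const mainUser: UUID = toUUID("a9e019b5-f527-4cf8-9105-21d780e2619b");
```

toUUID('not-a-uuid') and toUUID(123) both compile and then throw when executed, which is exactly the limitation described above.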

@RyanCavanaugh We want these to be correctly typed for Mithril:

// <div id="hello"></div>
m("div#hello", {
    oncreate(vnode) { const dom: HTMLDivElement = vnode.dom },
})

// <section class="container"></section>
m("section.container", {
    oncreate(vnode) { const dom: HTMLElement = vnode.dom },
})

// <input type="text" placeholder="Name">
m("input[type=text][placeholder=Name]", {
    oncreate(vnode) { const dom: HTMLInputElement = vnode.dom },
})

// <a id="exit" class="external" href="https://example.com">Leave</a>
m("a#exit.external[href='https://example.com']", {
    oncreate(vnode) { const dom: HTMLAnchorElement = vnode.dom },
}, "Leave")

// <div class="box box-bordered"></div>
m(".box.box-bordered", {
    oncreate(vnode) { const dom: HTMLDivElement = vnode.dom },
})

// <details></details> with `.open = true`
m("details[open]", {
    oncreate(vnode) { const dom: HTMLDetailsElement = vnode.dom },
})

// alias for `m.fragment(attrs, ...children)`
m("[", {
    oncreate(vnode) { const dom: HTMLElement | SVGElement = vnode.dom },
}, ...children)

We want to statically reject these:

// selector must be non-empty
m("")

// incomplete class
m("div.")

// incomplete ID
m("div#")

// incomplete attribute
m("div[attr=")

// not special and doesn't start /[a-z]/i
m("@foo")

Ideally, we'd also want to statically reject these, but it's not as high priority and we can survive without them:

// event handlers must be functions
m("div[onclick='return false']")

// `select.selectedIndex` is a number
m("select[selectedIndex='not a number']")

// `input.form` is read-only
m("input[type=text][form='anything']")

// `input.spellcheck` is a boolean, this evaluates to a string
// (This is a common mistake, actually.)
m("input[type=text][spellcheck=false]")

// invalid tag name for non-custom element
m("sv")

This would require a much more complicated type definition, one where we'd need a custom type check failure message to help users figure out why it failed to type check.

Other hyperscript libraries and hyperscript-based frameworks like react-hyperscript have similar concerns, too.

Hope this helps!

@isiahmeadows a better way would be to use some form of selector string builder that returns a branded string with correct typings. Like:

m(mt.div({ attr1: 'val1' }))

@anion155 There's other ways of getting there, too, but this is about typing a library whose API was designed by its original author back in 2014. If I were designing its API now, I'd likely use m("div", {...attrs}, ...children) with none of the hyperscript sugar (easier to type, much simpler to process), but it's far too late now to do much about it.

I have A LOT to say. However, I'm impatient. So, I'll be releasing my thoughts a bit at a time.

https://github.com/microsoft/TypeScript/issues/6579#issuecomment-542405537

Regarding "precisionitis" (man, I love that word),
I don't think we should be worrying about it too much.

The type system is already Turing-complete.
This basically means we can be super-precise about a lot of things.
(Like, modeling all of SQL? Shameless plug =P)

But you don't see (too many) people going all-out, and using all the type operators in crazy ways that block libraries from being compatible with each other. I like to think that library authors tend to be level-headed enough... Right?

It's not often that I've wished for string-pattern types/regex-validated string types but they definitely would have helped increase the type safety of my code base.


Use Case

Off the top of my head, I can think of one recent example. (There are a bunch more but I'm a forgetful being)

When integrating with Stripe's API (a payment processing platform), they use ch_ for charge-related identifiers, re_ for refund-related identifiers, etc.

It would have been nice to encode them with PatternOf</^ch_.+/> and PatternOf</^re_.+/>.

This way, when making typos like,

charge.insertOne({ stripeChargeId : someObj.refundId });

I would get an error,

Cannot assign `PatternOf</^re_.+/>` to `PatternOf</^ch_.+/>`

As much as I love nominal/tagged types, they are far more unergonomic and error-prone.
I always see nominal/tagged types as a last resort, because it means that there's something that the TS type system just cannot model.

Also, tagged types are great for phantom types.
Nominal types are basically never useful.
(Okay, I may be biased. They're useful only because of unique symbol. But I like to think I'm not completely wrong.)

The "ValueObject" pattern for validation is even worse and I will not bother talking about it.


Comparison

Below, I will compare the following,

  • String-pattern types/regex-validated string types
  • Nominal types
  • Structural tag types

We can all agree that the "ValueObject" pattern is the worst solution, and not bother with it in the comparisons, right?


String-pattern types

const stripeChargeIdRegex = /^ch_.+/;
const stripeRefundIdRegex = /^re_.+/;

type StripeChargeId = PatternOf<typeof stripeChargeIdRegex>;
type StripeRefundId = PatternOf<typeof stripeRefundIdRegex>;

declare function takesStripeChargeId (stripeChargeId : StripeChargeId) : void;

declare const str : string;
takesStripeChargeId(str); //Error
if (stripeChargeIdRegex.test(str)) {
  takesStripeChargeId(str); //OK
}
if (stripeRefundIdRegex.test(str)) {
  takesStripeChargeId(str); //Error
}

declare const stripeChargeId : StripeChargeId;
declare const stripeRefundId : StripeRefundId;
takesStripeChargeId(stripeChargeId); //OK
takesStripeChargeId(stripeRefundId); //Error

takesStripeChargeId("ch_hello"); //OK
takesStripeChargeId("re_hello"); //Error

Look at that.

  • Perfect for string literals.
  • Not too bad for string non-literals.

Nominal types...

const stripeChargeIdRegex = /^ch_.+/;
const stripeRefundIdRegex = /^re_.+/;

type StripeChargeId = make_unique_type string;
type StripeRefundId = make_unique_type string;

function isStripeChargeId (str : string) : str is StripeChargeId {
  return stripeChargeIdRegex.test(str);
}
function isStripeRefundId (str : string) : str is StripeRefundId {
  return stripeRefundIdRegex.test(str);
}

declare function takesStripeChargeId (stripeChargeId : StripeChargeId) : void;

declare const str : string;
takesStripeChargeId(str); //Error
if (isStripeChargeId(str)) {
  takesStripeChargeId(str); //OK
}
if (isStripeRefundId(str)) {
  takesStripeChargeId(str); //Error
}

declare const stripeChargeId : StripeChargeId;
declare const stripeRefundId : StripeRefundId;
takesStripeChargeId(stripeChargeId); //OK
takesStripeChargeId(stripeRefundId); //Error

takesStripeChargeId("ch_hello"); //Error? Ughhhh
takesStripeChargeId("re_hello"); //Error

takesStripeChargeId("ch_hello" as StripeChargeId); //OK, BUT UNSAFE
takesStripeChargeId("re_hello" as StripeChargeId); //OK, BUT WAIT! I MESSED UP

const iKnowThisIsValid = "ch_hello";
if (isStripeChargeId(iKnowThisIsValid)) {
  takesStripeChargeId(iKnowThisIsValid); //OK
} else {
  throw new Error(`Wat? This should be valid`);
}

function assertsStripeChargeId (str : string) : asserts str is StripeChargeId {
  if (!isStripeChargeId(str)) {
    throw new Error(`Expected StripeChargeId`);
  }
}
assertsStripeChargeId(iKnowThisIsValid);
takesStripeChargeId(iKnowThisIsValid); //OK

function makeStripeChargeIdOrError (str : string) : StripeChargeId {
  assertsStripeChargeId(str);
  return str;
}
takesStripeChargeId(makeStripeChargeIdOrError("ch_hello")); //OK
takesStripeChargeId(makeStripeChargeIdOrError("re_hello")); //OK, compiles, throws during run-time... Not good

Look at that.

  • TERRIBLE for string literals.
  • After overcoming the string literal hurdle, it's not too bad... Right?

But the main use-case for this proposal is string literals.
So, this is a terrible alternative.


Structural tag types...

Structural tag types are not much different from nominal types...

const stripeChargeIdRegex = /^ch_.+/;
const stripeRefundIdRegex = /^re_.+/;

type StripeChargeId = string & tag { stripeChargeId : void };
type StripeRefundId = string & tag { stripeRefundId : void };

function isStripeChargeId (str : string) : str is StripeChargeId {
  return stripeChargeIdRegex.test(str);
}
function isStripeRefundId (str : string) : str is StripeRefundId {
  return stripeRefundIdRegex.test(str);
}

declare function takesStripeChargeId (stripeChargeId : StripeChargeId) : void;

declare const str : string;
takesStripeChargeId(str); //Error
if (isStripeChargeId(str)) {
  takesStripeChargeId(str); //OK
}
if (isStripeRefundId(str)) {
  takesStripeChargeId(str); //Error
}

declare const stripeChargeId : StripeChargeId;
declare const stripeRefundId : StripeRefundId;
takesStripeChargeId(stripeChargeId); //OK
takesStripeChargeId(stripeRefundId); //Error

takesStripeChargeId("ch_hello"); //Error? Ughhhh
takesStripeChargeId("re_hello"); //Error

takesStripeChargeId("ch_hello" as StripeChargeId); //OK, BUT UNSAFE
takesStripeChargeId("re_hello" as StripeChargeId); //OK, BUT WAIT! I MESSED UP

const iKnowThisIsValid = "ch_hello";
if (isStripeChargeId(iKnowThisIsValid)) {
  takesStripeChargeId(iKnowThisIsValid); //OK
} else {
  throw new Error(`Wat? This should be valid`);
}

function assertsStripeChargeId (str : string) : asserts str is StripeChargeId {
  if (!isStripeChargeId(str)) {
    throw new Error(`Expected StripeChargeId`);
  }
}
assertsStripeChargeId(iKnowThisIsValid);
takesStripeChargeId(iKnowThisIsValid); //OK

function makeStripeChargeIdOrError (str : string) : StripeChargeId {
  assertsStripeChargeId(str);
  return str;
}
takesStripeChargeId(makeStripeChargeIdOrError("ch_hello")); //OK
takesStripeChargeId(makeStripeChargeIdOrError("re_hello")); //OK, compiles, throws during run-time... Not good

Look at that.

  • TERRIBLE for string literals.
  • After overcoming the string literal hurdle, it's not too bad... Right?

But the main use-case for this proposal is string literals.
So, this is a terrible alternative.

Also, this structural tag type example is a literal (ha, pun) copy-paste of the nominal type example.

The only difference is in how the types StripeChargeId and StripeRefundId are declared.

Even though the code is basically the same, structural types are better than nominal types. (I'll clarify this in the next post, I swear).


Conclusion

This is just a conclusion for this comment! Not a conclusion to my overall thoughts!

String-pattern types/regex-validated string types are more ergonomic than nominal/structural tag types. Hopefully my simple examples were not too contrived and have demonstrated that sufficiently.


Conclusion (Extra)

As much as possible, ways to take the subset of a primitive type should always be preferred over nominal/structural tag/value-object types.

Examples of taking the subset of primitive types,

  • string literals
  • number literals (excluding NaN, Infinity, -Infinity)
  • boolean literals
  • bigint literals
  • Even unique symbol is just taking a subset of symbol

Out of the above examples, only boolean is "finite enough". It only has two values.
Developers are satisfied with having true and false literals because there's not much else to ask for.


The number type is finite-ish but it has so many values, we might as well consider it infinite.
There are also holes in what literals we can specify.

This is why the range number type, and NaN, Infinity, -Infinity issues are so popular, and keep popping up. Being able to specify a small finite set of values, from an infinite set is not good enough.

Specifying a range is one of the most common/natural ideas to come to someone when they need to specify a large finite/infinite subset of an infinite set.


The bigint type is basically infinite, limited only by memory.

It also contributes to the popularity of the range number type issue.


The string type is basically infinite, limited only by memory.

And this is why this string-pattern type/regex-validated string type issue is so popular.

Specifying a regex is one of the most common/natural ideas to come to someone when they need to specify a large finite/infinite subset of an infinite set.


The symbol type... It's also infinite. And also unbounded, pretty much.

But the elements of the symbol type are all pretty much unrelated to each other, in almost every way. And, so, no one has made an issue to ask, "Can I have a way to specify a large finite/infinite subset of symbol?".

To most people, that question doesn't even make sense. There isn't a meaningful way to do this (right?)


However, just being able to declare subsets of primitives isn't very useful. We also need,

  • Literals of the right type must be assignable without further work

Thankfully, TS is sane enough to allow this.

Imagine being unable to pass false to (arg : false) => void!

  • Builtin ways of narrowing

    At the moment, for these literals, we have == & === as builtin ways of narrowing.

    Imagine needing to write a new type guard for each literal!

The problem with nominal/structural tag/value-object types is that they basically fail to fulfill the above two criteria. They turn primitive types into clunky types that aren't quite object types, but must be handled like object types, anyway.
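
Both criteria are easy to see with string literal unions as they exist today. A small sketch; FontStyle and describeStyle are illustrative names.

```typescript
type FontStyle = "normal" | "italic";

function describeStyle(style: FontStyle): string {
  // Built-in narrowing: `===` is enough, no user-defined type guard needed.
  if (style === "italic") {
    const narrowed: "italic" = style;
    return `slanted (${narrowed})`;
  }
  // The other member of the union is all that is left here.
  const rest: "normal" = style;
  return `upright (${rest})`;
}

// Literals of the right type are assignable without casts or wrappers.
describeStyle("normal");
describeStyle("italic");
// describeStyle("bold"); // compile error: not assignable to FontStyle
```

A nominal or tagged FontStyle would need a guard or a cast at every one of those call sites.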

Ergonomics

Okay, here's more elaboration on string-pattern vs nominal vs structural tag types.

These arguments apply to https://github.com/microsoft/TypeScript/issues/15480 as well.


Cross-Library Compatibility

Nominal types are the worst at cross-library compatibility.
It's like using unique symbol in two libraries and trying to get them to interoperate.
It simply cannot be done.
You need to use a boilerplate type guard, or the trust-me-operator (as).

You'll need more boilerplate for an assertion guard, too.

If the type does not require cross-library compatibility, then using nominal types is fine...
Even if very unergonomic (see above example).


For structural types, if library A has,

//Lowercase 'S'
type StripeChargeId = string & tag { stripeChargeId : void };

And library B has,

//Uppercase 'S'
type StripeChargeId = string & tag { StripeChargeId : void };

//Or
type StripeChargeId = string & tag { isStripeChargeId : true };

//Or
type StripeChargeId = string & tag { stripe_charge_id : void };

Then you'll need a boilerplate type guard, or the trust-me-operator (as).

You'll need more boilerplate for an assertion guard, too.

If the type does not require cross-library compatibility, then using structural types is fine...
Even if very unergonomic (see above example).


For string-pattern types, if library A has,

const stripeChargeIdRegex = /^ch_.+/;
type StripeChargeId = PatternOf<typeof stripeChargeIdRegex>;

And library B has,

//Extra dollar sign at the end
const stripeChargeIdRegex = /^ch_.+$/;
type StripeChargeId = PatternOf<typeof stripeChargeIdRegex>;

//Or,
const stripeChargeIdRegex = /^ch_[a-zA-Z0-9]+$/;
type StripeChargeId = PatternOf<typeof stripeChargeIdRegex>;

//Or,
const stripeChargeIdRegex = /^ch_[A-Za-z0-9]+$/;
type StripeChargeId = PatternOf<typeof stripeChargeIdRegex>;

Assume both libraries always produce strings for StripeChargeId that will satisfy the requirements of both libraries. Library A is just "lazier" with its validation. And library B is "stricter" with its validation.

Then, it's kind of annoying. But not too bad.
Because you can just use libraryB.stripeChargeIdRegex.test(libraryA_stripeChargeId) as the typeguard. No need to use the trust-me-operator (as).

You'll still need boilerplate for assertion guards, though.
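
With the tools available today, that hand-off between libraries could look like the sketch below. Branded strings stand in for PatternOf, the libraryA/libraryB objects are hypothetical stand-ins for two packages, and the stricter regex is one I picked for illustration. Under the proposal the .test() call alone would narrow, so the hand-written guard would disappear.

```typescript
// Brands use different property names so the two types can be intersected.
type ChargeIdA = string & { readonly __chargeIdA: true };
type ChargeIdB = string & { readonly __chargeIdB: true };

// Hypothetical library A: lazy validation.
const libraryA = {
  stripeChargeIdRegex: /^ch_.+/,
  // The cast simulates A's internal validation, it is not consumer code.
  createCharge(): ChargeIdA { return "ch_1AbC" as ChargeIdA; },
};

// Hypothetical library B: stricter validation.
const libraryB = {
  stripeChargeIdRegex: /^ch_[a-zA-Z0-9]+$/,
  refund(id: ChargeIdB): string { return `refunded ${id}`; },
};

// The consumer reuses B's regex as the guard; no `as` on the consumer side.
function isChargeIdB(id: string): id is ChargeIdB {
  return libraryB.stripeChargeIdRegex.test(id);
}

const chargeFromA = libraryA.createCharge();
const refundOutcome = isChargeIdB(chargeFromA)
  ? libraryB.refund(chargeFromA)
  : undefined;
```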

If the type does not require cross-library compatibility, then using string-pattern types is perfect, and also very ergonomic.


If you need cross-library compatibility, string-pattern types are still better than structural tag types! Hear me out.

If the domain being modeled is well-understood, then it is very likely that multiple, isolated library authors will end up writing the same regex. With structural tag types, they could all just write whatever properties and types they want in the tags.

If there's a standard specifying string formats for whatever is being modeled, then it is basically guaranteed that all library authors will write the same regex! If they write a different regex, they're not really following the standard. Do you want to use their library? With structural tag types, they could still all just write whatever. (Unless someone starts a structural tag type standard that everyone will care about? Lol)


Cross-Version Compatibility

As usual, nominal types are the worst at cross-version compatibility.
Oh, you bumped your library a patch, or minor version?
The type declaration is still the same?
The code is still the same?
Nope. They're different types.


Structural tag types are still assignable, across versions (even major versions), as long as the tag type is structurally the same.


String-pattern types are still assignable, across versions (even major versions), as long as the regex is the same.

Or we could just run a PSPACE-complete algorithm to determine if the regexes are the same? We can also determine which subclasses of regexes are the most common and run optimized equivalence algorithms for those... But that sounds like a lot of effort.

Regex subtype checks would be cool to have, and would definitely make using string-pattern types more ergonomic. Just like how range subtype checks would benefit the number range type proposal.

[Edit]
In this comment,
https://github.com/microsoft/TypeScript/issues/6579#issuecomment-243338433

Someone linked to,
https://bora.uib.no/handle/1956/3956

Titled, "The Inclusion Problem for Regular Expressions"
[/Edit]


Boilerplate

TODO (But we can see that string-pattern types have the least amount of boilerplate)

Literal Invocation

TODO (But we can see that string-pattern types support literal invocation the best)

Non-Literal Invocation

TODO (But we can see that string-pattern types support non-literal invocation the best)

More regarding https://github.com/microsoft/TypeScript/issues/6579#issuecomment-542405537

TypeScript can't error on instantiations of intersections, so this wouldn't be part of any final design.

I don't know why people wanted to ban intersections, but you're absolutely right that banning it doesn't make sense.


thus breaking every nonliteral invocation?

Well, not every non-literal invocation.

declare function foo (arg : PatternOf</a+/>) : void;
function bar (arg : PatternOf</a+/>) : void {
  //non-literal and does not break.
  foo(arg);
}
bar("aa"); //OK
bar("bb"); //Error
bar("" as string); //Error, I know this is what you meant by non-literal invocation

function baz (arg : "car"|"bar"|"tar") : void {
  bar(arg); //OK
}

Breaking on a non-literal invocation, where it cannot prove that it matches the regex, isn't necessarily a bad thing. It's just a type-safety thing.

That's kind of like saying that string literals are bad because now non-literal invocations fail.
String-pattern types/regex-validated string types just let you define unions of an infinite number of string literals.


any nonliteral use would require a re-testing or assertion:

I don't see that as an issue at all.
It's the same with nominal/tagged types right now.
Or trying to pass a string to a function expecting string literals.
Or trying to pass a wider type to a narrower type.

In this particular case, you've shown that const ZipCode = /^\d\d\d\d\d$/; and ZipCode.test(s) can act as a type guard. This will certainly help with the ergonomics.


  • The problem being solved has no better alternative (including plausible alternatives which aren't yet in the language)

Well, hopefully I've shown that nominal/structural tag types are not the better alternative. They're actually pretty bad.

  • The problem occurs with meaningful frequency in real codebases

Uhh... Let me get back to you on that one...

  • The proposed solution solves that problem well

The proposed string-pattern type seems to be pretty good.


TODO: Please help us by identifying real library functions that could benefit from RegExp types, and the actual expression you'd use.

Your view is that nominal/tagged types are good enough for non-literal use.
So, any use case brought up that shows non-literal usage is not good enough, because nominal/tagged types cover it.

However, we've seen that, even for non-literal use,

  • Nominal/structural tag types suffer from cross-library/version compatibility issues
  • Amount of boilerplate for nominal/structural tag types is significantly more than boilerplate for string-pattern types

Also, it seems that the literal use cases brought up have been unsatisfactory to you, because they try and do silly things like email validation, or use regexes that aren't accurate enough.


Writing tests - this is where hardcoded inputs make some sense, though this is almost a counterpoint because your test code should probably be providing lots of invalid inputs

A good use case brought up was writing run-time tests. And you are right, that they should be throwing a lot of invalid inputs at it for run-time tests, too.

But that's no reason to not support string-pattern types. It might be the case that they want to test valid inputs in a certain file and accidentally give invalid input.

But, because they have to use a type guard or trust-me-operator (as) or value object, now they'll get a run-time error, instead of knowing that the test will fail ahead of time.

Using the trust-me-operator (as) for run-time tests should only be reserved for testing invalid inputs. When wanting to test valid inputs, it is more clear to not need hacks to assign literals to a nominal/structural tag type.

If they ever change the regex in future, it would be nice if their tests now fail to even run, because of assignability issues. If they just use as everywhere in their tests, they won't know until they run the tests.

And if the library author just uses as everywhere when dogfooding their own library... What of downstream consumers? Won't they also be tempted to use as everywhere and run into run-time problems when upgrading to a new version?

With string-pattern types, there's less reason to use as everywhere and both library author and downstream consumers will know of breaking changes more easily.

(Kind of long winded but I hope some of my points got through).


Also, I write a lot of compile-time tests (And I know the TS team does so, too).

It would be nice if I can test that a certain string literal will fail/pass a regex check in my compile-time tests. At the moment, I can't have compile-time tests for these things and need to use a run-time test, instead.

And if it fails/passes my compile-time tests, then I'll have confidence that downstream consumers can use those string literals (or similar ones) and expect them to behave the right way.


It seems like this quickly puts us on the road to a Tower of Babel situation...

This is even more true of using nominal/structural tag types, actually. As the above examples have shown, they do terribly for cross-library/version compatibility...

However, regexes/string-pattern types have a decent chance at not falling into that problem (hopefully, thanks to standardization, and sane library authors).


EDIT

A common thing in the thread is that these regexes would help validate test code, because even though in production scenarios the code would be running against runtime-provided strings rather than hardcoded literals, you'd still want some validation that your test strings were "correct". This would seem to be an argument for nominal/tagged/branded strings instead, though, since you'd be writing the validation function either way, and the benefit of tests is that you know they run exhaustively (thus any errors in test inputs would be flagged early in the development cycle).

Ah... I should have read everything first before writing this...

Anyway, I do have some examples with me, where string-pattern types are useful.


HTTP Route Declaration Library

With this library, you can build HTTP route declaration objects. This declaration is used by both client and server.

/*snip*/
createTestCard : f.route()
    .append("/platform")
    .appendParam(s.platform.platformId, /\d+/)
    .append("/stripe")
    .append("/test-card")
/*snip*/

These are the constraints for .append(),

  • String literals only (Can't enforce this at the moment, but if you use non-literals, the route declaration builder becomes garbage)
  • Must start with leading forward slash (/)
  • Must not end with trailing forward slash (/)
  • Must not contain colon character (:); it is reserved for parameters
  • Must not contain two, or more, forward slashes consecutively (//)

Right now, I only have run-time checks for these, that throw errors. I would like downstream consumers to have to follow these constraints without needing to read some GitHub README or JSDoc comment. Just write the path and see red squiggly lines.
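
Such run-time checks might look roughly like this. It is a sketch of the listed constraints only, not the library's actual code; appendPathViolations and assertValidAppendPath are made-up names.

```typescript
// Hypothetical helper: collects every violated constraint for a path segment.
// "String literals only" cannot be checked at run time, so it is left out.
function appendPathViolations(path: string): string[] {
  const violations: string[] = [];
  if (!path.startsWith("/")) violations.push("must start with a forward slash");
  if (path.endsWith("/")) violations.push("must not end with a forward slash");
  if (path.includes(":")) violations.push("must not contain a colon");
  if (path.includes("//")) violations.push("must not contain consecutive forward slashes");
  return violations;
}

// Throwing wrapper, in the spirit of the run-time checks described above.
function assertValidAppendPath(path: string): void {
  const violations = appendPathViolations(path);
  if (violations.length > 0) {
    throw new Error(`Invalid path "${path}": ${violations.join("; ")}`);
  }
}

assertValidAppendPath("/platform"); // passes
// assertValidAppendPath("/stripe/"); // would throw: must not end with a forward slash
```

None of this is visible to a consumer until the code runs, which is the complaint.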


Other stuff

I also have regexes for hexadecimal strings, alphanumeric strings.

I also have this,

const floatingPointRegex = /^([-+])?([0-9]*\.?[0-9]+)([eE]([-+])?([0-9]+))?$/;

I see this,

Integer - incorrectly rejects "3e5"

I also have this, which isn't an integer regex but uses the floatingPointRegex,

function parseFloatingPointString (str : string) {
    const m = floatingPointRegex.exec(str);
    if (m == undefined) {
        return undefined;
    }
    const rawCoefficientSign : string|undefined = m[1];
    const rawCoefficientValue : string = m[2];
    const rawExponentSign : string|undefined = m[4];
    const rawExponentValue : string|undefined = m[5];

    const decimalPlaceIndex = rawCoefficientValue.indexOf(".");
    const fractionalLength = (decimalPlaceIndex < 0) ?
        0 :
        rawCoefficientValue.length - decimalPlaceIndex - 1;

    const exponentValue = (rawExponentValue == undefined) ?
        0 :
        parseInt(rawExponentValue) * ((rawExponentSign === "-") ? -1 : 1);

    const normalizedFractionalLength = (fractionalLength - exponentValue);
    const isInteger = (normalizedFractionalLength <= 0) ?
        true :
        /^0+$/.test(rawCoefficientValue.substring(
            rawCoefficientValue.length-normalizedFractionalLength,
            rawCoefficientValue.length
        ));
    const isNeg = (rawCoefficientSign === "-");

    return {
        isInteger,
        isNeg,
    };
}

I also have this comment, though,

/**
    Just because a string is in integer format does not mean
    it is a finite number.

    ```ts
    const nines_80 = "99999999999999999999999999999999999999999999999999999999999999999999999999999999";
    const nines_320 = nines_80.repeat(4);
    //This will pass, 320 nines in a row is a valid integer format
    integerFormatString()("", nines_320);
    //Infinity
    parseFloat(nines_320);
    ```
*/
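
The effect described in that comment is easy to reproduce. A sketch: integerFormatRegex below is a simplified stand-in for the format check, not the library's integerFormatString().

```typescript
// Simplified, assumed integer-format check: optional sign, then digits only.
const integerFormatRegex = /^[-+]?[0-9]+$/;

// 320 nines in a row: a perfectly valid integer *format*.
const nines320 = "9".repeat(320);

const looksLikeInteger = integerFormatRegex.test(nines320); // true
const parsedNines = parseFloat(nines320);                   // Infinity
const isFiniteNumber = Number.isFinite(parsedNines);        // false
```

So a format-level type, regex or otherwise, says nothing about whether the parsed value is representable.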

RegExp Constructor

Funnily enough, the RegExp constructor will benefit from regex-validated string types!

Right now, it is,

new(pattern: string, flags?: string): RegExp

However, we could have,

new(pattern: string, flags?: PatternOf</^[gimsuy]*$/>): RegExp
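
As a run-time illustration of what such a type would and would not catch (a sketch; flagsPattern simply reuses the regex above, and constructs is a made-up helper): the pattern rejects unknown flag letters, but still admits duplicates such as "gg", which the RegExp constructor rejects with a SyntaxError. So the pattern describes a superset of the valid flag strings. The letter set is the one from this thread; newer engines accept additional flags, so treat it as an assumption.

```typescript
// Same pattern as in the signature above.
const flagsPattern = /^[gimsuy]*$/;

// True when the RegExp constructor accepts the flags string.
function constructs(flags: string): boolean {
  try {
    new RegExp("a", flags);
    return true;
  } catch {
    return false;
  }
}

const checks = ["gi", "x", "gg"].map((flags) => ({
  flags,
  matchesPattern: flagsPattern.test(flags),
  constructs: constructs(flags),
}));
// "gi": matches the pattern and constructs
// "x":  fails the pattern and fails to construct
// "gg": matches the pattern but fails to construct (duplicate flag)
```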

TL;DR (Please read it, though, I put in a lot of effort into this :cry: )

  • String-pattern types are more ergonomic than nominal/structural tag types

    • Less boilerplate

  • String-pattern types are less likely than nominal/structural tag types to become a Tower of Babel situation

    • Especially with regex subtype checks

  • String-pattern types are the most natural way of defining large finite/infinite subsets of the string type

    • Introducing this feature might even make people think about valid string formats for their libraries more closely!

  • String-pattern types enable stronger compile-time safety for some libraries (Let me get back to you on prevalence... runs away)

    • RegExp constructor, hex/alphanumeric strings, route path declarations, string identifiers for databases, etc.


Why are your regexes so bad?

A bunch of the use cases brought up by others wanted to introduce string-pattern types to fit existing libraries; and it doesn't seem to be convincing the TS team.

Often times, I feel like these existing libraries don't even use regexes that much to validate their input. Or, they use a regex to perform a simple validation. Then, they use a more complicated parser to perform the actual validation.

But this is an actual valid use case for string-pattern types!


String-pattern types to validate supersets of valid string values

Sure, a string that starts with /, does not end with /, does not contain consecutive /, and does not contain : will pass the "HTTP path regex". But this just means that the set of values that pass this regex is a superset of valid HTTP paths.

Further down, we have an actual URL path parser that checks that ? is not used, # is not used, some characters are escaped, etc.

But with this simple string-pattern type, we've already eliminated a large class of the common problems that a user of the library may encounter! And we eliminated it during compile-time, too!

It's not often that a user will use ? in their HTTP paths, because most are experienced enough to know that ? is the start of a query string.


I just realized you already know of this use case.

This thread implies a wide variety of use cases; concrete examples have been more rare. Troublingly, many of these examples don't seem to be complete - they use a RegExp that would reject valid inputs.

So, sure, a lot of the regexes proposed aren't "complete".
But as long as they don't reject valid input, it should be okay, right?

It's okay if they allow invalid input, right?
Since we could have a "real" parser during run-time handle the full validation.
And a compile-time check can eliminate a lot of common problems for downstream users, increasing productivity.

Those examples that reject valid input should be easy enough to modify, so that they don't reject valid input, but allow invalid input.


String-pattern types and intersections

Anyway, intersection types on string-pattern types would be super duper useful!

My .append() example could be written as,

append (str : (
  //Must start with forward slash
  & PatternOf</^\//>
  //Must not end with forward slash
  & PatternOf</[^/]$/>
  //Must not have consecutive forward slashes anywhere
  & not PatternOf</\/\//>
  //Must not contain colon
  & PatternOf</^[^:]+$/>
)) : SomeReturnType;

The not PatternOf</\/\//> could also be,
PatternOf</^((([/])(?!\3))|[^/])+$/> but this is so much more complicated
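As a rough illustration of what those four constraints amount to at run-time today, here is a minimal sketch that checks them one by one and brands the result. The names PathFragment and isPathFragment are hypothetical and only stand in for the proposed intersection of pattern types:

```typescript
// Hypothetical brand standing in for the intersection of the four pattern types.
type PathFragment = string & { readonly __brand: "PathFragment" };

// Each entry mirrors one positive constraint from the append() signature.
const mustMatch: RegExp[] = [
  /^\//,     // must start with a forward slash
  /[^/]$/,   // must not end with a forward slash
  /^[^:]+$/, // must not contain a colon
];

// The "not PatternOf" constraint: the string must NOT match this.
const mustNotMatch: RegExp[] = [
  /\/\//,    // must not have consecutive forward slashes
];

function isPathFragment(str: string): str is PathFragment {
  return mustMatch.every((re) => re.test(str))
    && mustNotMatch.every((re) => !re.test(str));
}
```

Keeping the negated pattern in its own list mirrors the `not PatternOf` form, which avoids the lookahead-and-backreference rewrite.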

Thank you, @AnyhowStep, for the extensive demonstrations. I wanted to criticize you for making me read so much, but it turned out to be very helpful!

I often struggle with typing my internal APIs full of string parameters, and I inevitably end up with a lot of conditionals that throw at run-time. As a result, my consumers need to duplicate these pattern checks, since they don’t want an exception; they want a special way to handle the failure.

// Today
function createServer(id: string, comment: string) {
  if (!/^[a-z]+-[0-9]+$/.test(id)) throw new Error("Server id does not match the format");
  // work
}

// Nicer
function createServer(id: PatternOf</^[a-z]+-[0-9]+$/>, comment: string) {
  // work immediately
}

In the world of strings and patterns, a generic string is pretty much the same as unknown, removing a lot of type safety in favor of runtime checks, and causing inconvenience for my consuming developers.
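Until something like PatternOf exists, one way to approximate this is a branded string plus a type guard, so consumers can branch on failure instead of catching an exception. This is only a sketch; ServerId, isServerId and tryCreateServer are hypothetical names:

```typescript
// Hypothetical brand: a string that has been checked against the id pattern.
type ServerId = string & { readonly __brand: "ServerId" };

const SERVER_ID_PATTERN = /^[a-z]+-[0-9]+$/;

// Type guard: narrows a plain string to ServerId when the pattern matches.
function isServerId(id: string): id is ServerId {
  return SERVER_ID_PATTERN.test(id);
}

// The function itself no longer needs a run-time check that throws.
function createServer(id: ServerId, comment: string): string {
  return `created ${id} (${comment})`;
}

// A consumer handles the failure case without try/catch.
function tryCreateServer(rawId: string): string | undefined {
  if (!isServerId(rawId)) return undefined;
  return createServer(rawId, "created from user input");
}
```

The check still happens at run-time, but only once, at the boundary where the untyped string enters the program.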

For some of the use cases mentioned only a small subset of Regex would be required, e.g. prefix matching.

Potentially this could be done with more general TS language features like Variadic Kinds #5453 and type inference when spreading string literal types.

Future speculation:

const x: ['a', 'b', 'c'] = [...'abc'] as const;

type T = [...'def']; // ['d', 'e', 'f'];
type Guard<T extends string> =
  [...T] extends [...'https://', ...any[]] ? Promise<any> : never;

declare function secureGET<
  T extends string
>(url: T): Guard<T>;

const y = secureGET('https://a.com');
y.then(...) // okay

const z = secureGET('http://z.com');
z.then(...); // error

type NaturalNumberString<T extends string> =
  [...T] extends ('0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9')[] ? T : never;

For some of the use cases mentioned only a small subset of Regex would be required, e.g. prefix matching.

I still stand by my proposal, which offers this + a few other things, basically a very small superset of star-free languages you can still reasonably efficiently check for subsetting and equality. And so far, I've not seen any other proposal attempt to address the performance aspect of arbitrary regular expressions, the biggest concern the TS team has.

The problem with star free languages is, as the name says, that you can’t use stars, which makes it hard to validate things like urls. Besides, most people are probably going to want stars, and just use an arbitrary number of repeating sequences to emulate them, which would make it hard to check for subsets.

And the performance of most normal DFA representable regexes isn’t that bad, and it’s possible to check these for sub/supersets.

You can still sort-of get *, though.

const str : PatternOf</ab+c/> | PatternOf</ac/>

@TijmenW Read my proposal a little more closely - there's some hidden rationales there, and a few small features that make it actually practical. It's not directly limited to specifying star-free grammars, but a small superset extended with just enough to make it practically useful for my semi-advanced use case. In particular, you can do starof ('a' | 'b' | ...) for individual characters and you can use string as equivalent to starof UnionOfAllCodePoints (in effect making it no longer a primitive in theory).

Also, checking if a regular language matches a subset of what another regular language matches is NP-complete and equivalent to the general subgraph isomorphism problem. This is why you can't just do standard regular languages, and why I tried to limit starof as much as possible, to try to keep the theoretical computational complexity down.

TODO: Please help us by identifying real library functions that could benefit from RegExp types, and the actual expression you'd use.

Take this with a grain of salt, since it's a brand new library, but any library like https://github.com/ostrowr/ts-json-validator would be made much more useful with something like a regex type.

The goal of the library is to generate Typescript type/JSON schema pairs <T, s> such that

  1. Any type that s can validate is assignable to T
  2. As few types as possible that are assignable to T fail validation when run against s.

A regex type would improve the strictness of (2) by allowing the validated type to be stricter about at least the following keywords:

  • format
  • patternProperties
  • propertyNames

TODO: Please help us by identifying real library functions that could benefit from RegExp types, and the actual expression you'd use.

All Excel interface libraries could use type validation for cell references such as A1 or ranges such as A5:B7.

Property Keys / Regex String Indexers

Some libraries treat objects according to the property names. For example, in React we want to apply types to any prop whose name starts with aria-:

interface IntrinsicElements {
    // ....
    [attributeName: /aria-\w+/]: number | string | boolean;
}

This is effectively an orthogonal concept (we could add Regex types without adding Regex property keys, and vice versa).

I know this is a bit orthogonal to everything going on here but Wesley thought you could use our input. This keeps coming up in Fabric for multiple reasons. As a component library, we want to be able to elevate a component props interface that accurately reflects the React component interface allowed by TypeScript, including data- and aria- attributes. Without it we can't elevate accurate interfaces to our consumers to use for these attributes. This is becoming a bigger issue with the next version of Fabric where we are looking at pluggable implementations such as slots and need to define and allow these attributes on arbitrary interfaces.

If there is anything we can do to help, please let me know! 😄

TS Playground:

import * as React from 'react';

// Want to reflect the same aria- and data- attributes here that JSX compiler allows in this interface:
interface TestComponentProps {
    someProp?: number;
}

const TestComponent: React.FunctionComponent<TestComponentProps> = () => {
    return null;
}

const ConsumerComponent: React.FunctionComponent = () => {
    // The React component interface allows for 'data-' and 'aria-' attributes, but we don't have any typesafe way of
    // elevating that interface or instantiating props objects that allow the same attributes. We just want to be able to 
    // define component interfaces that match what the React component interface allows without opening it up to 'any' and 
    // giving up all type safety on that interface.
    const testComponentProps: TestComponentProps = {
        someProp: 42,
        'data-attribute-allowed': 'test'
    };

    return (
        <TestComponent
            someProp={42}
            // 'data-' and 'aria-' attributes are only allowed here:
            data-attribute-allowed={'data-value'}
            aria-attribute-allowed={'aria-value'}
            {...testComponentProps}
        />
    )
}

TODO: Please help us by identifying real library functions that could benefit from RegExp types, and the actual expression you'd use

Cron jobs. (very surprised this wasn't mentioned)

^((\*|\d+((\/|\-|,){0,1}(\d+))*)\s*){6}$

Just throwing in my two cents here - I'm working on a React project where we'd like to validate a prop that will be used as an HTML id attribute. This means it must meet the following rules or unexpected behavior will occur:

  1. Have at least one character
  2. Not have spaces

In other words:

interface Props {
  id: PatternOf</[^ ]+/>;
}

Another example: sanctuary-type-identifiers with strings expected in format '<namespace>/<name>[@<version>]'

Use case: stringly-typed DOM APIs like Navigator.registerProtocolHandler().

Quoting MDN:

For security reasons, registerProtocolHandler() restricts which schemes can be registered.

A custom scheme may be registered as long as:

  • The custom scheme's name begins with web+
  • The custom scheme's name includes at least 1 letter after the web+ prefix
  • The custom scheme has only lowercase ASCII letters in its name.

In other words, Navigator.registerProtocolHandler() expects either a well-known string or a custom string but only if it conforms to a specific schema.
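To make the MDN rules concrete, here is a small sketch of how the custom-scheme case could be approximated today. It assumes TypeScript 4.1 or later for the template literal type; CustomScheme and isCustomScheme are hypothetical names, not part of the DOM typings:

```typescript
// Compile-time approximation: only the "web+" prefix can be expressed,
// so this type is a superset of the valid custom schemes.
type CustomScheme = `web+${string}`;

// Run-time check for the full rule set quoted from MDN:
// "web+" prefix, at least one letter after it, lowercase ASCII letters only.
function isCustomScheme(scheme: string): scheme is CustomScheme {
  return /^web\+[a-z]+$/.test(scheme);
}

const example: CustomScheme = "web+burger";
```

The type alone would still accept "web+" or "web+BURGER"; the regex covers the part the type cannot express.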

CSS Custom Properties for CSSType is another use case to provide closed types for all properties except for those prefixed with --.

interface Properties {
    // ....
    [customProperty: /--[a-z][^\s]*/]: number | string;
}

Related https://github.com/frenic/csstype/issues/63

Can someone tell me if this is the same as refinement types? https://github.com/microsoft/TypeScript/issues/7599

@gautam1168 It's theoretically just a subset, where it's refining string types specifically. (Numeric types have their own concerns, of course.)

For some of the use cases mentioned only a small subset of Regex would be required, e.g. prefix matching.

I still stand by my proposal, which offers this + a few other things, basically a very small superset of star-free languages you can still reasonably efficiently check for subsetting and equality. And so far, I've not seen any other proposal attempt to address the performance aspect of arbitrary regular expressions, the biggest concern the TS team has.

In this comment,
https://github.com/microsoft/TypeScript/issues/6579#issuecomment-243338433

Someone linked to,
https://bora.uib.no/handle/1956/3956

Titled, "The Inclusion Problem for Regular Expressions"


However,

  • If the right-hand expression is 1-unambiguous, the algorithm gives the correct answer.
  • Otherwise, it may give the correct answer, or no answer.

https://www.sciencedirect.com/science/article/pii/S0022000011001486

(Of course, JS regular expressions are non-regular)

@AnyhowStep That might work - I'd just lift the starof restriction and change that restriction accordingly. I would like a better way of characterizing the restriction, since the math is a bit abstract and it's unclear how it would concretely apply in practice (not everyone who would use those types are well-versed in formal languages).

Also, separately, I'd very strongly like a better alternative to starof as an operator to model that kind of thing.

I'm curious: Is it possible to decide inclusion/containment of regular expressions? According to Wikipedia, it is decidable. However, does this also account for regular expressions in JS? I think they have more features than standard REs (e.g. backreferences). If it's decidable, is it computationally feasible?
This would affect this feature (narrowing):

if (Gmail.test(candidate)) {
    // candidate is also an Email
}

@nikeee Decidable isn't enough for this to be realistic. Even quadratic time is generally too slow at this scale. Not TS, but I do have some background on similar issues.

In the face of backreferences, I suspect it still is decidable, but likely exponential if not worse. Just an educated guess, though.

Thanks for clarifying this!

Even quadratic time is generally too slow at this scale.

This is why I also asked if it is computationally feasible, so I guess it isn't.

If the same applies to equality, doesn't that mean that almost every property of this feature is infeasible? Correct me, if I'm wrong, but it seems the only thing that is left is membership. I don't think that this alone would be useful.

@nikeee It's worth keeping in mind that this pattern will be checked against literally every property of every type it's compared against. And for types with regexp properties, you do have to compute whether a regexp matches a subset of what another regexp matches, a rather complicated beast in and of itself.

It's not impossible, just difficult, and you have to be restrictive if you want it to be feasible. (For one, JS regexps wouldn't work - they are not only not extensible enough, but also too flexible.)

Edit: I do want to reiterate this: I'm not on the TS team, just to clarify. I just have a decent background in CS algorithm design.

Hmm, so maybe you can only support a "limited" subset of "usual" RegEx. Given the use cases, the regex'es were quite simple so far… (colors, phone numbers etc.)

How could we design the UX of supporting only a subset? It might not be very clear to the user that Feature X of RegEx works, but Y doesn't.

well …don't call it "regex" – for a starter. Maybe just "pattern matching" or so 🙈. But yeah, likely not an easy task…

What about a non-regex syntax like this:

type TLD = 'com' | 'net' | 'org';
type Domain = `${string}.${TLD}`;
type URL = `${'http'|'https'}://${Domain}`;

const good: URL = 'https://google.com'; // ✔️
const bad: URL = 'ftp://example.com'; // ✖️ TypeError: 'ftp' is not assignable to type 'http' | 'https'

In my mind, this would fit quite nicely in the type system. You could add different syntax for things like optional matches:

type SubDomain = `${string}.`;
type Domain = `${SubDomain}?${string}.${TLD}`;

Add support for quantifiers and a greedy operator and you've got something quite robust, which I would think would be enough for the majority of the use cases developers might want to use this for.

I think this approach would be more user-friendly. However, it seems to be equivalent to arithmetic operations on types.
According to https://github.com/microsoft/TypeScript/issues/15645#issuecomment-299917814 and https://github.com/microsoft/TypeScript/issues/15794#issuecomment-301170109, it is a design decision to not do arithmetic on types.
If I'm not mistaken, this approach can easily create an enormous type. Consider:

type TLD = 'com' | 'net' | 'org' | 'ly' | 'a' | 'b' | 'c' | 'd';
type Foo = `${TLD}${TLD}${TLD}${TLD}${TLD}${TLD}${TLD}${TLD}${TLD}${TLD}${TLD}${TLD}${TLD}${TLD}${TLD}`;
type Bar = `${Foo}${Foo}${Foo}${Foo}${Foo}`

(this assumes the implementation would use union types. It may work with a different / more complex implementation)

Disclaimer: I am not part of the TS team and I am not working on TS. Just my 2c.

@rozzzly @nikeee That's more or less the essence of my proposal, just with a few smaller features missing. I based mine on a large subset of regular languages (the formal language concept), not regular expressions in the sense of regexp literals and such, so it's much less powerful than those but powerful enough to get the job done.

I think this approach would be more user-friendly. However, it seems to be equivalent to arithmetic operations on types.

Math says that validating whether a type is a subtype of another is computationally equivalent to checking if a string is contained within a given formal language.

Domain validation specifically is actually a pretty complicated thing to do if you also check for TLD/public suffix validity. Generic domains themselves per the RFC are as simple as /[0-9A-Za-z-]+(?:\.[0-9A-Za-z-]+)+/ + being at most 255 characters, but even this is very complicated to type unless you go for full regular grammars as the above regexp demonstrates. You could programmatically generate the type pretty straightforwardly (I'll leave it as an exercise for the reader) using only strings from @rozzzly's or my proposal, but the end result is still rather complicated.

@isiahmeadows

That's more or less the essence of my proposal, just with a few smaller features missing.

The last time I read through this entire thread was well over a year ago. I was on my break and saw a notification, read @rugk's comment about "well …don't call it "regex" – for a starter", which got me thinking... I hadn't realized someone had already pitched a considerably more detailed proposal for essentially the same (or a very similar) idea.

...even this is very complicated to type unless you go for full regular grammars as the above regexp demonstrates. You could programmatically generate the type pretty straightforwardly (I'll leave it as an exercise for the reader) using only strings from @rozzzly's or my proposal, but the end result is still rather complicated.

In my mind, some facility for limited pattern matching such as I suggested would be extremely useful for very simplistic, and necessarily not rigorously strict, typing. The example I gave is far from precise, and wouldn't blow up the compiler.

But as @nikeee and you both point out, this could be taken to dangerous extremes. Assuming the most naive implementation, supporting only unions, somebody is going to ruin everyone's day by publishing an update to @types/some-popular-project that contains:

type MixedCaseAlphaNumeric = (
    | 'a'
    | 'b'
    | 'c'
    // and so on
);

type StrWithLengthBetween1And64<Charset extends string> = (
    | `${Charset}`
    | `${Charset}${Charset}`
    | `${Charset}${Charset}${Charset}`
    // and so on
);

function updatePassword(userID: number, password: StrWithLengthBetween1And64<MixedCaseAlphaNumeric>): void {
    // ...
}

Putting that into perspective, that union consists of more distinct types than there are atoms in the observable universe.

Now, I've seen some dreadfully long assignability errors but imagine (the untruncated) error for that....

Type '"😢"' is not assignable to type '"a"|"b"|"c"..........."'.ts(2322)'

So yeah.. there are some issues there

@rozzzly What makes that type any different (in terms of feasibility) than a TupleWithLengthBetween1And64<Charset>?
The compiler isn't forced to expand every type to a normalized form; it would quickly explode on fairly normal types if it did.
Not saying I think this issue makes sense in typescript at the moment, if even "integer between 3 and 1024" (think message buffer allocation lengths) is considered out of scope.

@simonbuchan At least prefix and suffix types need to exist, if nothing else. That is itself required for many DOM libraries and frameworks.

I know this has been beaten to death and some good proposals have been given already. But I just wanted to add extra stuff that some might find mildly interesting.

In the face of backreferences, I suspect it still is decidable, but likely exponential if not worse. Just an educated guess, though.

Back references can make a regexp describe a context-sensitive grammar, a superset of context-free grammars. And language equality for CFGs is undecidable. So it's even worse for CSGs, which are equivalent to linear-bounded automatons.


Assuming just all the regular expressions that can be converted to a DFA are used in a regexp (concat, union, star, intersection, complement, etc.), converting a regexp to an NFA is O(n), getting the product of two NFAs is O(m*n), then traversing the resulting graph for accept states is O(m*n). So, checking the language equality/subset of two regular regexps is also O(m*n).
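For readers who want to see the product-and-traverse step spelled out, here is a minimal sketch of the subset check. It assumes both inputs are already complete DFAs over the same small alphabet; determinizing an NFA first can blow up exponentially and is not covered here. SimpleDfa and isSubset are hypothetical names:

```typescript
// A complete DFA: every state has a transition for every symbol in `alphabet`.
interface SimpleDfa {
  alphabet: string[];
  start: string;
  accept: Set<string>;
  transitions: Record<string, Record<string, string>>;
}

// Returns true when every string accepted by `a` is also accepted by `b`.
// Walks the reachable part of the product automaton, at most |a| * |b| pairs.
function isSubset(a: SimpleDfa, b: SimpleDfa): boolean {
  const seen = new Set<string>();
  const pending: Array<[string, string]> = [[a.start, b.start]];
  while (pending.length > 0) {
    const [stateA, stateB] = pending.pop()!;
    const key = `${stateA}\u0000${stateB}`;
    if (seen.has(key)) continue;
    seen.add(key);
    // A reachable pair that accepts in `a` but not in `b` is a counterexample.
    if (a.accept.has(stateA) && !b.accept.has(stateB)) return false;
    for (const symbol of a.alphabet) {
      pending.push([a.transitions[stateA][symbol], b.transitions[stateB][symbol]]);
    }
  }
  return true;
}

// /^a+$/ over the alphabet {a, b}
const onlyAs: SimpleDfa = {
  alphabet: ["a", "b"],
  start: "q0",
  accept: new Set(["q1"]),
  transitions: {
    q0: { a: "q1", b: "dead" },
    q1: { a: "q1", b: "dead" },
    dead: { a: "dead", b: "dead" },
  },
};

// /^a[ab]*$/ over the same alphabet
const startsWithA: SimpleDfa = {
  alphabet: ["a", "b"],
  start: "p0",
  accept: new Set(["p1"]),
  transitions: {
    p0: { a: "p1", b: "dead" },
    p1: { a: "p1", b: "p1" },
    dead: { a: "dead", b: "dead" },
  },
};
```

Here isSubset(onlyAs, startsWithA) holds, while the reverse direction fails because "ab" is accepted only by the second automaton.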

The problem is that the alphabet is really large here. Textbooks restrict themselves to alphabets of size 1-5 usually, when talking about DFAs/NFAs/regular expressions. But with JS regexps, we have all of unicode as our alphabet. Granted, there can be efficient ways of representing transition functions using sparse arrays and other clever hacks and optimizations for equality/subset testing...

I'm confident it's possible to do type checking for regular-to-regular assignment somewhat efficiently.

Then, all non-regular assignments can just require explicit type assertions.

I've recently worked on a small finite automaton project, so the info is still fresh in my mind =x

If I'm not mistaken, this approach can easily create an enormous type. Consider:

type TLD = 'com' | 'net' | 'org' | 'ly' | 'a' | 'b' | 'c' | 'd';
type Foo = `${TLD}${TLD}${TLD}${TLD}${TLD}${TLD}${TLD}${TLD}${TLD}${TLD}${TLD}${TLD}${TLD}${TLD}${TLD}`;
type Bar = `${Foo}${Foo}${Foo}${Foo}${Foo}`

(this assumes the implementation would use union types. It may work with a different / more complex implementation)

Funnily enough, this is exactly what's possible with the new template string literal types. This case is avoided by having a threshold for union types, it seems.

@AnyhowStep JS backreferences are the only context-sensitive production (and a fairly simple and limited one at that - only up to 9 groups can be referenced like that), and the rest of the regexp grammar is regular, so that's why I suspect it is decidable. But regardless, I think we can agree it's not practical in any sense of the word. 🙂

Edit: accuracy

I confirmed this comment from @rozzzly works with TS 4.1.0 nightly!

type TLD = 'com' | 'net' | 'org';
type Domain = `${string}.${TLD}`;
type Url = `${'http'|'https'}://${Domain}`;

const success: Url = 'https://example.com';
const fail: Url = 'example.com';
const domain: Domain = 'example.com';

Try it in the playground and see that fail has a compile time error 🤩


Update: after playing with this feature a bit, it will not cover many use cases. For example, it doesn't work for a hex color string.

type HexChar = '0' | '1' | '2' | '3' | '4' | '5' | '6'| '7' | '8' | '9' | 'A' | 'B' | 'C' | 'D' | 'E' | 'F';
type HexColor = `#${HexChar}${HexChar}${HexChar}${HexChar}${HexChar}${HexChar}`;
let color: HexColor = '#123456';

Today, that fails with "Expression produces a union type that is too complex to represent.(2590)"

I confirmed this comment from @rozzzly works with TS 4.1.0 nightly!

type TLD = 'com' | 'net' | 'org';
type Domain = `${string}.${TLD}`;
type Url = `${'http'|'https'}://${Domain}`;

const success: Url = 'https://example.com';
const fail: Url = 'example.com';
const domain: Domain = 'example.com';

Try it in the playground and see that fail has a compile time error 🤩

This would solve the data- or aria- problem that most of us face in UX libraries if it can be applied to indexes.
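On applying this to indexes: TypeScript 4.4 and later accept template literal patterns as index signature keys. A minimal sketch, reusing the TestComponentProps name from the earlier Fabric example; the value types chosen for the attributes are assumptions:

```typescript
// Assumes TypeScript 4.4 or later (template string pattern index signatures).
interface TestComponentProps {
  someProp?: number;
  // Any property whose name starts with "data-" must be a string (assumed value type).
  [dataAttribute: `data-${string}`]: string;
  // Any property whose name starts with "aria-" (assumed value types).
  [ariaAttribute: `aria-${string}`]: string | number | boolean;
}

const testComponentProps: TestComponentProps = {
  someProp: 42,
  "data-attribute-allowed": "test",
  "aria-label": "Close",
};
```

Property names that match neither pattern, such as someProp, are still checked against the declared members only.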

Update: after playing with this feature a bit, it will not cover many use cases. For example, it doesn't work for a hex color string.

type HexChar = '0' | '1' | '2' | '3' | '4' | '5' | '6'| '7' | '8' | '9' | 'A' | 'B' | 'C' | 'D' | 'E' | 'F';
type HexColor = `#${HexChar}${HexChar}${HexChar}${HexChar}${HexChar}${HexChar}`;
let color: HexColor = '#123456';

Today, that fails with "Expression produces a union type that is too complex to represent.(2590)"

There was some reference to this limitation in the release notes. It creates a list of all the possible valid combinations, in this case it would create a union with 16,777,216 (i.e., 16^6) members.
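One way to sidestep the union explosion is to check a given literal character by character with a conditional type, rather than enumerating every combination. A rough sketch, assuming TypeScript 4.1 or later; IsHexColor and hexColor are hypothetical names, and HexChar is kept uppercase-only as in the example above:

```typescript
type HexChar =
  | '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9'
  | 'A' | 'B' | 'C' | 'D' | 'E' | 'F';

// Splits the literal into six parts after '#'. Each infer position except the
// last takes exactly one character; the last takes whatever remains, so it
// only passes the HexChar check when exactly one character is left.
type IsHexColor<S extends string> =
  S extends `#${infer A}${infer B}${infer C}${infer D}${infer E}${infer F}`
    ? [A, B, C, D, E, F] extends [HexChar, HexChar, HexChar, HexChar, HexChar, HexChar]
      ? true
      : false
    : false;

// The parameter type collapses to never when the literal is not a hex color.
function hexColor<S extends string>(value: IsHexColor<S> extends true ? S : never): string {
  return value;
}

const color = hexColor('#123456');
// hexColor('#12345G'); // intended to be rejected at compile time
```

The trade-off is that the check only works on literals passed through a generic function; there is no standalone HexColor type to annotate a variable with.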

This is a great idea... Igmat made some incredible posts back in 2016 that look good on paper anyway.

I found this because I wanted to make sure the keys of an object literal passed into my function were valid css class names. I can easily check at runtime... but to me it seems so obvious that typescript should be able to do this at compile time, especially in situations where I am just hard-coding object literals and typescript shouldn't have to figure out if MyUnionExtendedExotictype satisfies SomeArbitraryRegexType.

Maybe one day I will be knowledgeable enough to make a more productive contribution :/

I confirmed this comment from @rozzzly works with TS 4.1.0 nightly!

Wow. I honestly did not expect to see this get implemented, not anytime soon at least.

@chadlavi-casebook

There was some reference to this limitation in the release notes. It creates a list of all the possible valid combinations, in this case it would create a union with 16,777,216 (i.e., 16^6) members.

I'd be curious to see how large that union could get before it became a problem performance-wise. @styfle's example shows how easy it is to hit that ceiling. There's obviously going to be some degree of diminishing returns of usefulness of complex types vs performance.

@thehappycheese

I wanted to make sure the keys of an object literal passed into my function were valid css class names

I'm fairly confident in saying that it's not possible with the current implementation. If there was support for quantifiers and ranges you would probably get validation for BEM style class names. The standard js regex for that isn't too terrible:
^\.[a-z]([a-z0-9-]+)?(__([a-z0-9]+-?)+)?(--([a-z0-9]+-?)+){0,2}$
You would also ditch the anchors because as the implementation stands, it's either an end-to-end match or nothing so ^ and $ are implied. Now that's a comparatively simple regex for a narrow subset of what is a valid css selector. For example: ಠ_ಠ is a valid class name. I'm not kidding. CSS selectors are very permissive, which makes them extremely difficult to validate. So your desire is probably out of scope for template literal types, at least for the foreseeable future. 😞

I'm sorry. I had to do this.

I implemented regular languages in TypeScript.

More accurately, I implemented a simple deterministic finite automaton using TS 4.1

I mean, we can already implement Turing machines in TS. So, DFAs and PDAs are "easy", compared to that.

And template strings make this more usable.


The core types are actually simple and fit in < 30 LOC,

type Head<StrT extends string> = StrT extends `${infer HeadT}${string}` ? HeadT : never;

type Tail<StrT extends string> = StrT extends `${string}${infer TailT}` ? TailT : never;

interface Dfa {
    startState : string,
    acceptStates : string,
    transitions : Record<string, Record<string, string>>,
}

type AcceptsImpl<
    DfaT extends Dfa,
    StateT extends string,
    InputT extends string
> =
    InputT extends "" ?
    (StateT extends DfaT["acceptStates"] ? true : false) :
    AcceptsImpl<
        DfaT,
        DfaT["transitions"][StateT][Head<InputT>],
        Tail<InputT>
    >;

type Accepts<DfaT extends Dfa, InputT extends string> = AcceptsImpl<DfaT, DfaT["startState"], InputT>;

It's specifying the automatons that's the hard part.

But I'm pretty sure someone can make a regex to TypeScript DFA™ generator...
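To give a feel for what specifying an automaton looks like, here is a small hand-written one for /^ab+$/ plugged into the core types above (repeated so the snippet stands alone). This is only a sketch assuming TS 4.1 or later; AbPlus and its state names are made up for illustration, and every state lists a transition for both symbols so that no lookup is missing:

```typescript
type Head<StrT extends string> = StrT extends `${infer HeadT}${string}` ? HeadT : never;
type Tail<StrT extends string> = StrT extends `${string}${infer TailT}` ? TailT : never;

interface Dfa {
    startState : string,
    acceptStates : string,
    transitions : Record<string, Record<string, string>>,
}

type AcceptsImpl<
    DfaT extends Dfa,
    StateT extends string,
    InputT extends string
> =
    InputT extends "" ?
    (StateT extends DfaT["acceptStates"] ? true : false) :
    AcceptsImpl<
        DfaT,
        DfaT["transitions"][StateT][Head<InputT>],
        Tail<InputT>
    >;

type Accepts<DfaT extends Dfa, InputT extends string> = AcceptsImpl<DfaT, DfaT["startState"], InputT>;

// Hypothetical automaton for /^ab+$/ over the alphabet {a, b}.
type AbPlus = {
    startState : "start",
    acceptStates : "sawB",
    transitions : {
        start : { a : "sawA", b : "dead" },
        sawA : { a : "dead", b : "sawB" },
        sawB : { a : "dead", b : "sawB" },
        dead : { a : "dead", b : "dead" },
    },
};

// The annotations are the actual test: each line only compiles
// if Accepts evaluates to the literal type on the left.
const accepted : Accepts<AbPlus, "abbb"> = true;
const rejected : Accepts<AbPlus, "aba"> = false;
```

Even for a three-character regex the transition table needs an explicit dead state, which is why generating these from a regex is the appealing route.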


I'd also like to highlight that the "hex string of length 6" example shows you can make function parameters only accept strings matching the regex using ugly hackery,

declare function takesOnlyHex<StrT extends string> (
    hexString : Accepts<HexStringLen6, StrT> extends true ? StrT : {__err : `${StrT} is not a hex-string of length 6`}
) : void;

//OK
takesOnlyHex("DEADBE")

//Error: Argument of type 'string' is not assignable to parameter of type '{ __err: "DEADBEEF is not a hex-string of length 6"; }'.
takesOnlyHex("DEADBEEF")

//OK
takesOnlyHex("01A34B")

//Error: Argument of type 'string' is not assignable to parameter of type '{ __err: "01AZ4B is not a hex-string of length 6"; }'.
takesOnlyHex("01AZ4B")

Here's a bonus Playground; it implements the regex /^hello .*/

And another Playground; it implements the regex / world$/

One final example, Playground; this is a floating point string regex!

@AnyhowStep Well, I used your DFA idea to implement a simple regex [abc]{4}, which means the letters a, b, c in any order, possibly with some missing, but with a length of exactly 4 (aaaa, abcc, bbcc, etc.).
Playground

https://cyberzhg.github.io/toolbox/min_dfa?regex=ZCgoYmQqYiopKmMpKg==

https://github.com/CyberZHG/toolbox

If I had more willpower, I'd grab something like the above and use it to turn regexes into TS DFAs™ lol

Okay, I just threw together a prototype,

https://glitch.com/~sassy-valiant-heath

[Edit] https://glitch.com/~efficacious-valley-repair <-- This produces way better output for more complicated regexes

[Edit] It seems like Glitch will archive free projects that are inactive for too long. So, here's a git repo with the files,
https://github.com/AnyhowStep/efficacious-valley-repair/tree/main/app

Step 1, key in your regex.

Step 2, click convert.

Step 3, click the generated TS playground URL.

Step 4, scroll down till InLanguage_0.

Step 5, play with input values.

Shoutout to @kpdyer , author of https://www.npmjs.com/package/regex2dfa , for doing the heavy lifting of the conversion

In case someone needs something a little more powerful, here's a Turing machine 😆

Playground

This thread has gotten too long to read and many of the comments are either addressed by template literal types or are off-topic. I've created a new issue #41160 for discussion of what remaining use cases might be enabled by this feature. Feel free to continue discussing type system parsers here 😀
