Efcore: RegexMatchTimeoutException for SqlServerMigrationsSqlGenerator

Created on 20 Jun 2019  路  9Comments  路  Source: dotnet/efcore

System.Text.RegularExpressions.RegexMatchTimeoutException: The RegEx engine has timed out while trying to match a pattern to an input string. This can occur for many reasons, including very large inputs or excessive backtracking caused by nested quantifiers, back-references and other factors.
   at System.Text.RegularExpressions.RegexRunner.DoCheckTimeout()
   at System.Text.RegularExpressions.RegexInterpreter.Go()
   at System.Text.RegularExpressions.RegexRunner.Scan(Regex regex, String text, Int32 textbeg, Int32 textend, Int32 textstart, Int32 prevlen, Boolean quick, TimeSpan timeout)
   at System.Text.RegularExpressions.Regex.Run(Boolean quick, Int32 prevlen, String input, Int32 beginning, Int32 length, Int32 startat)
   at System.Text.RegularExpressions.Regex.Match(String input, Int32 startat)
   at System.Text.RegularExpressions.RegexReplacement.Replace(Regex regex, String input, Int32 count, Int32 startat)
   at System.Text.RegularExpressions.Regex.Replace(String input, String replacement, Int32 count, Int32 startat)
   at System.Text.RegularExpressions.Regex.Replace(String input, String replacement)
   at System.Text.RegularExpressions.Regex.Replace(String input, String pattern, String replacement, RegexOptions options, TimeSpan matchTimeout)
   at Microsoft.EntityFrameworkCore.Migrations.SqlServerMigrationsSqlGenerator.Generate(SqlOperation operation, IModel model, MigrationCommandListBuilder builder)
   at Microsoft.EntityFrameworkCore.Migrations.MigrationsSqlGenerator.<>c.<.cctor>b__73_25(MigrationsSqlGenerator g, MigrationOperation o, IModel m, MigrationCommandListBuilder b)
   at Microsoft.EntityFrameworkCore.Migrations.MigrationsSqlGenerator.Generate(MigrationOperation operation, IModel model, MigrationCommandListBuilder builder)
   at Microsoft.EntityFrameworkCore.Migrations.SqlServerMigrationsSqlGenerator.Generate(MigrationOperation operation, IModel model, MigrationCommandListBuilder builder)
   at Microsoft.EntityFrameworkCore.Migrations.MigrationsSqlGenerator.Generate(IReadOnlyList`1 operations, IModel model)
   at Microsoft.EntityFrameworkCore.Migrations.SqlServerMigrationsSqlGenerator.Generate(IReadOnlyList`1 operations, IModel model)
   at Microsoft.EntityFrameworkCore.Migrations.Internal.Migrator.GenerateUpSql(Migration migration)
   at Microsoft.EntityFrameworkCore.Migrations.Internal.Migrator.<>c__DisplayClass13_2.<GetMigrationCommandLists>b__2()
   at Microsoft.EntityFrameworkCore.Migrations.Internal.Migrator.Migrate(String targetMigration)
   at Microsoft.EntityFrameworkCore.RelationalDatabaseFacadeExtensions.Migrate(DatabaseFacade databaseFacade)

Version: 2.2.4
Provider: SqlServer

I have some fairly large initial Migrations to set up some static data (approx 100,000 lines):

```C#
public partial class InitialStateStaticData : Migration
{
protected override void Up(MigrationBuilder migrationBuilder)
{
var sql = @"

";
migrationBuilder.Sql(sql, true);
}

    // ...
}

```
The error seems to be coming from the fact that there is a limited 1000ms timeout on the regex parsing of this.

https://github.com/aspnet/EntityFrameworkCore/blob/f9d1df40fed1e1d7780553c3f9c1813e81eca1aa/src/EFCore.SqlServer/Migrations/SqlServerMigrationsSqlGenerator.cs#L1105

To mitigate this I added more GO statements into my SQL to avoid it, which worked fine. That was until I deployed it into an Azure Function and now presumably because of the lower CPU/Mem I am running into issues.

Obviously I am looking in to way I could reduce the 100,000 lines and/or increasing resources, but it might also be helpful if this Timeout was overidable somehow (if the value wasn't set at all I could have used the REGEX_DEFAULT_MATCH_TIMEOUT Environment variable), perhaps in the migrationBuilder.Sql( method call?

area-migrations poachable punted-for-3.0 punted-for-5.0 type-bug

Most helpful comment

Here's our splitting code copied into an extension method. You can tweak it as needed:

static void LargeSql(this MigrationBuilder migrationBuilder, string sql, bool suppressTransaction = false)
{
    var batches = Regex.Split(
        Regex.Replace(
            sql,
            @"\\\r?\n",
            string.Empty,
            default,
            TimeSpan.FromMilliseconds(1000.0)), // TODO: Increase timeout as needed
        @"^\s*(GO[ \t]+[0-9]+|GO)(?:\s+|$)",
        RegexOptions.IgnoreCase | RegexOptions.Multiline,
        TimeSpan.FromMilliseconds(1000.0)); // TODO: Increase timeout as needed
    for (var i = 0; i < batches.Length; i++)
    {
        if (batches[i].StartsWith("GO", StringComparison.OrdinalIgnoreCase)
            || string.IsNullOrWhiteSpace(batches[i]))
        {
            continue;
        }

        var count = 1;
        if (i != batches.Length - 1
            && batches[i + 1].StartsWith("GO", StringComparison.OrdinalIgnoreCase))
        {
            var match = Regex.Match(
                batches[i + 1], "([0-9]+)",
                default,
                TimeSpan.FromMilliseconds(1000.0)); // TODO: Increase timeout as needed
            if (match.Success)
            {
                count = int.Parse(match.Value);
            }
        }

        for (var j = 0; j < count; j++)
        {
            var batchSql = batches[i];
            if (i == batches.Length - 1)
            {
                batchSql += Environment.NewLine;
            }

            migrationBuilder.Sql(batchSql, suppressTransaction);
        }
    }
}

All 9 comments

@bricelam can you please explain the workaround?

Triage: We believe the root cause of the issue is what @oatsoda mentioned above: For a sufficiently long input and a sufficiently slow CPU, the regular expression processing times out before it can complete the work.

We have aggressively added timeout to any regex usage across our codebase as a blanket measure to mitigate catastrophic backtracking. When doing so actually causes functional issues we need to consider other alternatives. For example:

  1. Rewrite the functionality without using a regex
  2. Improve the regex to be more efficient (e.g. using lookahead or lookbehind subexpressions) so that they are more likely to process the whole input in the allotted time.
  3. Employ a non backtracking (?> subexpression ) to prevent any backtracking if the regex doesn't require it

Adding a timeout parameter to the Sql() method doesn't seem to make sense (users shouldn't need to know that there is a regex doing work).

All that said, we don't have much time left in 3.0 and we wish to punt this issue to the backlog.

In the meantime we believe that customers that hit this can implement themselves the split in application code and make multiple calls to the Sql() methods rather than pass long SQL strings with multiple GO to a single call.

@divega I will give it a go splitting it. Thanks.

Here's our splitting code copied into an extension method. You can tweak it as needed:

static void LargeSql(this MigrationBuilder migrationBuilder, string sql, bool suppressTransaction = false)
{
    var batches = Regex.Split(
        Regex.Replace(
            sql,
            @"\\\r?\n",
            string.Empty,
            default,
            TimeSpan.FromMilliseconds(1000.0)), // TODO: Increase timeout as needed
        @"^\s*(GO[ \t]+[0-9]+|GO)(?:\s+|$)",
        RegexOptions.IgnoreCase | RegexOptions.Multiline,
        TimeSpan.FromMilliseconds(1000.0)); // TODO: Increase timeout as needed
    for (var i = 0; i < batches.Length; i++)
    {
        if (batches[i].StartsWith("GO", StringComparison.OrdinalIgnoreCase)
            || string.IsNullOrWhiteSpace(batches[i]))
        {
            continue;
        }

        var count = 1;
        if (i != batches.Length - 1
            && batches[i + 1].StartsWith("GO", StringComparison.OrdinalIgnoreCase))
        {
            var match = Regex.Match(
                batches[i + 1], "([0-9]+)",
                default,
                TimeSpan.FromMilliseconds(1000.0)); // TODO: Increase timeout as needed
            if (match.Success)
            {
                count = int.Parse(match.Value);
            }
        }

        for (var j = 0; j < count; j++)
        {
            var batchSql = batches[i];
            if (i == batches.Length - 1)
            {
                batchSql += Environment.NewLine;
            }

            migrationBuilder.Sql(batchSql, suppressTransaction);
        }
    }
}

Thanks @bricelam I will give it a go.

@divega I can understand that another parameter might be a bit ugly as consumers won't really want to know that a Regex is taking place, but equally they may be like me and hit this issue! Perhaps you could detect if the environment variable REGEX_DEFAULT_MATCH_TIMEOUT is set and if so don't force the 1000ms timeout - allow Regex to just use the default that the environment has set?

Thanks @oatsoda. IMO, that is reasonable. When we actually try to fix this issue we can try this option if we can't come up with something better that makes the timeout unnecessary.

cc @bricelam

@bricelam I've used your workaround to increase the timeout but I think there could be somewhere else that a Regex is applied.

It gets past the migrationBuilder.LargeSql (to the end of the Up method in the Migration) that I have added...but then AFTER that it gets a Regex timeout...

System.Text.RegularExpressions.RegexMatchTimeoutException:
   at System.Text.RegularExpressions.RegexRunner.DoCheckTimeout (System.Text.RegularExpressions, Version=4.2.1.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a)
   at System.Text.RegularExpressions.RegexInterpreter.Go (System.Text.RegularExpressions, Version=4.2.1.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a)
   at System.Text.RegularExpressions.RegexRunner.Scan (System.Text.RegularExpressions, Version=4.2.1.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a)
   at System.Text.RegularExpressions.Regex.Run (System.Text.RegularExpressions, Version=4.2.1.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a)
   at System.Text.RegularExpressions.Regex.Match (System.Text.RegularExpressions, Version=4.2.1.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a)
   at System.Text.RegularExpressions.Regex.Split (System.Text.RegularExpressions, Version=4.2.1.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a)
   at System.Text.RegularExpressions.Regex.Split (System.Text.RegularExpressions, Version=4.2.1.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a)
   at System.Text.RegularExpressions.Regex.Split (System.Text.RegularExpressions, Version=4.2.1.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a)
   at System.Text.RegularExpressions.Regex.Split (System.Text.RegularExpressions, Version=4.2.1.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a)
   at Microsoft.EntityFrameworkCore.Migrations.SqlServerMigrationsSqlGenerator.Generate (Microsoft.EntityFrameworkCore.SqlServer, Version=2.2.4.0, Culture=neutral, PublicKeyToken=adb9793829ddae60)
   at Microsoft.EntityFrameworkCore.Migrations.MigrationsSqlGenerator+<>c.<.cctor>b__73_25 (Microsoft.EntityFrameworkCore.Relational, Version=2.2.4.0, Culture=neutral, PublicKeyToken=adb9793829ddae60)
   at Microsoft.EntityFrameworkCore.Migrations.MigrationsSqlGenerator.Generate (Microsoft.EntityFrameworkCore.Relational, Version=2.2.4.0, Culture=neutral, PublicKeyToken=adb9793829ddae60)
   at Microsoft.EntityFrameworkCore.Migrations.SqlServerMigrationsSqlGenerator.Generate (Microsoft.EntityFrameworkCore.SqlServer, Version=2.2.4.0, Culture=neutral, PublicKeyToken=adb9793829ddae60)
   at Microsoft.EntityFrameworkCore.Migrations.MigrationsSqlGenerator.Generate (Microsoft.EntityFrameworkCore.Relational, Version=2.2.4.0, Culture=neutral, PublicKeyToken=adb9793829ddae60)
   at Microsoft.EntityFrameworkCore.Migrations.SqlServerMigrationsSqlGenerator.Generate (Microsoft.EntityFrameworkCore.SqlServer, Version=2.2.4.0, Culture=neutral, PublicKeyToken=adb9793829ddae60)
   at Microsoft.EntityFrameworkCore.Migrations.Internal.Migrator.GenerateUpSql (Microsoft.EntityFrameworkCore.Relational, Version=2.2.4.0, Culture=neutral, PublicKeyToken=adb9793829ddae60)
   at Microsoft.EntityFrameworkCore.Migrations.Internal.Migrator+<>c__DisplayClass13_2.<GetMigrationCommandLists>b__2 (Microsoft.EntityFrameworkCore.Relational, Version=2.2.4.0, Culture=neutral, PublicKeyToken=adb9793829ddae60)
   at Microsoft.EntityFrameworkCore.Migrations.Internal.Migrator.Migrate (Microsoft.EntityFrameworkCore.Relational, Version=2.2.4.0, Culture=neutral, PublicKeyToken=adb9793829ddae60)
   at Microsoft.EntityFrameworkCore.RelationalDatabaseFacadeExtensions.Migrate (Microsoft.EntityFrameworkCore.Relational, Version=2.2.4.0, Culture=neutral, PublicKeyToken=adb9793829ddae60)

I was afraid of that. If there aren't actually any GO statements, pre-splitting won't help. Need to override this code instead.

class MySqlServerMigrationsSqlGenerator : SqlServerMigrationsSqlGenerator
{
    protected override void Generate(
        SqlOperation operation,
        IModel model,
        MigrationCommandListBuilder builder)
    {
        // TODO: Reimplement without timeouts
    }
}
optionsBuilder
    .UseSqlServer(...)
    .ReplaceService<IMigrationsSqlGenerator, MySqlServerMigrationsSqlGenerator>();

@bricelam Yeah, I do have GO statements, but I had to reduce it down to each batch being no more than 25 lines (I have about 100,000 in total) to avoid the timeout. Significantly reduced resources running on an Azure Function unfortunately.

Thanks though, the override of the SqlServerMigrationsSqlGenerator worked.

Was this page helpful?
0 / 5 - 0 ratings