Pomelo.entityframeworkcore.mysql: Group by time interval

Created on 5 Aug 2020  路  12Comments  路  Source: PomeloFoundation/Pomelo.EntityFrameworkCore.MySql

Steps to reproduce

Here is my query.

async Task<IReadOnlyList<EventHistogramModel>> GetTimeline(
            DateTime fromTime, TimeSpan timeUnit, int nofUnits)
        {
            var earliest = fromTime.Subtract(timeUnit.Multiply(nofUnits));

            var evList = await mDbContext.Logs
                .Where(e => e.EventTime <= fromTime && e.EventTime >= earliest)
                .GroupBy(e => (int)Math.Floor(e.EventTime.Subtract(earliest).Divide(timeUnit)))
                .Select(g => new
                {
                    Bucket = g.Key,
                    Events = g.Count()
                })
                .ToDictionaryAsync(e => e.Bucket);

The issue

Works in a test case (in-memory DB), yields an error on MySQL:

The LINQ expression ... could not be translated...

I suspect this is my overly complicated GroupBy selector.

This article describes how to do what I want in pure SQL: https://stackoverflow.com/questions/4342370/grouping-into-interval-of-5-minutes-within-a-time-range

How to do this with LINQ?

Further technical details

MySQL version: MariaDB: mysql Ver 15.1 Distrib 10.4.12-MariaDB
Operating system: Windows
Pomelo.EntityFrameworkCore.MySql version: 3.2.0-preview.20372.3
Microsoft.AspNetCore.App version: 3.1.3

closed-question type-question

All 12 comments

Changed the GroupBy as following:

    var earliestTicks = earliest.Ticks;
    var unitTicks = timeUnit.Ticks;

    var evList = await query
        .GroupBy(e => ((e.EventTime.Ticks - earliestTicks) / unitTicks))

Same error.

Just in case, switched ToDictionaryAsync() to ToListAsync(), same error. So it's not the dictionary creation. Must be the GroupBy then.

Another try, this time I found a translatable function:

    var unitSeconds = timeUnit.TotalSeconds;

    var evList = await query
        .GroupBy(e => EF.Functions.DateDiffSecond(earliest, e.EventTime) / unitSeconds)

Unfortunately, it still yields an error, though the message is different, so I am probably on the right course.

You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near "double) / 3600 AS signed) AS `Bucket`, COUNT(*) AS `Events`
FROM `Logs`" at line 1

Finally fixed by casting TotalSeconds to int.

A final comment - grouping by this expression produces multiple entries with the same key. Probably, because the expression itself is a floating point number, when converted to SQL. Then it gets assigned to an int Bucket, thus giving multiple instances of the same number.

@mc0re Has this been resolved for you, or is there still anything for us to do/check?

No, it's not fixed, I created a workaround like this:

var unitSeconds = (int)timeUnit.TotalSeconds;
var evList = await query
    .GroupBy(e => EF.Functions.DateDiffSecond(earliest, e.EventTime) / unitSeconds)
    .Select(g => new
    {
        Bucket = g.Key,
        Events = g.Count()
    })
    .ToListAsync();

var evDict = evList
    .GroupBy(e => e.Bucket)
    .Select(g => new
    {
        Bucket = g.Key,
        Events = g.Sum(e => e.Events)
    })
    .ToDictionary(e => e.Bucket, e => e.Events);

I guess the reason for the error is that during the first GroupBy the calculation is performed as a floating point, so 1.1 and 1.2 produce different buckets. Then the numbers are rounded, I don't know why, and thus there will be more than one record with the same key. The second GroupBy takes care of grouping them again.

I have three database environments: InMemory for unit tests, MySQL on Azure for local tests (they don't seem to have MariaDB), and MariaDB for the actual execution. I'm telling this because the error was caught during the local test, InMemory worked fine. I didn't try it on MariaDB - you don't want to push to production code you assume does not work :-)

.GroupBy(e => (int)Math.Floor(e.EventTime.Subtract(earliest).Divide(timeUnit)))

In the OP query, the GroupBy line cannot work because while Subtract() is a valid .NET DateTime method, it is currently not being translated by Pomelo, and Divide() is likely to be some custom extension method you defined and can therefore not be translated as well.

As for the query you ended up with, the following one will work directly (your original query is pretty close to this one):

```c#
var evDict = await context.Logs
.Where(e => e.EventTime <= fromTime && e.EventTime >= earliest)
.GroupBy(e => Math.Floor((decimal)EF.Functions.DateDiffSecond(earliest, e.EventTime) / unitSeconds))
.Select(g => new
{
Bucket = g.Key,
Events = g.Count()
})
.ToDictionaryAsync(e => e.Bucket, e => e.Events);


The dictionary is a `Dictionary<decimal, int>` here. If you want a `Dictionary<int, int>` instead, just either cast the result of `Math.Floor()` to int, or `g.Key` when assigning it to `Bucket.`

Here is a sample program to demonstrate this:
<details>
<summary>Sample console project</summary>

```c#
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.EntityFrameworkCore;
using Microsoft.Extensions.Logging;

namespace IssueConsoleTemplate
{
    public class Log
    {
        public int LogId { get; set; }
        public DateTime EventTime { get; set; }
        public string Description { get; set; }
    }

    public class Context : DbContext
    {
        public DbSet<Log> Logs { get; set; }

        protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
        {
            optionsBuilder
                .UseMySql(
                    "server=127.0.0.1;port=3306;user=root;password=;database=Issue1137",
                    b => b.ServerVersion("8.0.20-mysql"))
                .UseLoggerFactory(
                    LoggerFactory.Create(
                        b => b
                            .AddConsole()
                            .AddFilter(level => level >= LogLevel.Information)))
                .EnableSensitiveDataLogging()
                .EnableDetailedErrors();
        }

        protected override void OnModelCreating(ModelBuilder modelBuilder)
        {
            var logs = Enumerable.Range(0, 60)
                .Select(
                    n => new Log
                    {
                        LogId = n + 1,
                        EventTime = new DateTime(2020, 8, 15, 10, n, 0),
                        Description = $"Something occurred {n} times",
                    });

            modelBuilder.Entity<Log>()
                .HasData(logs);
        }
    }

    internal static class Program
    {
        private static async Task Main()
        {
            await using var context = new Context();

            await context.Database.EnsureDeletedAsync();
            await context.Database.EnsureCreatedAsync();

            var earliest = new DateTime(2020, 8, 15, 10, 0, 0);
            var fromTime = new DateTime(2020, 8, 15, 11, 0, 0);
            var timeUnit = TimeSpan.FromMinutes(5);

            var unitSeconds = (int)timeUnit.TotalSeconds;
            var evDict = await context.Logs
                .Where(e => e.EventTime <= fromTime && e.EventTime >= earliest)
                .GroupBy(e => (int)Math.Floor((decimal)EF.Functions.DateDiffSecond(earliest, e.EventTime) / unitSeconds))
                .Select(g => new
                {
                    Bucket = g.Key,
                    Events = g.Count()
                })
                .ToDictionaryAsync(e => e.Bucket, e => e.Events);

            Debug.Assert(evDict.Count == 12);
            Debug.Assert(evDict.All(kvp => kvp.Value == 5));
        }
    }
}

and Divide() is likely to be some custom extension method

Nope ;-) https://docs.microsoft.com/en-us/dotnet/api/system.timespan.divide?view=netcore-3.1

But thanks a lot, I will try it out!

Nope ;-) https://docs.microsoft.com/en-us/dotnet/api/system.timespan.divide?view=netcore-3.1

Ah, makes sense that it returns a TimeSpan and not a DateTime. We currently don't translate any TimeSpan methods in Pomelo (I think that is the same situation in other providers as well).

Technically this should be possible, but we would need to reevaluate our general strategy here, because we currently use TimeSpan to represent the TIME(n) store type, that is limited to -838 < x < +838 hours. So for member translation to make more general sense, we might also want to optionally (at least internally) map DATETIME(n) values to System.TimeSpan when needed (basically making System.TimeSpan interchangeable with System.DateTime).

This looks like a good general enhancement to EF Core, but would need a bit more thinking and testing.

Now tracked by #1147.

We should also add a list of translatable members to the wiki. I added #1146 to track this.

@mc0re I will assume that the provided query works, so I will go ahead and close this issue. If it does not work or there is anything else left, feel free to reopen it.

Was this page helpful?
0 / 5 - 0 ratings