Elixir: NaiveDateTime.to_gregorian_seconds/1 and friends ignore microseconds

Created on 25 May 2020 · 34Comments · Source: elixir-lang/elixir

Per https://github.com/elixir-lang/elixir/issues/9831, linear representations are supposed to be used in sorting, etc. However, we currently ignore microseconds:

iex> NaiveDateTime.to_gregorian_seconds(~N[2020-01-01 09:00:00])
63745088400
iex> NaiveDateTime.to_gregorian_seconds(~N[2020-01-01 09:00:00.999999])
63745088400

which makes it inappropriate for sorting in certain circumstances. We could solve it by returning floats or tuples.

Elixir Bug

Source

wojtekmach

Most helpful comment

Hi folks, after thinking more about this, let's go with the tuple format, because it is easy to discard microseconds and it works best when you want to need to manipulate the microseconds too. A PR is welcome!

josevalim on 5 Jun 2020

❤3

All 34 comments

Would gladly tackle this one if if no one already started it. Which one you prefer? I particularly like the tuple idea. I'm just not sure about backwards compatibility, should we add an opt-in option for it?

kelvinst on 29 May 2020

there's no backwards compatibility to worry about as it's a new function on master!

wojtekmach on 29 May 2020

I am still torn between floats and tuples, so maybe we need some discussion prior. /cc @kipcole9 @Qqwy

josevalim on 29 May 2020

Given that a linear representation has a reasonable likelihood of being persisted I imagine floats offer more compatibility. If I recall an earlier conversation with @Qqwy then I think precision is not an issue with microseconds? If using a float means that there is a risk that two different (perhaps adjacent) date times could resolve to the same linear value that would be a problem.

I like the shape of a tuple for this from an Elixir point of view, but if precision is not an issue and given persistence is a use case then floats may be more appropriate.

kipcole9 on 29 May 2020

I think precision is not an issue with microseconds?

I don't think it is going to be an issue but I guess it sends the wrong message. Another option though is to use an integer but with microsecond precision (as if it was called to_gregorian_microseconds - but then the naming can be confusing). Maybe returning a tuple is the best option.

josevalim on 29 May 2020

Another option though is to use an integer but with microsecond precision (as if it was called to_gregorian_microseconds - but then the naming can be confusing)

So renaming the function is not an option then?

kelvinst on 29 May 2020

It is ok to change it if we have a good proposal.

josevalim on 29 May 2020

What about having same convention with DateTime.to_unix(datetime, unit \\ :second)?

Always return integer
Extendable with unit parameter

chulkilee on 30 May 2020

That's what I proposed above, but we need to find a better name, especially because I think the default here should be microseconds.

josevalim on 30 May 2020

How about DateTime.to_linear_representation?

@spec to_linear_representation(System.tim_unit()) :: integer()
def to_linear_representation(unit \\ :microsecond) ...

NickNeck on 30 May 2020

An option could be to simply call it to_integer. It doesn't convey much semantic meaning, but describes what the function does pretty clearly

michalmuskala on 30 May 2020

It could also be to_gregorian or to_gregorian_unit. We also need similar changes to Time.to_seconds_after_midnight. Due to all of this, I am thinking a tuple may be the best way to go. And it is easy to discard microseconds if you are not interested in them.

josevalim on 30 May 2020

I think that returning a tuple makes it more explicit to the consumer of the code that there are different precisions to care about, making mistakes like 'thinking you have seconds but actually having microseconds' (near) impossible.

Qqwy on 30 May 2020

I’m with @michalmuskala, maybe having an optional unit param: to_integer(unit // :microseconds)

kelvinst on 30 May 2020

If the main purpose of this function is to return easily comparable value, I think integer with sane default (e.g. always return microsecond) would be ideal since it is already comparable without manual comparison function.

Somewhat related:

erlang/elixir already use microseconds for duration when it makes sense (e.g. Process.sleep/1)
System.monotonic_time is another example: we don't care how it is generated or transformed from clocks. We just need to know 1) the characteristics of the value and 2) it's comparable.

If it's just to return comparable value with Gregorian calendar, then what about NaiveDateTime.to_gregorian_unix(naive_datetime, unit \\ :second) or even NaiveDateTime.to_unix(naive_datetime, unit \\ :second, transform \\ :gregorian)?

If it returns tuple... I think it's just better version of NaiveDateTime.to_erl/1 (which holds only down to seconds)

chulkilee on 1 Jun 2020

If it returns tuple... I think it's just better version of NaiveDateTime.to_erl/1 (which holds only down to seconds)

True, maybe if we go with a tuple, we could add an optional precision param to to_erl, so if one gives :microsecond we add it to the end of the time tuple (keeping the default to :second just for backwards compatibility).

As to_erl already returns a tuple like {{2000, 2, 29}, {23, 00, 07}}, I think we could use it to comparing and sorting, and if we take into account @Qqwy's point of being explicit that there is multiple precisions to care, I think this format would be most explicit we can be.

I just still think that to_integer would be a nice addition, as people can use it for simple arithmetic time operations too, like NaiveDateTime.to_integer(d2, :second) - NaiveDateTime.to_integer(d1, :second)

kelvinst on 1 Jun 2020

Perhaps worth taking a step back and discuss use cases for these new functions i.e. what's the actual code we'd use them over things like Enum.sort(..., NaiveDateTime)?

wojtekmach on 1 Jun 2020

👍2

Enum.sort is slow because you do the comparison one by one. For sorting dates, this is actually much faster: Enum.sort_by(dates, &NaiveDateTime.to_integer_or_whatever_we_call_it/1). The second representation can also be used by drivers like Postgrex, which today still has to resort to :calendar to get a second representation. If we look at Postgrex, the tuple would be the best format too.

josevalim on 1 Jun 2020

josevalim on 5 Jun 2020

❤3

Alright, what about the name of the function? And the format of the tuple? Should it be {{year, month, day}, {hour, minute, second, microsecond}}?

kelvinst on 5 Jun 2020

The same names as we are still returning seconds. We will just change the return type to be {seconds, microseconds}.

josevalim on 5 Jun 2020

Alright then, will open the PR soon.

kelvinst on 5 Jun 2020

What about from_gregorian_seconds? Should we update it to accept the new tuple format, so that the property naive_date_time |> NaiveDateTime.to_gregorian_seconds() |> NaiveDateTime.from_gregorian_seconds() == naive_date_time is always true?

kelvinst on 5 Jun 2020

I mean, I know there is other params for that, but it would be strange to have to do this to get the same behaviour:

{sec, msec} = NaiveDateTime.to_gregorian_seconds(ndt)
NaiveDateTime.from_gregorian_seconds(sec, {msec, 6})

kelvinst on 5 Jun 2020

My suggestion is to change @spec from_gregorian_seconds(integer(), Calendar.microsecond(), Calendar.calendar()) :: t to @spec from_gregorian_seconds({integer(), integer()}, Calendar.calendar()) :: t and ask for full precision microseconds on the second elem of the tuple (the same way to_gregorian_seconds will return it)

kelvinst on 5 Jun 2020

I wouldn't change from_gregorian_seconds for now since, as you noted, the precision and other values may be required when creating it. Unless we were to also return the precision in to_gregorian_seconds, but I think that would conflict with its purpose of ordering.

josevalim on 5 Jun 2020

PS.: just realised that the property naive_date_time |> NaiveDateTime.to_gregorian_seconds() |> NaiveDateTime.from_gregorian_seconds() == naive_date_time will be false when naive_date_time = ~N[2020-01-01 00:00:00.1], as the precision would be lost and the return of from_gregorian_seconds would be ~N[2020-01-01 00:00:00.100000]

kelvinst on 5 Jun 2020

Alrighty 👍

kelvinst on 5 Jun 2020

Unless we were to also return the precision in to_gregorian_seconds, but I think that would conflict with its purpose of ordering.

Well, we could return it as {seconds, {microsecond, precision}}, it would still work for ordering, as microsecond do not change depending on the precision:

iex(4)> ~N[2020-01-01 01:01:01.1].microsecond
{100000, 1}
iex(5)> ~N[2020-01-01 01:01:01.10].microsecond
{100000, 2}
iex(6)> ~N[2020-01-01 01:01:01.100000].microsecond
{100000, 6}

kelvinst on 5 Jun 2020

It would not work well for actual time comparisons though, like NaiveDateTime.to_gregorian_seconds(~N[2020-01-01 01:01:01.1]) == NaiveDateTime.to_gregorian_seconds(~N[2020-01-01 01:01:01.10]) would return false even being exactly the same second and microsecond. I don't think that we actually should care about microseconds precision on comparisons, but I'm also not sure if to_gregorian_seconds is meant to be used for comparisons too.

kelvinst on 5 Jun 2020

Anyway, PR is open. Tell me if you need any change in there.

kelvinst on 5 Jun 2020

The issue is that it messes up stable sorting. If we return the precision, Enum.sort_by(dates, &NaiveDateTime.to_gregorian_seconds/1) and Enum.sort(dates, NaiveDateTime) would return different results.

josevalim on 5 Jun 2020

True, alright, the PR is returning {seconds, microseconds} anyway, so no change needed. 😄

kelvinst on 5 Jun 2020

Closing in favor of PR.

josevalim on 7 Jun 2020

Was this page helpful?

0 / 5 - 0 ratings