Elixir: NaiveDateTime.to_gregorian_seconds/1 and friends ignore microseconds

Created on 25 May 2020  路  34Comments  路  Source: elixir-lang/elixir

Per https://github.com/elixir-lang/elixir/issues/9831, linear representations are supposed to be used in sorting, etc. However, we currently ignore microseconds:

iex> NaiveDateTime.to_gregorian_seconds(~N[2020-01-01 09:00:00])
63745088400
iex> NaiveDateTime.to_gregorian_seconds(~N[2020-01-01 09:00:00.999999])
63745088400

which makes it inappropriate for sorting in certain circumstances. We could solve it by returning floats or tuples.

Elixir Bug

Most helpful comment

Hi folks, after thinking more about this, let's go with the tuple format, because it is easy to discard microseconds and it works best when you want to need to manipulate the microseconds too. A PR is welcome!

All 34 comments

Would gladly tackle this one if if no one already started it. Which one you prefer? I particularly like the tuple idea. I'm just not sure about backwards compatibility, should we add an opt-in option for it?

there's no backwards compatibility to worry about as it's a new function on master!

I am still torn between floats and tuples, so maybe we need some discussion prior. /cc @kipcole9 @Qqwy

Given that a linear representation has a reasonable likelihood of being persisted I imagine floats offer more compatibility. If I recall an earlier conversation with @Qqwy then I think precision is not an issue with microseconds? If using a float means that there is a risk that two different (perhaps adjacent) date times could resolve to the same linear value that would be a problem.

I like the shape of a tuple for this from an Elixir point of view, but if precision is not an issue and given persistence is a use case then floats may be more appropriate.

I think precision is not an issue with microseconds?

I don't think it is going to be an issue but I guess it sends the wrong message. Another option though is to use an integer but with microsecond precision (as if it was called to_gregorian_microseconds - but then the naming can be confusing). Maybe returning a tuple is the best option.

Another option though is to use an integer but with microsecond precision (as if it was called to_gregorian_microseconds - but then the naming can be confusing)

So renaming the function is not an option then?

It is ok to change it if we have a good proposal.

What about having same convention with DateTime.to_unix(datetime, unit \\ :second)?

  • Always return integer
  • Extendable with unit parameter

That's what I proposed above, but we need to find a better name, especially because I think the default here should be microseconds.

How about DateTime.to_linear_representation?

@spec to_linear_representation(System.tim_unit()) :: integer()
def to_linear_representation(unit \\ :microsecond) ...

An option could be to simply call it to_integer. It doesn't convey much semantic meaning, but describes what the function does pretty clearly

It could also be to_gregorian or to_gregorian_unit. We also need similar changes to Time.to_seconds_after_midnight. Due to all of this, I am thinking a tuple may be the best way to go. And it is easy to discard microseconds if you are not interested in them.

I think that returning a tuple makes it more explicit to the consumer of the code that there are different precisions to care about, making mistakes like 'thinking you have seconds but actually having microseconds' (near) impossible.

I鈥檓 with @michalmuskala, maybe having an optional unit param: to_integer(unit // :microseconds)

If the main purpose of this function is to return easily comparable value, I think integer with sane default (e.g. always return microsecond) would be ideal since it is already comparable without manual comparison function.

Somewhat related:

  • erlang/elixir already use microseconds for duration when it makes sense (e.g. Process.sleep/1)
  • System.monotonic_time is another example: we don't care how it is generated or transformed from clocks. We just need to know 1) the characteristics of the value and 2) it's comparable.

If it's just to return comparable value with Gregorian calendar, then what about NaiveDateTime.to_gregorian_unix(naive_datetime, unit \\ :second) or even NaiveDateTime.to_unix(naive_datetime, unit \\ :second, transform \\ :gregorian)?

If it returns tuple... I think it's just better version of NaiveDateTime.to_erl/1 (which holds only down to seconds)

If it returns tuple... I think it's just better version of NaiveDateTime.to_erl/1 (which holds only down to seconds)

True, maybe if we go with a tuple, we could add an optional precision param to to_erl, so if one gives :microsecond we add it to the end of the time tuple (keeping the default to :second just for backwards compatibility).

As to_erl already returns a tuple like {{2000, 2, 29}, {23, 00, 07}}, I think we could use it to comparing and sorting, and if we take into account @Qqwy's point of being explicit that there is multiple precisions to care, I think this format would be most explicit we can be.

I just still think that to_integer would be a nice addition, as people can use it for simple arithmetic time operations too, like NaiveDateTime.to_integer(d2, :second) - NaiveDateTime.to_integer(d1, :second)

Perhaps worth taking a step back and discuss use cases for these new functions i.e. what's the actual code we'd use them over things like Enum.sort(..., NaiveDateTime)?

Enum.sort is slow because you do the comparison one by one. For sorting dates, this is actually much faster: Enum.sort_by(dates, &NaiveDateTime.to_integer_or_whatever_we_call_it/1). The second representation can also be used by drivers like Postgrex, which today still has to resort to :calendar to get a second representation. If we look at Postgrex, the tuple would be the best format too.

Hi folks, after thinking more about this, let's go with the tuple format, because it is easy to discard microseconds and it works best when you want to need to manipulate the microseconds too. A PR is welcome!

Alright, what about the name of the function? And the format of the tuple? Should it be {{year, month, day}, {hour, minute, second, microsecond}}?

The same names as we are still returning seconds. We will just change the return type to be {seconds, microseconds}.

Alright then, will open the PR soon.

What about from_gregorian_seconds? Should we update it to accept the new tuple format, so that the property naive_date_time |> NaiveDateTime.to_gregorian_seconds() |> NaiveDateTime.from_gregorian_seconds() == naive_date_time is always true?

I mean, I know there is other params for that, but it would be strange to have to do this to get the same behaviour:

{sec, msec} = NaiveDateTime.to_gregorian_seconds(ndt)
NaiveDateTime.from_gregorian_seconds(sec, {msec, 6})

My suggestion is to change @spec from_gregorian_seconds(integer(), Calendar.microsecond(), Calendar.calendar()) :: t to @spec from_gregorian_seconds({integer(), integer()}, Calendar.calendar()) :: t and ask for full precision microseconds on the second elem of the tuple (the same way to_gregorian_seconds will return it)

I wouldn't change from_gregorian_seconds for now since, as you noted, the precision and other values may be required when creating it. Unless we were to also return the precision in to_gregorian_seconds, but I think that would conflict with its purpose of ordering.

PS.: just realised that the property naive_date_time |> NaiveDateTime.to_gregorian_seconds() |> NaiveDateTime.from_gregorian_seconds() == naive_date_time will be false when naive_date_time = ~N[2020-01-01 00:00:00.1], as the precision would be lost and the return of from_gregorian_seconds would be ~N[2020-01-01 00:00:00.100000]

Alrighty 馃憤

Unless we were to also return the precision in to_gregorian_seconds, but I think that would conflict with its purpose of ordering.

Well, we could return it as {seconds, {microsecond, precision}}, it would still work for ordering, as microsecond do not change depending on the precision:

iex(4)> ~N[2020-01-01 01:01:01.1].microsecond
{100000, 1}
iex(5)> ~N[2020-01-01 01:01:01.10].microsecond
{100000, 2}
iex(6)> ~N[2020-01-01 01:01:01.100000].microsecond
{100000, 6}

It would not work well for actual time comparisons though, like NaiveDateTime.to_gregorian_seconds(~N[2020-01-01 01:01:01.1]) == NaiveDateTime.to_gregorian_seconds(~N[2020-01-01 01:01:01.10]) would return false even being exactly the same second and microsecond. I don't think that we actually should care about microseconds precision on comparisons, but I'm also not sure if to_gregorian_seconds is meant to be used for comparisons too.

Anyway, PR is open. Tell me if you need any change in there.

The issue is that it messes up stable sorting. If we return the precision, Enum.sort_by(dates, &NaiveDateTime.to_gregorian_seconds/1) and Enum.sort(dates, NaiveDateTime) would return different results.

True, alright, the PR is returning {seconds, microseconds} anyway, so no change needed. 馃槃

Closing in favor of PR.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

cmeiklejohn picture cmeiklejohn  路  3Comments

josevalim picture josevalim  路  3Comments

DEvil0000 picture DEvil0000  路  3Comments

Irio picture Irio  路  3Comments

GianFF picture GianFF  路  3Comments