Elixir: Microsecond defaults

Created on 24 May 2016 · 25 Comments · Source: elixir-lang/elixir

Shouldn't microsecond default to nil instead of 0? For Time, NaiveDateTime and DateTime.

Example: {10, 30, 22} |> Time.from_erl! |> Map.get(:microsecond)
Wouldn’t it make more sense if microsecond was nil instead of 0 here? We don’t know anything about microseconds from the data provided.

Another example: We parse some datetimes.

2015-01-23T23:50:07.123 microsecond: 123000
2015-01-23T23:50:07.0 microsecond: 0
2015-01-23T23:50:07 microsecond: ?

I propose that the third datetime should have “microsecond” be nil and not 0. Because we don’t know anything about microsecond precision.

With defaulting to 0 the 2nd and 3rd datetime would be identical and if we format the datetimes again the difference in precision would be lost. By defaulting to 0 we cannot reproduce the ISO 8601 datetime strings because we threw away the microsecond information provided. But if nil was allowed and the default, we could save that information.

Elixir Enhancement Advanced

lau
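
A minimal sketch of the round-trip problem described in the issue, assuming the fraction is stored as a plain integer that defaults to 0. The module `IntegerFraction` and its `parse/1` function are hypothetical names used only for illustration; they are not part of Elixir.

```elixir
defmodule IntegerFraction do
  # Hypothetical helper: turns the fractional-second digits of a timestamp
  # into an integer number of microseconds. `nil` means the input had no
  # fractional part at all, and it is defaulted to 0.
  def parse(nil), do: 0

  def parse(digits) when is_binary(digits) do
    digits
    |> String.pad_trailing(6, "0")
    |> String.slice(0, 6)
    |> String.to_integer()
  end
end

IO.inspect(IntegerFraction.parse("123")) # 2015-01-23T23:50:07.123
IO.inspect(IntegerFraction.parse("0"))   # 2015-01-23T23:50:07.0
IO.inspect(IntegerFraction.parse(nil))   # 2015-01-23T23:50:07
```

The last two calls return the same value, so a formatter working from the integer alone cannot tell which of the two strings it was given.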

Most helpful comment

It depends on what we mean by losing precision. If I tell someone I will meet them at 8 o'clock, I don't mean anytime within the hour of 8; I mean 8:00:00. For that reason having 0 as the default for minute and second makes sense, and by extension also for microsecond.

This also ties into the recent proposal of required struct fields, if we don't want to leave out any precision then all fields should be required instead of the option of having nil there.

If you actually need to specify the precision of a time then another data type should be used that can explicitly specify it.

ericmj on 25 May 2016

👍3

All 25 comments

I agree 100%.

josevalim on 24 May 2016

On one hand, I can see how this is the correct solution.
On the other hand, it's extremely impractical. You'll get nil checks everywhere around all the date handling code and it _will_ lead to bugs and issues.

michalmuskala on 25 May 2016

I agree with @michalmuskala, this feels impractical.
Honestly, I don't see where we may lose precision, it seems we're actually getting one. :bowtie:

lexmag on 25 May 2016

On the other hand, it's extremely impractical. You'll get nil checks everywhere around all the date handling code and it will lead to bugs and issues.

What do you base that on? If you don't have nil microseconds you have bugs and issues to begin with: you cannot represent a datetime that has no fractional seconds.

I have implemented nil microseconds in Calendar and it is not really that impractical. You don't need nil checks everywhere either, because in many places microseconds are ignored anyway. But when you use them, it is nice to have the correct data.

Yes, you have to check for nil if you want to e.g. print out a string representing the datetime with microseconds. But there is a good reason for that: you actually want to check if it is nil or not, so you know what to print!

If some libraries do not support nil, it is not the end of the world if they change nil to 0, but it would be nice if the standard library were more correct. I.e., certain libraries can still choose to use 0 as the default if they want, even though it would be better not to. Sometimes there is a reason to, for instance if it is a database library and the underlying database does not support nil microseconds.

You could say why not then make every field optional? That is a good point. But microsecond is different because in most cases a standard time has HH:MM:SS and fractional seconds are optional. You can see this in standards too. ISO8601 is like this. Seconds are needed, but fractional seconds are optional. By making microseconds nil, we are also better able to support standards, because it reflects how fractional seconds are used.

lau on 25 May 2016

Quoting wikipedia:

A common convention in science and engineering is to express accuracy and/or precision implicitly by means of significant figures. Here, when not explicitly stated, the margin of error is understood to be one-half the value of the last significant place. For instance, a recording of 843.6 m, or 843.0 m, or 800.0 m would imply a margin of 0.05 m (the last significant place is the tenths place), while a recording of 8,436 m would imply a margin of error of 0.5 m (the last significant digits are the units).

This means 08:00:00 and 08:00:00.0 have different accuracies but so does 08:00:00.00. In other words, if we want to accurately store microseconds, an integer (or nil) field is not sufficient. So there is an inconsistency here since we are proposing to provide only _some_ accuracy (of precisely ±0.05s).

Similarly, I don't think the "meeting at 8 o'clock" example is relevant. Meetings are not scheduled with the precision of seconds yet there are many applications where those seconds are relevant. Such examples do not dictate anything about the accuracy of seconds nor microseconds.

You could say why not then make every field optional? That is a good point. But microsecond is different because in most cases a standard time has HH:MM:SS and fractional seconds are optional. You can see this in standards too. ISO8601 is like this. Seconds are needed, but fractional seconds are optional.

Both minutes and seconds are optional in ISO8601. See section "4.2.2.3 Representations with reduced accuracy" in ISO8601:2004.

josevalim on 25 May 2016

It depends on what we mean by losing precision. If I tell someone I will meet them at 8 o'clock, I don't mean anytime within the hour of 8; I mean 8:00:00. For that reason having 0 as the default for minute and second makes sense, and by extension also for microsecond.

The way it works with sigils you have to specify minute and second. But not microseconds. Because they are optional:

iex(1)> ~T[08]
** (ArgumentError) cannot parse "08" as time, reason: :invalid_format
    (elixir) lib/calendar.ex:450: Time.from_iso8601!/1
    (elixir) expanding macro: Kernel.sigil_T/2
             iex:1: (file)
iex(1)> ~T[08:00:00]
~T[08:00:00]
iex(2)> ~T[08:00:00.123456]
~T[08:00:00.123456]

It looks correct above, but ~T[08:00:00] should really be displayed as ~T[08:00:00.0]. The trailing .0 is hidden by a special case that exists only because microseconds default to 0.

So already today we have a special case in the code for 0 microseconds.

defp time_to_string(hour, minute, second, 0) do
  zero_pad(hour, 2) <> ":" <> zero_pad(minute, 2) <> ":" <> zero_pad(second, 2)
end

defp time_to_string(hour, minute, second, microsecond) do
  time_to_string(hour, minute, second, 0) <> "." <>
    (microsecond |> zero_pad(6) |> String.trim_trailing("0"))
end

Instead of a special case for 0, we could have code for nil microseconds.

Both minutes and seconds are optional in ISO8601. See section "4.2.2.3 Representations with reduced accuracy" in ISO8601:2004.

You're right. RFC3339 requires seconds and has optional fractional seconds.

lau on 25 May 2016
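
The idea of replacing the special case for 0 with a clause for nil could look roughly like the sketch below. It is an illustration under assumptions, not Elixir's implementation: the module name `NilAwareTime` and the one-argument `zero_pad/1` are hypothetical, and microseconds are taken to be either nil or an integer from 0 to 999999.

```elixir
defmodule NilAwareTime do
  # `nil` means no fractional seconds were given, so none are printed.
  def time_to_string(hour, minute, second, nil) do
    zero_pad(hour) <> ":" <> zero_pad(minute) <> ":" <> zero_pad(second)
  end

  def time_to_string(hour, minute, second, microsecond)
      when microsecond in 0..999_999 do
    fraction =
      microsecond
      |> Integer.to_string()
      |> String.pad_leading(6, "0")
      |> String.trim_trailing("0")

    # An explicit 0 keeps a single digit, so "08:00:00.0" can be reproduced.
    fraction = if fraction == "", do: "0", else: fraction
    time_to_string(hour, minute, second, nil) <> "." <> fraction
  end

  # Hypothetical two-digit padding helper.
  defp zero_pad(value) do
    value |> Integer.to_string() |> String.pad_leading(2, "0")
  end
end

IO.puts(NilAwareTime.time_to_string(8, 0, 0, nil))
IO.puts(NilAwareTime.time_to_string(8, 0, 0, 0))
IO.puts(NilAwareTime.time_to_string(8, 0, 0, 123_000))
```

This distinguishes 08:00:00 from 08:00:00.0, but an input such as 08:00:00.00 would still be printed as 08:00:00.0, because an integer or nil cannot carry the number of digits.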

The way it works with sigils you have to specify minute and second.

Yes, we adopted only part of the ISO. We can adopt more if we all think it makes sense (but let's leave that to a separate discussion).

It looks correct above, but ~T[08:00:00] should really be displayed as ~T[08:00:00.0]. The trailing .0 is hidden by a special case that exists only because microseconds default to 0.

And we could also display it as: ~T[08] if we wanted to! I am also not worried about the implementation. If nil is the correct thing to do, then it is the correct thing to do, and if it requires more code, so be it.

My point is that the proposal sounds arbitrary because we are still guaranteeing only ±0.05s accuracy. Today's mechanism may also be arbitrary (I will write more about it soon) and that is a problem because it is hard to pick between two arbitrary decisions. At the moment, the mechanism I have 100% confidence in is to store microsecond as an integer field in a tuple alongside the number of digits (from 0 to 6).

josevalim on 25 May 2016

On the other hand, it's extremely impractical. You'll get nil checks everywhere around all the date handling code and it will lead to bugs and issues.

The kind of bugs you get by adding 0 by default are silent and generate no errors. The kind of bugs you get if you make some code that does not handle nil microseconds correctly raises errors, so you will notice the bug and can fix it.

lau on 25 May 2016

At the moment, the mechanism I have 100% confidence in is to store microsecond as an integer field alongside the number of digits (from 0 to 6).

To clarify:

iex> ~T[08:00:00].microseconds
{0, 0}

iex> ~T[08:00:00.0].microseconds
{0, 1}

iex> ~T[08:00:00.01].microseconds
{10000, 2}

iex> ~T[08:00:00.123456].microseconds
{123456, 6}

josevalim on 25 May 2016
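
A sketch of how such a {value, digits} pair could be built from the fractional digits and printed back unchanged. The module `FractionTuple` and its functions are hypothetical names for illustration, and the input is assumed to contain at most six ASCII digits.

```elixir
defmodule FractionTuple do
  # Parses the digits after the decimal point ("" when there are none)
  # into {microseconds, number_of_digits}.
  def parse(digits) when is_binary(digits) and byte_size(digits) <= 6 do
    precision = byte_size(digits)
    value = digits |> String.pad_trailing(6, "0") |> String.to_integer()
    {value, precision}
  end

  # Formats the fraction back with exactly `precision` digits.
  def format({_value, 0}), do: ""

  def format({value, precision}) when precision in 1..6 do
    padded = value |> Integer.to_string() |> String.pad_leading(6, "0")
    "." <> binary_part(padded, 0, precision)
  end
end

for digits <- ["", "0", "00", "01", "123456"] do
  tuple = FractionTuple.parse(digits)
  IO.inspect({digits, tuple, FractionTuple.format(tuple)})
end
```

The same shape is used whether or not a fraction was given, so callers that ignore precision read the first element and callers that care read the second, with no nil branch.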

One advantage of using the tuple is that you won't have to handle nils either. It is the same format everywhere. @ericmj, I would love your opinion (given the decimal library). /cc @bitwalker @lau

josevalim on 25 May 2016

It is true that some of the choices are somewhat arbitrary in a way. So is the design of standards. They are judgement calls. Some systems also have more choices of types than others. We could also have a separate type that has just a year and a month. And a separate type with just a year. And a separate type for datetimes with no seconds. And so on. Then you have the problem of having many types. At the other end of the spectrum you can have just one type, like in JavaScript. That has other problems, problems that IMHO are worse than having 30 different types.

So it is about striking a balance. We have to make decisions about it. There isn't a rulebook that has all the answers.

The types we have now align pretty well with e.g. the types used in SQL databases, and with how people use dates in society. You use a date with year, month and day more often than just year plus month. And if you want to represent just a year, you could use just an integer for that.

With a date type that requires a second, you sometimes have to put a "fake" 0. But that is less misleading than e.g. using a datetime with timezone to represent a simple time. Because then you would not just fake a second, but fake an entire date and a timezone.

lau on 25 May 2016

To clarify:

iex> ~T[08:00:00].microseconds
{0, 0}

iex> ~T[08:00:00.0].microseconds
{0, 1}

iex> ~T[08:00:00.01].microseconds
{10000, 2}

iex> ~T[08:00:00.123456].microseconds
{123456, 6}

I like the idea.

Wouldn't {nil, 0} for the first example be more correct?

lau on 25 May 2016

@lau I don't see the benefit of introducing nil here. If you don't care about precision, you can always pick the first element, but if you do, you must look at the second. We get convenience without nil-checks and precision.

josevalim on 25 May 2016

I like the proposal with the tuple, but does the name microseconds make sense in that case? Microsecond is 1/1000000th of a second, and that's all. I think a more appropriate name would be fractional.

Another thing to consider is whether it's better to say {10, 2} or maybe {10, 100}.

michalmuskala on 25 May 2016

@michalmuskala that's a very good point, it may make sense to revisit the whole field name and simply call it "precision" or "fraction".

josevalim on 25 May 2016

Another name is "subsecond": https://en.wiktionary.org/wiki/subsecond

josevalim on 25 May 2016

👍1

👍 for "subsecond", I find it much more clear than "fraction" / "precision" / "fractional" :)

whatyouhide on 25 May 2016

We have decided to go with microsecond: {microsecond, precision} as it is by far the simplest format to work with. Thanks everyone for the discussion!

josevalim on 26 May 2016
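
For reference, a short script showing how the {microsecond, precision} tuple looks when read from `Time` structs, assuming a current Elixir release where the field is named `microsecond` and `Time.to_iso8601/1` is available.

```elixir
# Each time literal carries the number of fractional digits it was written
# with, and Time.to_iso8601/1 prints exactly that many digits back.
times = [~T[08:00:00], ~T[08:00:00.0], ~T[08:00:00.01], ~T[08:00:00.123456]]

for time <- times do
  IO.inspect({time.microsecond, Time.to_iso8601(time)})
end
```

With the precision stored alongside the value, 08:00:00 and 08:00:00.0 stay distinguishable after a parse and format round trip.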

Why is there the 6 digit microsecond rule? I have problems when serializing and deserializing a timestamp with 9 microsecond digits that comes from an external service. No error is raised during parsing, but I cannot inspect the struct, as doing so prints an error. Furthermore, when I later try to serialize the value with Ecto, Timex and Poison I get an error.

kaelumania on 27 Jun 2017

6 digits are micro, 9 nano, 12 pico, 15 femto and so on; that's the reason for having a 6 digit limit on micro...

NobbZ on 27 Jun 2017

@kaelumania Which function are you using to parse timestamps? More than 6 digits in the first part of the microsecond tuple is not valid.

lau on 27 Jun 2017

@lau I use Timex and TimexEcto, see https://github.com/bitwalker/timex/issues/318#issuecomment-311583693 for some context there.

The library returns a microsecond tuple with a precision higher than 6, which breaks inspecting a DateTime; it also crashes whenever https://github.com/elixir-lang/elixir/blob/master/lib/elixir/lib/calendar/iso.ex#L231 is called. I do understand that the specs state that only 6 digits are allowed, but maybe the following line of code should be a bit more resilient, like

binary_part(0, min(precision, 6))

Furthermore, ISO 8601 has no restriction on the number of digits/precision. Why is there no support for nanoseconds etc. in the DateTime type?

kaelumania on 28 Jun 2017

@kaelumania the code should fail if you have invalid data and not silently discard it. I have improved this by making sure you can't create an invalid time through the Time.new interface. Timex will have to act on it anyway and make sure to trim the precision or error.

josevalim on 28 Jun 2017

@josevalim I agree. But the code should fail with a more descriptive error message. Maybe add a guard to the function definition that assumes precision <= 6.

Updated:

The docs also state that datetime strings with higher precision will be truncated. Thus, taking min(precision, 6) here would mirror the same behavior. Still, what is the reason for not supporting nanoseconds etc.?

kaelumania on 28 Jun 2017
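
The two behaviours discussed in the last few comments can be checked with a short script. This assumes a current Elixir release in which `Time.new/4` validates the microsecond tuple and `Time.from_iso8601/1` truncates extra fractional digits; versions from around the time of this thread may behave differently.

```elixir
# Building a Time directly with a precision above 6 is expected to be
# rejected with an error tuple rather than producing an invalid struct.
IO.inspect(Time.new(8, 0, 0, {0, 9}))

# Parsing a string with nine fractional digits is expected to keep only
# the first six, at precision 6.
IO.inspect(Time.from_iso8601("23:50:07.123456789"))
```

A library that receives nanosecond timestamps from an external service would therefore need to trim the fraction itself, or go through the parser, before constructing the struct.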
