Julia: [Dates] Extending the parsing/formatting machinery is awkward

Created on 24 Sep 2018 · 11 Comments · Source: JuliaLang/julia

Extending the parsing machinery in Dates requires one to modify dictionaries such as CONVERSION_SPECIFIERS and to extend methods such as default_format.

Updating the dicts needs to happen within __init__, see e.g. https://github.com/JuliaTime/TimeZones.jl/issues/24

EDIT: Since default_format might need the updated dicts, it needs to be extended in __init__ as well. I first thought this required @eval, which then leads to https://github.com/JuliaLang/julia/issues/29059 (can I just ignore this if it works?). @eval is not needed, but a precompiled DateFormat cannot be used.

All in all, the process is awkward and the end result not very pleasing, see here https://github.com/JuliaAstro/AstroTime.jl/blob/04a6ae917b9277e2abf77dfb199e487885db9595/src/AstroTime.jl#L22

Could this not be implemented through multiple dispatch alone? If not, what am I missing?

dates

All 11 comments

When we did the parsing performance overhaul for Julia 0.6 we needed to use generated functions to address the performance issues. A side result of that is we needed to use dictionaries to still allow extensibility for packages like TimeZones. I'm not sure these restrictions are still the case with Julia 1.0.

Good to know! I plan to check whether it is still needed sometime this week.

Are there lots of potential extensions needed to date parsing, or is TimeZones the only example? If possible, it would be better for Dates to already know about all needed format characters and handle them with 0-method functions. TimeZones.jl can then add methods to those functions when it's loaded.
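To make the 0-method idea concrete, here is a minimal sketch. The names parse_offset and supports are hypothetical and are not part of Dates or TimeZones.jl; the sketch only illustrates how a function declared without methods can be extended later through dispatch on the format character.

```julia
# Hypothetical stand-in for a hook that Dates itself would declare.
# `function name end` creates a generic function with zero methods.
function parse_offset end

# Report whether a method has been provided for a given format character.
supports(c::Char) = hasmethod(parse_offset, Tuple{Val{c}, String})

before = supports('z')   # no package has added a method yet

# What a package such as TimeZones.jl might add when it is loaded:
# parse a fixed UTC offset such as "+0100" into minutes.
function parse_offset(::Val{'z'}, s::AbstractString)
    sgn = startswith(s, "-") ? -1 : 1
    hours = parse(Int, s[2:3])
    minutes = parse(Int, s[4:5])
    return sgn * (60 * hours + minutes)
end

after = supports('z')
offset = parse_offset(Val('z'), "-0530")
```

With this pattern no dictionary has to be mutated in __init__; whether it can match the performance of the current generated-function parser is a separate question that the sketch does not address.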

TimeZones is the only example I know of. It seems sensible to me to reserve the z and Z format characters.

The other example is AstroTime.jl. It uses D for the day-of-year format, e.g. AstroTime.format(now(), "yyyy-DDDTHH:MM:SS.sss") == "2019-45T08:19:53.529", and t for the time scale, e.g. AstroTime.format(now(), "yyyy-mm-dd HH:MM t") == "2019-02-14 08:21 UTC". The former should probably be upstreamed.
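Until a day-of-year character exists in Dates, similar output can be produced with the public API alone. The following is a rough sketch; format_ordinal is a hypothetical helper, not something AstroTime.jl or Dates provides, and it zero-pads the day of year to three digits as ISO 8601 ordinal dates do.

```julia
using Dates

# Hypothetical helper: format a DateTime as an ISO 8601 ordinal timestamp.
function format_ordinal(dt::DateTime)
    doy = lpad(dayofyear(dt), 3, '0')   # zero-pad the day of year to three digits
    return string(Dates.format(dt, "yyyy"), "-", doy, "T",
                  Dates.format(dt, "HH:MM:SS.sss"))
end

stamp = format_ordinal(DateTime(2019, 2, 14, 8, 19, 53, 529))
```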

Just for my understanding: Would it not make sense to add a prefix to the character codes, e.g. strptime-style %d (apart from it being a breaking change)? This would make it easier to parse timestamps with additional text (see here) without preprocessing and the whole alphabet could be made available for future extension.

I agree day-of-year should be upstreamed. I just found the issue for it: https://github.com/JuliaLang/julia/issues/21905.

Using strptime character codes sounds reasonable to me. I believe there have been some proposals for formatted string printing and we'll probably want to have the dates formatting syntax be consistent.

I recently discovered that the Unicode Technical Standard #35 contains a specification for date formatting and parsing symbols which works similarly to Julia's DateFormat.

Some particular things to note about this specification:

  • A..Z and a..z are reserved as pattern characters (ref)
  • Text between single vertical quotes ('xxxx') is literal and may include A..Z and a..z (ref), e.g. dateformat"'SMAP_L4_SM_gph_'yyyymmdd'T'HHMMSS" vs. dateformat"\S\MAP_L4_\S\M_gph_yyyymmddTHHMMSS"
  • Day of year is included in the specification as D
  • The S field pattern specifies fractional seconds and not milliseconds which is helpful for showing additional precision
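For comparison, this is how literal pattern characters are escaped in Dates today, using the example from the Dates documentation. It is only a small illustration of the backslash mechanism mentioned above, not of the quoting syntax in the Unicode specification.

```julia
using Dates

# 'y' and 'm' are pattern characters, so the literal ones in the input
# have to be escaped with a backslash in the format string.
fmt = DateFormat("y\\ym\\m")
d = Date("1995y01m", fmt)
```

Every literal letter that collides with a pattern character needs its own backslash, which is why long prefixes become hard to read compared to a single quoted run.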

Unfortunately the specification has some incompatibilities with what is currently implemented in Dates. Time permitting, I'll try implementing the full Unicode specification as a separate package to try it out.

I have the same issue as in #21905. Ordinal (day of year) formatting is specified in ISO 8601 and commonly implemented in scientific/industrial datalogging equipment. I often need to parse data with dates specified in YYYY-DDD format (e.g. today would be 2020-079).

I don't have a use for any other functionality from AstroTime.jl, and there doesn't seem to be an elegant way to use its format parser to generate a regular Date that makes this job any simpler. My other options all seem sub-optimal and generally hacky, like implementing a generic function:

OrdinalDate(year, doy) = Date( firstdayofyear(Date(year)) + Day(doy-1) )

Being able to directly construct an ordinal date, e.g. Date(year::Int, doy::Int), would be great but I don't know how we could make that unambiguous from the existing Date(year::Int, month::Int) signature.

Given the context of extracting data from logs, adding a symbol to DateFormat that enables calls like Date("2020079", DateFormat("yyyyD")) would be pretty much ideal.
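As a stopgap for log files, the string can be split by hand before calling the Date constructor. This is a sketch under the assumption that the input is either "YYYYDDD" or "YYYY-DDD"; parse_ordinal is a hypothetical helper and not part of Dates.

```julia
using Dates

# Hypothetical helper: parse "YYYYDDD" or "YYYY-DDD" into a Date.
function parse_ordinal(s::AbstractString)
    m = match(r"^(\d{4})-?(\d{3})$", s)
    m === nothing && throw(ArgumentError("not an ordinal date: $s"))
    y = parse(Int, m[1])
    doy = parse(Int, m[2])
    # Reject day 000 and days beyond the end of the year (365 or 366).
    1 <= doy <= daysinyear(Date(y)) ||
        throw(ArgumentError("day of year out of range: $doy"))
    return Date(y) + Day(doy - 1)
end

d1 = parse_ordinal("2020079")
d2 = parse_ordinal("2020-079")
```

Both calls yield 19 March 2020, matching the 2020-079 example above.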

@mikeingold For the time being, you could do this, which is not too inelegant IMHO:

using Dates
import AstroTime
Date(DateTime(AstroTime.UTCEpoch("2020079", DateFormat("yyyyD"))))

But I agree that having this in the stdlib would be better.

This just came up on Discourse: https://discourse.julialang.org/t/parsing-high-precision-timestamps/44061/1

TL;DR: All types using the built-in parser are limited to millisecond precision, even Time.
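One possible workaround, sketched here with a hypothetical helper parse_time_ns, is to parse the whole-second part with the built-in parser and add the fractional part as a Nanosecond period. It assumes input of the form "HH:MM:SS" followed by an optional dot and at most nine fractional digits.

```julia
using Dates

# Hypothetical helper: parse "HH:MM:SS.fffffffff" with up to nine fractional digits.
function parse_time_ns(s::AbstractString)
    parts = split(s, '.'; limit = 2)
    t = Time(parts[1], dateformat"HH:MM:SS")   # built-in parser handles whole seconds
    length(parts) == 1 && return t
    frac = parts[2]
    length(frac) <= 9 || throw(ArgumentError("more than nanosecond precision: $s"))
    # Right-pad so that e.g. "5" means 500_000_000 ns, not 5 ns.
    return t + Nanosecond(parse(Int, rpad(frac, 9, '0')))
end

t = parse_time_ns("12:34:56.123456789")
```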

Another example is MonthlyDates, which uses q to parse quarters (i.e. 2020-Q3), see https://github.com/matthieugomez/MonthlyDates.jl/pull/7
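For readers who only need the quarter case and not the package, a comparable result is possible with a regular expression and the public Dates API. The helper parse_quarter below is hypothetical and does not reflect how MonthlyDates implements its q character.

```julia
using Dates

# Hypothetical helper: parse "2020-Q3" into the first day of that quarter.
function parse_quarter(s::AbstractString)
    m = match(r"^(\d{4})-Q([1-4])$", s)
    m === nothing && throw(ArgumentError("not a quarter: $s"))
    y = parse(Int, m[1])
    q = parse(Int, m[2])
    return Date(y, 3 * (q - 1) + 1, 1)   # quarters start in months 1, 4, 7, 10
end

qstart = parse_quarter("2020-Q3")
```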
