In [2]: pd.Series(pd.to_datetime(['2017-03-15'])).to_csv()
Out[2]: '0,2017-03-15\n'
In [3]: pd.Series(pd.to_datetime(['2017-03-15'])).to_json(date_format='iso')
Out[3]: '{"0":"2017-03-15T00:00:00.000Z"}'
From this SO comment:

By default, to_csv() drops times of day if they are all midnight. By default, to_json() instead writes dates as epoch timestamps, but if passed date_format='iso' it will always write times.
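For reference, the default (epoch) output mentioned above looks like this; the exact milliseconds value shown in the comment is what I'd expect for 2017-03-15 UTC:

pd.Series(pd.to_datetime(['2017-03-15'])).to_json()
# '{"0":1489536000000}'  (epoch milliseconds for 2017-03-15 00:00:00 UTC)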
Not sure whether it would be optimal to have date_format='iso' behave as to_csv() does, or rather to add a new value (for instance date_format='iso_date'), or maybe even to accept free-form patterns like pd.Series.dt.strftime() does, e.g. date_format="%Y-%m-%d".
Out[3]: '{"0":"2017-03-15"}'
pd.show_versions()
commit: None
python: 3.5.3.final.0
python-bits: 64
OS: Linux
OS-release: 4.7.0-1-amd64
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: it_IT.utf8
LOCALE: it_IT.UTF-8
pandas: 0.20.1
pytest: 3.0.6
pip: 9.0.1
setuptools: None
Cython: 0.25.2
numpy: 1.12.1
scipy: 0.19.0
xarray: 0.9.2
IPython: 5.1.0.dev
sphinx: 1.5.6
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: 1.2.1
tables: 3.3.0
numexpr: 2.6.1
feather: 0.3.1
matplotlib: 2.0.2
openpyxl: 2.3.0
xlrd: 1.0.0
xlwt: 1.1.2
xlsxwriter: 0.9.6
lxml: 3.7.1
bs4: 4.5.3
html5lib: 0.999999999
sqlalchemy: 1.0.15
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: 0.2.1
Probably worth fixing together with #12997
-1 on this. This does not conform to JSON and this creates non-standard formatting for dates.
"This does not conform to JSON"

I'm not an expert at all on JSON and/or date representation standards... but if JSON dictates the use of ISO 8601, then ISO 8601 does define a way to represent dates (without time). If JSON instead dictates using the full ISO 8601 representation, then indeed this is worth closing.
(I know that strictly speaking JSON does not dictate anything, I'm just referring to any widespread convention.)
I think accepting a generic date-format string would be reasonable and consistent with other APIs:
pd.Series(pd.to_datetime(['2017-03-15'])).to_csv(date_format='%m/%d/%Y')
Out[30]: '0,03/15/2017\n'
There is no "JSON standard" on what a date should look like, major languages vastly differ on how they serialize their builtin datetimes to JSON with standard libraries. Most of the time the point is to hand to a JS library what it expects for a date format. Sometimes this looks like %Y%m%d as a string for just a date... Being pragmatic is far more useful than having to completely reimplement JSON serialisation just to have a different date format, when there is a parameter called date_format. My expectation coming to this for the first time was something like datetime.strftime(), which is what I imagine a lot of people would expect. Perhaps add in strftime as a new parameter for back compat? The default format should just be ISO too really, rather than a raw timestamp since you've lost TZ info that might be in there anyway. So the existing default is kind of bad, and the added format functionality doesn't do a huge deal for you other than provide the sensible default.
FWIW, in Table Schema the date field descriptor accepts a pattern:
https://frictionlessdata.io/specs/table-schema/
So if we wanted to specify a date format along with the provided JSON that may be the best (i.e. only) option
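For example, my reading of the spec is that a date field descriptor can carry a strptime-style pattern, roughly like this (the field name is made up for illustration):

# Rough shape of a Table Schema date field descriptor with a pattern,
# per https://frictionlessdata.io/specs/table-schema/ (my reading of the spec):
field_descriptor = {
    "name": "order_date",   # illustrative field name
    "type": "date",
    "format": "%d/%m/%Y",   # strptime-style pattern
}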
Any movement on this? It's a real problem when generating JSONL files for Redshift. Redshift rejects the file because it has a time portion for a field that's a DATE. (I don't own the table so switching to TIMESTAMP isn't a viable solution.)
Not on this specific issue. @WillAyd is working on a refactor of our JSON IO code that should make things easier though.
I just want to second the request to enable a date serializer that allows dates as 'YYYY-MM-DD' in the to_json() method, not just the ISO format with the time included. I have other services that are rejecting the date on this basis.
In case someone else is struggling with this and needs a quick workaround, there is a messy hack: convert to a dictionary and then use json.dumps() with a serialiser for the date format you want. In case this helps anyone, here is a small function as a workaround:

import datetime
import json

def get_json(df):
    """Small function to serialise DataFrame dates as 'YYYY-MM-DD' in JSON."""
    def convert_timestamp(item_date_object):
        # json.dumps falls back to this for objects it can't serialise natively;
        # pd.Timestamp subclasses datetime.datetime, so it is caught here too.
        if isinstance(item_date_object, (datetime.date, datetime.datetime)):
            return item_date_object.strftime("%Y-%m-%d")

    dict_ = df.to_dict(orient='records')
    return json.dumps(dict_, default=convert_timestamp)

my_json = get_json(df)
I found a great solution here
df.assign(
**df.select_dtypes(['datetime']).astype(str).to_dict('list')
).to_json(orient="records")
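A quick usage sketch of that one-liner; the expected output is my assumption, based on datetime64 columns stringifying to plain dates when the times are all midnight:

import pandas as pd

df = pd.DataFrame({'d': pd.to_datetime(['2017-03-15'])})
# Replace each datetime column with its string form, then serialise as usual.
out = df.assign(
    **df.select_dtypes(['datetime']).astype(str).to_dict('list')
).to_json(orient="records")
# expected: '[{"d":"2017-03-15"}]'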