Using the following reader leads to omission of the dates resulting in no index:
file_, index_col=0,
header=4, sep=';', quoting=1, parse_dates=True,
encoding='latin-1', dayfirst=True,
# decimal=',', thousands='.'
dataframe with pyhon numeric data and datecol as index
pd.show_versions()
pd.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-31-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: de_DE.UTF-8
pandas: 0.18.1
nose: 1.3.7
pip: 8.1.2
setuptools: 23.0.0
Cython: 0.24
numpy: 1.11.1
scipy: 0.17.1
statsmodels: None
xarray: None
IPython: 4.2.0
sphinx: 1.4.1
patsy: 0.4.1
dateutil: 2.5.3
pytz: 2016.4
blosc: None
bottleneck: 1.1.0
tables: 3.2.2
numexpr: 2.6.0
matplotlib: 1.5.1
openpyxl: 2.3.2
xlrd: 1.0.0
xlwt: 1.1.2
xlsxwriter: 0.9.2
lxml: 3.6.0
bs4: 4.4.1
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.13
pymysql: None
psycopg2: None
jinja2: 2.8
boto: 2.40.0
pandas_datareader: None
Can you post some example data and the actual read_csv, and the output you're getting? You can use the StringIO module https://docs.python.org/3/library/io.html#io.StringIO to paste the data in. Thanks.
@TomAugspurger thanks for the response.
See also:
https://github.com/balzer82/DKB-Kontoauszug-Visualizer/blob/master/DKB-Kontoauszug-Visualizer.py#L35
Here goes the example:
import sys
if sys.version_info[0] < 3:
from StringIO import StringIO
else:
from io import StringIO
import pandas as pd
TESTDATA=StringIO("""
;
;
;
;
;
;
day;transaction value (EUR) anon
04.08.2016;33043,5217026458
03.08.2016;52542,924322369
01.08.2016;2470,86389648009
01.08.2016;124462,590366738
01.08.2016;82283,2571462275
01.08.2016;5508278,98359258
01.08.2016;154264,547309073
01.08.2016;38068,8061252655
01.08.2016;10444,4806476402
01.08.2016;13958,4753522405
01.08.2016;4834682,43989406
29.07.2016;-1247752,81935378
29.07.2016;157975,468518703
""")
df = pd.read_csv(TESTDATA, index_col=0,
header=4, sep=';', quoting=1, parse_dates=True,
encoding='latin-1', dayfirst=True,
decimal=',',
thousands='.',
)
A couple of things:
1) from pandas.compat import StringIO takes care of the Python 2/3 thing FYI.
2) I believe you have the header parameter wrong. I think it should be header=6.
However, even with that parameter fixed, the reason you're seeing the index_col not being parsed as a date is because your thousands='.' is confusing the parser into parsing the '.' as a numeric value.
That is a bug, so thank you for pointing it out!
The issue actually has nothing to do with European date formats. You can see the bug surfaced here with a much more simplified example:
>>> from pandas import read_csv
>>> from pandas.compat import StringIO
>>>
>>> data = 'a\n04.15.2016'
>>> read_csv(StringIO(data), index_col=0, parse_dates=True, thousands='.')
Empty DataFrame
Columns: []
Index: [4152016] # WRONG
>>>
>>> read_csv(StringIO(data), index_col=0, parse_dates=True)
Empty DataFrame
Columns: []
Index: [2016-04-15 00:00:00] # RIGHT
Note that this bug does not affect non-index columns:
>>> read_csv(StringIO(data), parse_dates=['a'], thousands='.')
a
0 2016-04-15
Similar observations can be made with the Python parser.
@gfyoung thanks for confirmation.
minor clarification:
The issue actually has nothing to do with European date formats.
I was referring to _European data_ as under: Quoting, Compression, and File Format
No idea how to go on from here but looks like the processing priorities need to be changed in the parser.
@dacoex : Ah, okay. Good to know that my minimal example is capturing the issue you were seeing!
Big thanks to @jorisvandenbossche & @gfyoung FOSS is great!
BTW, to we need to add this to:
http://pandas-docs.github.io/pandas-docs-travis/whatsnew.html#v0-19-0-august-2016
it's takes a while to actually generate those docs
I tested with the v0.19 RC and it works with the original data. Thanks again!
Most helpful comment
I tested with the v0.19 RC and it works with the original data. Thanks again!