Pandas: Bug on .to_string(index=False)

Created on 28 Jan 2019 · 15 Comments · Source: pandas-dev/pandas

Code Sample, a copy-pastable example if possible

Related issue (very old though) #11833

import pandas

print("pandas.__version__ == {:s}".format(pandas.__version__))

columns = ['name', 'age']
values = [
    ['name_one', '31'],
    ['name_two', '32']]
data_frame = pandas.DataFrame(values, columns=columns)

# filter

filtered = data_frame[(data_frame['name'] == 'name_one') & (data_frame['age'] == '31')]
# single value comes with a blank at the beginning when running 0.24.0

print("==={:s}===".format(filtered.name.to_string(index=False)))

I saved this in a file called pandasbug.py. Running it on pandas 0.24.0 prints:

(goodman_pipeline) [simon@ctioy9 sandbox]$ python3.7 pandasbug.py 
pandas.__version__ == 0.24.0
=== name_one===

Problem description

My automated builds started to fail as soon as 0.24.0 was released. The error was a file not found.
I first reported this, erroneously, on astropy/ccdproc#658.
I use a pandas data frame to filter a set of reference files. When I extract the file name and join it to a full path, the result is a non-existent path such as:
/full/path/to/[unwanted-blank-here]file_name.fits (/full/path/to/ file_name.fits)
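
Until the formatting is fixed, one possible workaround is to take the value itself rather than its printed form. The sketch below reuses the data frame from the code sample above and assumes the filter matches exactly one row. The directory '/full/path/to' is just the placeholder from this report.

```python
import os

import pandas

data_frame = pandas.DataFrame(
    [['name_one', '31'], ['name_two', '32']],
    columns=['name', 'age'])

filtered = data_frame[(data_frame['name'] == 'name_one') & (data_frame['age'] == '31')]

# Take the stored value directly; .to_string() returns a display
# representation, which may carry padding depending on the pandas version.
# Assumption: the filter matched exactly one row.
file_name = filtered['name'].iloc[0]

# '/full/path/to' is the placeholder directory used in this report.
full_path = os.path.join('/full/path/to', file_name)

print("==={:s}===".format(file_name))
print(full_path)
```

Because .iloc[0] returns the stored string rather than rendered text, the result should not depend on how a given pandas version pads its output.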

This Travis build shows the real-world error this caused, if you can access it.

Expected Output

On my system, pandas 0.20.3 works with Python 3.6, and the latest working build ran pandas 0.23.4.

(goodman_pipeline) [simon@ctioy9 sandbox]$ python3.6 pandasbug.py 
pandas.__version__ == 0.20.3
===name_one===

Output of pd.show_versions()

import pandas
pandas.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.7.2.final.0
python-bits: 64
OS: Linux
OS-release: 4.9.0-1.el7.elrepo.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.24.0
pytest: 4.1.1
pip: 18.1
setuptools: 40.6.3
Cython: 0.29.2
numpy: 1.15.2
scipy: 1.2.0
pyarrow: None
xarray: None
IPython: None
sphinx: 1.8.3
patsy: None
dateutil: 2.7.5
pytz: 2018.9
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 3.0.1
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml.etree: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None

Labels: Bug, Output-Formatting, Regression, Strings

Most helpful comment

Here's a simpler example. Agreed that the extra whitespace is not expected. Investigations and PRs welcome!

In [28]: pd.Series('a').to_string(index=False)
Out[28]: ' a'

In [29]: pd.__version__
Out[29]: '0.25.0.dev0+12.g2b16e2e6c.dirty'

All 15 comments

It seems this is done on purpose in pandas/io/formats/format.py; the default for leading_space looks like it is None:

if leading_space is False:
     # False specifically, so that the default is
     # to include a space if we get here.
     tpl = u'{v}'
else:
     tpl = u' {v}'

@charlesdong1991 that was added for a different reason (formatting ExtensionArrays IIRC).

We could maybe pass leading_space=self.leading_space in SeriesFormatter._get_formatted_values in io/formats/format.py.

@TomAugspurger thanks for your reply. Yes, I did try setting leading_space to False if self.index is False in SeriesFormatter._get_formatted_values, and this error did go away.
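
To make the idea concrete, here is a toy model of the behaviour discussed in this thread. It is not pandas code: format_values and series_to_string are hypothetical helpers written only for illustration. The first copies the template choice quoted earlier, and the second passes leading_space=False when the index is hidden, as tried in the comment above.

```python
def format_values(values, leading_space=None):
    """Hypothetical stand-in for the formatter; not the pandas implementation."""
    # Same rule as the quoted snippet: only an explicit False drops the space,
    # so the default (None) still prepends one.
    if leading_space is False:
        tpl = '{v}'
    else:
        tpl = ' {v}'
    return [tpl.format(v=v) for v in values]


def series_to_string(values, index=True):
    """Hypothetical, simplified to_string: one line per value."""
    # Idea from the thread: when the index is hidden there is nothing to
    # separate the value from, so ask for no leading space.
    leading_space = None if index else False
    formatted = format_values(values, leading_space=leading_space)
    if index:
        # The leading space acts as the separator between label and value.
        return '\n'.join('{}{}'.format(i, v) for i, v in enumerate(formatted))
    return '\n'.join(formatted)


print(repr(series_to_string(['a'])))
print(repr(series_to_string(['a'], index=False)))
```

In this model the space only matters as a separator between the index label and the value, which is why dropping it when index=False removes the stray blank without affecting the indexed output.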

Pushing to 0.24.2

Any news on this? I'm still having the same issue on version 0.25.3

There is an open PR, #29670, but I believe changing this was quite tricky. Help welcome.

Thanks for your reply. I was just checking it and it seems there is only one conflict. I will wait to see whether someone more experienced fixes it, or I will become an expert myself and try to help :)

Is anyone here still working on this?

Not really, I just pinned my version to one in which this does not exist. I wish I could fix it but I haven't had the chance.

@onshek Hi, feel free to work on it. There is a stale PR, #29670, which you could use as a starting point. As I recall, #29670 could solve the issue correctly; however, there are still several comments you need to address, and those comments are in #25000. Hope it helps you get started!

Hi @charlesdong1991, it seems you had already solved the errors in #29670, so what was the problem at that time?

It was quite a long time ago and I cannot remember the details. I think the main concern back then was that the solution in #29670 is kind of a hotfix and might not generalize.

There are some comments in #25000 that you can take a look at, @onshek. A PR to solve it is certainly welcome!

I've changed all leading_space = 'compat' to leading_space = True, and this passed all the tests (223 in total) modified and added in #29670. It seems there's no need to change leading_space: Optional[bool] = None to leading_space: Union[str, bool] = "compat". I'll make a PR after I take care of the remaining code in test_to_latex.py, if you don't mind.

Go ahead and open a PR, and make any changes you need. I don't mind at all ^^ @onshek
