Pandas: DataFrame.to_string truncates long strings

Created on 2 Apr 2015 · 23 Comments · Source: pandas-dev/pandas

I am calling to_string() without any parameters and it beautifully fixed-formatted my dataframe, apart from my very wide filename column, which is being truncated with "...". How can I avoid that?

                                            FILENAME  OBS_ID  XUV  
0  'mvn_iuv_l1a_IPH3-cycle00007-mode040-muvdark_2...      40  MUV  
1  'mvn_iuv_l1a_IPH2-cycle00047-mode050-muvdark_2...      50  MUV  
2  'mvn_iuv_l1a_apoapse-orbit00127-mode2001-muvda...    2001  MUV  
3  'mvn_iuv_l1a_APP1-orbit00087-mode1031-fuvdark_...    1031  FUV  
4  'mvn_iuv_l1a_IPH2-cycle00005-mode060-fuvdark_2...      60  FUV

I tried calling it like this, but to no avail (same output):

with open('test_summary_out.txt','w') as f:
    f.write(summarydf.head().to_string(formatters={'filename':lambda x: "{:100}".format(x)}))

Version: 0.16 with Python 3.4

Output-Formatting good first issue

michaelaye

👍8

All 23 comments

I think it picks that option up from display.max_colwidth. Does pd.set_option("display.max_colwidth", 10000) have an effect?
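A minimal sketch of that suggestion, assuming the truncation does come from display.max_colwidth. The frame and file name below are made up for illustration; pd.option_context scopes the setting to the with block, so the global display options stay untouched.

```python
import pandas as pd

# Made-up data: one file name far longer than the default column width limit.
long_name = "mvn_iuv_l1a_" + "x" * 80 + ".fits"
df = pd.DataFrame({"FILENAME": [long_name], "OBS_ID": [40]})

# Raise the limit only inside this block; the previous value is restored on exit.
with pd.option_context("display.max_colwidth", 10000):
    text = df.to_string()

print(text)
```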

dsm054 on 2 Apr 2015

🚀3

yes, that solved it, thx. I guess it should not pick that up for a to_string() operation, as that is not display? Or, well it is, but maybe one needs a different to_textfile() method that avoids this to be picked up.

michaelaye on 2 Apr 2015

@dsm054 I think it might be worthwhile to point this out in here, maybe in a note box?

jreback on 2 Apr 2015

Is this issue resolved? I am getting this same issue on 0.20.3

JaysonSunshine on 10 Aug 2017

There were no changes, just further documentation.

gfyoung on 10 Aug 2017

Is the solution to modify the display settings? That seems pretty unsatisfactory.

JaysonSunshine on 10 Aug 2017

@jreback : Thoughts?

gfyoung on 10 Aug 2017

i would argue that the "to_string" method should be independent from a display setting for real-time analysis. A string object is not necessarily being used for display purposes.

michaelaye on 10 Aug 2017

👍9

In my present case, I am accessing a Redshift admin table to get a table DDL. The data frame has just that column/DDL, but I want to modify it in memory using a string operation -- split(';'). I think the to_string operation should definitely not carry over any display settings.
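For a single-cell case like this one, a possible way around to_string altogether is to pull the Python string out of the cell and split that. This is only a sketch: the column name "ddl" and the DDL text are hypothetical stand-ins for the real query result.

```python
import pandas as pd

# Hypothetical single-cell frame standing in for the Redshift DDL query result.
ddl_df = pd.DataFrame(
    {"ddl": ["CREATE TABLE a (id INT); CREATE TABLE b (name VARCHAR(256));"]}
)

# Take the raw string out of the cell; no display formatting is involved here.
ddl_text = ddl_df["ddl"].iloc[0]

# Split into individual statements, dropping empty fragments.
statements = [s.strip() for s in ddl_text.split(";") if s.strip()]
print(statements)
```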

JaysonSunshine on 10 Aug 2017

Or, at least, to have a parameter we can toggle. That could work.

JaysonSunshine on 10 Aug 2017

Yes, I don't think the documentation addition really solved this issue.

The max_colwidth option is used by the DataFrameFormatter.to_string without being able to change it. At least we could add a keyword to be able to override it without needing to change the display settings.
But other options such as display.max_rows or max_columns are ignored by to_string, so I think it makes sense to ignore max_colwidth as well (in any case, to be able to ignore the option, it will have to be added as a keyword so the output formatting code can pass the correct setting).
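For readers on newer releases: as far as I know, pandas 1.0 and later went in this direction, with to_string accepting a max_colwidth keyword and no longer truncating by default. The sketch below assumes such a version is installed and uses made-up data.

```python
import pandas as pd

# Made-up file name, longer than the default display limit.
long_name = "mvn_iuv_l1a_" + "x" * 80 + ".fits"
df = pd.DataFrame({"FILENAME": [long_name], "OBS_ID": [40]})

# Default (max_colwidth=None): the full value is written out.
full = df.to_string()

# Truncation becomes an explicit opt-in through the keyword.
short = df.to_string(max_colwidth=20)

print(full)
print(short)
```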

jorisvandenbossche on 10 Aug 2017

👍4

So for me, PR welcome for this!

jorisvandenbossche on 10 Aug 2017

https://github.com/pandas-dev/pandas/issues/1852 is probably a duplicate of this

jorisvandenbossche on 10 Aug 2017

I can hardly see how coupling the display limit with any other processing helps in this benign scenario. Nor do I see how padding the strings, which I notice takes place as well, helps outside the display scenario. If there's too much history behind it, would you recommend using Python's plain csv package for reading strings without modifying them?

Here's a naive code sample, if it helps anyone:

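import csv
import pandas as pd

# Read the 'message' column with the csv module so the strings stay unmodified
messages = []
with open("csv-file", newline="") as csvfile:
    reader = csv.DictReader(csvfile, delimiter=',', quotechar='"')
    for row in reader:
        messages.append(row['message'])

# Wrap the raw strings in a one-column DataFrame
messages_df = pd.DataFrame(messages, columns=['message'])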

# then concat to your main DataFrame...

This seems to avoid the truncation, but not the padding. The padding hurts a little with right-to-left text, because it is applied as if the text were left-to-right, which skews the semantics of the text.

matanster on 19 Jan 2018

two days debugging to find out this was the issue. i'm sad...

Momut1 on 25 Mar 2019

see also #24841 for fix in to_html

simonjayhawkins on 1 Apr 2019

This was incredibly frustrating to debug. I was executing the code below and getting "..." in my output. I assumed it was just printing the "..." to the console, not _in_ the dataframe! While I'm sure this isn't a "good" approach to wrapping text in a tag, it's the most obvious way when starting out. I'm sure _many_ people will do this and no one would expect this behavior.

df["ValueType"] = "<strong>" + df["ValueType"] == "Portfolio"] + "</strong>"

If this were my first experience with Pandas, I'd promptly throw it in the trash. Note: I _love_ Pandas and thank you everyone for amazing work you do! I just wanted to share my experience.

addahlin on 11 Apr 2019

yes, that solved it, thx. I guess it should not pick that up for a to_string() operation, as that is not display? Or, well it is, but maybe one needs a different to_textfile() method that avoids this to be picked up.

With pandas 0.25.0, setting display.max_colwidth to a large number stops the truncation, but when I try to left-justify columns with df.to_string(justify='left'), that same display setting somehow pads columns on the left so they are not left-aligned. Is there currently any way to prevent truncation and get left-justified string columns when outputting to a terminal? I know a pull request is in progress, but I would like to do this now. Thanks.
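One workaround that is sometimes suggested, though nobody in this thread confirms it, is to pad the values yourself with a per-column formatter so that every cell already has the same width. The sketch below uses made-up data and assumes justify only governs the column headers.

```python
import pandas as pd

# Made-up data with string values of different lengths.
df = pd.DataFrame({"name": ["ab", "abcdefghij"], "n": [1, 22]})

# Width of the longest value in the column.
width = int(df["name"].str.len().max())

# Pad each value on the right so all cells are equally wide,
# and left-justify the header to match.
text = df.to_string(
    formatters={"name": lambda s: f"{s:<{width}}"},
    justify="left",
)
print(text)
```

Because every formatted cell has the same length, any padding pandas adds on top is identical for each row, so the values start in the same column.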

rswgnu on 29 Jul 2019

May I ask what the use case is for having the to_string method depend on the display.max_colwidth option? I can't seem to understand why one would ever ask for a DataFrame row as a string with truncated column values.

yamen321 on 15 Aug 2019

👍1

@yamen321 I think it's agreed that to_string shouldn't truncate. Are you interested in working on it?

TomAugspurger on 15 Aug 2019

👍1

Hi! I jumped in on the "good first issue" label and put up a PR to solve this. Feedback very welcome.

lshepard on 21 Aug 2019

👍4

Thanks a lot for taking the initiative on this @lshepard!

yamen321 on 21 Aug 2019

Hey

I have a dataframe column with URL file names like

0 http://address/filename1.jpg
1 http://address/filename2.jpg
Name: fileUrl, dtype: object

I want to extract the filenames from the url

from pathlib import Path
filenamelist = df.apply(lambda x: Path(x.to_string()).name if x.name == 'fileUrl' else x)

I just want the file

0 filename1.jpg
1 filename2.jpg

If the filename is a long string, my output looks like

filena...
filena...

Setting df.fileUrl.max_colwidth = 100 does not solve the issue.

Using the dataframe directly would be much faster than:

select the column
iterate through the column elements
extract the name

Is there any workaround other than this?

filenames_list = [str(Path(x).name) for x in list(df['fileUrl'])]
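A possible alternative that stays inside pandas is to work on the raw string values with Series.map or the .str accessor, so to_string and the display limit are never involved. This is a sketch with made-up URLs; PurePosixPath and urlsplit are used here as an assumption about how the URLs should be parsed.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit

import pandas as pd

# Made-up URLs; the second file name is longer than the default display limit.
long_file = "a_much_longer_file_name_than_the_display_limit_would_show_0002.jpg"
df = pd.DataFrame(
    {"fileUrl": ["http://address/filename1.jpg", "http://address/" + long_file]}
)

# Element-wise on the raw strings: take the last component of the URL path.
names = df["fileUrl"].map(lambda url: PurePosixPath(urlsplit(url).path).name)

# String-accessor variant: keep everything after the last "/".
names_vec = df["fileUrl"].str.rsplit("/", n=1).str[-1]

print(names.tolist())
```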

santhoshnumberone on 21 Oct 2019
