Pandas: DataFrame.to_string truncates long strings

Created on 2 Apr 2015 · 23 comments · Source: pandas-dev/pandas

I am calling to_string() without any parameters, and it beautifully fixed-formatted my DataFrame, apart from my very wide filename column, which is being truncated with "...". How can I avoid that?

                                            FILENAME  OBS_ID  XUV  
0  'mvn_iuv_l1a_IPH3-cycle00007-mode040-muvdark_2...      40  MUV  
1  'mvn_iuv_l1a_IPH2-cycle00047-mode050-muvdark_2...      50  MUV  
2  'mvn_iuv_l1a_apoapse-orbit00127-mode2001-muvda...    2001  MUV  
3  'mvn_iuv_l1a_APP1-orbit00087-mode1031-fuvdark_...    1031  FUV  
4  'mvn_iuv_l1a_IPH2-cycle00005-mode060-fuvdark_2...      60  FUV  

I tried calling it like this, but to no avail (same output):

with open('test_summary_out.txt','w') as f:
    f.write(summarydf.head().to_string(formatters={'filename':lambda x: "{:100}".format(x)}))

Version: 0.16 with Python 3.4

Labels: Output-Formatting, good first issue


I think it picks that up from the display.max_colwidth option. Does pd.set_option("display.max_colwidth", 10000) have an effect?
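A minimal sketch of that workaround, using a hypothetical frame with one long FILENAME value. It swaps pd.set_option for pd.option_context, which raises the limit only inside the with block and restores the previous value afterwards, instead of changing the display setting for the whole session:

```python
import pandas as pd

# Hypothetical frame with one value longer than the default 50-character limit.
long_name = "mvn_iuv_l1a_" + "x" * 80
df = pd.DataFrame({"FILENAME": [long_name], "OBS_ID": [40]})

# Raise display.max_colwidth only for this block; the previous value
# is restored automatically when the block exits.
with pd.option_context("display.max_colwidth", 10000):
    text = df.to_string()

print(text)
```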

Yes, that solved it, thanks. I guess to_string() should not pick that up, since it is not a display operation? Or, well, maybe it is, but then perhaps there should be a separate to_textfile() method that does not pick it up.

@dsm054 I think it might be worthwhile to point this out in the documentation, maybe in a note box?

Is this issue resolved? I am getting the same issue on 0.20.3.

There were no code changes, only additional documentation.

Is the solution to modify the display settings? That seems pretty unsatisfactory.

@jreback : Thoughts?

I would argue that the to_string method should be independent of display settings meant for real-time analysis. A string object is not necessarily used for display purposes.

In my current case, I am querying a Redshift admin table to get a table's DDL. The DataFrame has just that one DDL column, but I want to modify it in memory with a string operation: split(';'). I think to_string should definitely not carry over any display settings.

Or, at least, there should be a parameter we can toggle. That could work.
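For that use case, one possible workaround is to skip to_string entirely and join the raw column values, which no display option touches. This is a sketch under assumptions: the ddl column name and the sample rows are hypothetical stand-ins for the Redshift result.

```python
import pandas as pd

# Hypothetical stand-in for the admin-table result: one DDL fragment per row.
ddl_df = pd.DataFrame({"ddl": ["CREATE TABLE t (", "  id INT", ");",
                               "GRANT SELECT ON t TO u;"]})

# Join the stored strings directly; nothing is rendered, so nothing is truncated.
full_ddl = "\n".join(ddl_df["ddl"])

# Split into statements and drop the empty piece after the final ";".
statements = [s.strip() for s in full_ddl.split(";") if s.strip()]
print(statements)
```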

Yes, I don't think the documentation addition really solved this issue.

The max_colwidth option is used by DataFrameFormatter.to_string, and there is no way to override it there. At the very least we could add a keyword that overrides it without changing the display settings.
But other options such as display.max_rows and display.max_columns are already ignored by to_string, so I think it makes sense to ignore max_colwidth as well. (Either way, to be able to ignore the option it has to be added as a keyword, so that the output formatting code can pass the correct setting.)

So for me, PR welcome for this!
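For readers on newer versions: as far as I know, pandas 1.0 added a max_colwidth keyword to DataFrame.to_string that defaults to no limit, so the global display option no longer truncates to_string output. A sketch of that behaviour, assuming pandas 1.0 or later and a hypothetical one-column frame:

```python
import pandas as pd

# Hypothetical frame with a single 120-character value.
df = pd.DataFrame({"FILENAME": ["a" * 120]})

pd.set_option("display.max_colwidth", 20)  # a deliberately small display limit
try:
    full = df.to_string()                  # keyword omitted: no truncation
    short = df.to_string(max_colwidth=20)  # explicit, opt-in truncation
finally:
    pd.reset_option("display.max_colwidth")

print(full)
print(short)
```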

https://github.com/pandas-dev/pandas/issues/1852 is probably a duplicate of this

I can hardly see how coupling the display limit to any other processing helps in this simple scenario. Nor do I see how padding the strings, which I notice also happens, helps outside of display. If there is too much history behind this, would you recommend using Python's plain csv module to read strings without modifying them?

Here's a naive code sample, if it helps anyone:

import csv
import pandas as pd

# Read the 'message' column with the standard-library csv module
messages = []
with open("csv-file", newline="") as csvfile:
    reader = csv.DictReader(csvfile, delimiter=',', quotechar='"')
    for row in reader:
        messages.append(row['message'])

messages_df = pd.DataFrame(messages, columns=['message'])

# then concat to your main DataFrame...

This seems to avoid the truncation, but not the padding. The padding hurts a little with right-to-left text: it pads as if the text were left-to-right, which somewhat skews how right-to-left text reads.
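It may help to check where the "..." comes from before switching readers. A minimal sketch, using a hypothetical 200-character message: read_csv stores the full value, and only the rendered text (here repr, under the default 50-character limit) is shortened.

```python
import io

import pandas as pd

# One hypothetical CSV row whose message is 200 characters long.
long_message = "x" * 200
df = pd.read_csv(io.StringIO("message\n" + long_message + "\n"))

stored = df.loc[0, "message"]   # the value actually held in the frame
with pd.option_context("display.max_colwidth", 50):
    rendered = repr(df)         # the text pandas prints for the frame

print(len(stored))
print(rendered)
```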

It took two days of debugging to find out this was the issue. I'm sad...

See also #24841 for the fix in to_html.

This was incredibly frustrating to debug. I was executing the code below and getting "..." in my output. I assumed it was just printing the "..." to the console, not _in_ the dataframe! While I'm sure this isn't a "good" approach to wrapping text in a tag, it's the most obvious way when starting out. I'm sure _many_ people will do this, and no one would expect this behavior.

mask = df["ValueType"] == "Portfolio"
df.loc[mask, "ValueType"] = "<strong>" + df.loc[mask, "ValueType"] + "</strong>"

If this were my first experience with Pandas, I'd promptly throw it in the trash. Note: I _love_ Pandas, and thank you, everyone, for the amazing work you do! I just wanted to share my experience.

With Pandas 0.25.0, setting display.max_colwidth to a large number stops the truncation, but when I try to left-justify columns with df.to_string(justify='left'), that same display setting somehow pads columns on the left, so they are not left-aligned. Is there currently any way to prevent truncation and get left-justified string columns when printing to a terminal? I know a pull request is in progress, but I would like to do this now. Thanks.
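One possible workaround, sketched under assumptions rather than taken from the thread: pad every string value on the right with str.ljust through the formatters argument, so each value already fills its column and pandas has no spare room to pad on the left. The helper name left_aligned_string is hypothetical, it assumes a non-empty frame, and on versions before 1.0 it would still need a raised display.max_colwidth to avoid truncation.

```python
import pandas as pd


def left_aligned_string(df: pd.DataFrame) -> str:
    """Hypothetical helper: render df as text with string columns left-aligned.

    Assumes a non-empty frame. Each string value is padded on the right
    to the column's width, so pandas has no spare room to pad on the left.
    """
    formatters = {}
    for col in df.columns:
        if pd.api.types.is_string_dtype(df[col]):
            width = max(len(str(col)), int(df[col].astype(str).str.len().max()))
            # The default argument pins this column's width inside the lambda.
            formatters[col] = lambda value, w=width: str(value).ljust(w)
    return df.to_string(formatters=formatters, justify="left", index=False)


df = pd.DataFrame({"name": ["a", "bbbb"], "n": [1, 2]})
print(left_aligned_string(df))
```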

May I ask what the use case is for making the to_string method depend on the display.max_colwidth option? I can't understand why anyone would want a DataFrame row as a string with truncated column values.

@yamen321 I think it's agreed that to_string shouldn't truncate. Are you interested in working on it?

Hi! I jumped in on the "good first issue" label and put up a PR to solve this. Feedback very welcome.

Thanks a lot for taking the initiative on this @lshepard!

Hey

I have a DataFrame column of file URLs like:

0 http://address/filename1.jpg
1 http://address/filename2.jpg
Name: fileUrl, dtype: object

I want to extract the file names from the URLs.

So I tried:

from pathlib import Path
filenamelist = df.apply(lambda x: Path(x.to_string()).name if x.name == 'fileUrl' else x)

I just want the file names:

0 filename1.jpg
1 filename2.jpg

If the filename is a long string, my output looks like:

filena...
filena...

Setting df.fileUrl.max_colwidth = 100 does not solve the issue.

I thought using the DataFrame directly would be much faster than having to:

select the column
iterate through the column elements
extract the name

Is there any workaround here, instead of this?

filenames_list = [str(Path(x).name) for x in list(df['fileUrl'])]
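A vectorised alternative, sketched under the assumption that every value is a plain URL whose last path segment is the file name (a query string such as ?v=1 would stay attached): split each string once from the right on "/" and keep the last piece. It never calls to_string(), so no display option is involved.

```python
import pandas as pd

# Sample data shaped like the column in the question.
df = pd.DataFrame({"fileUrl": ["http://address/filename1.jpg",
                               "http://address/filename2.jpg"]})

# Split once from the right on "/" and keep the last element of each list.
filenames = df["fileUrl"].str.rsplit("/", n=1).str[-1]
print(filenames.tolist())
```

The list comprehension in the question also avoids the truncation, because it works on the stored strings rather than on rendered text; the .str accessor is simply the column-wise form of the same idea.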
