Pandas: Need way to get original column order (read_csv, read_json)

Created on 27 Jan 2019  Â·  16Comments  Â·  Source: pandas-dev/pandas

Code Sample, a copy-pastable example if possible

# Your code here

Problem description

Applications sometimes need to be able to present and manipulate data in the column ordering found in the source data file (CSV or JSON). The column order in these files is significant to users in many cases. The pd.read_csf() and pd.read_json() don't seem to have a parameter that prevents Pandas from sorting the columns and losing the original column order. After the Dataframe has been created, there seems to be no way to get the original column order. Please consider adding a parameter to read_csv() and read_json() to preserve the column ordering found in the first line of the file (for CSV) or in the first record (for JSON). Thanks!

Note: We receive a lot of issues on our GitHub tracker, so it is very possible that your issue has been posted before. Please check first before submitting so that we do not have to handle and close duplicates!

Note: Many problems can be resolved by simply upgrading pandas to the latest version. Before submitting, please check if that solution works for you. If possible, you may want to check if master addresses this issue, but that is not necessary.

For documentation-related issues, you can check the latest versions of the docs on master here:

https://pandas-docs.github.io/pandas-docs-travis/

If the issue has not been resolved there, go ahead and file it in the issue tracker.

Expected Output

Output of pd.show_versions()

[paste the output of pd.show_versions() here below this line]

Usage Question

All 16 comments

The pd.read_csf() and pd.read_json() don't seem to have a parameter that prevents Pandas from sorting the columns and losing the original column order.

Can you show an example where column order is lost with read_csv?

I just checked CSV and TXT – my bad – these work perfectly. The problem seems to only be with read_json(). Thanks!

From: Tom Augspurger notifications@github.com
Sent: Sunday, January 27, 2019 12:55 PM
To: pandas-dev/pandas pandas@noreply.github.com
Cc: Roland Fernandez rfernand@microsoft.com; Author author@noreply.github.com
Subject: Re: [pandas-dev/pandas] Need way to get original column order (read_csv, read_json) (#24962)

The pd.read_csf() and pd.read_json() don't seem to have a parameter that prevents Pandas from sorting the columns and losing the original column order.

Can you show an example where column order is lost with read_csv?

—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHubhttps://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fpandas-dev%2Fpandas%2Fissues%2F24962%23issuecomment-457953371&data=02%7C01%7Crfernand%40microsoft.com%7Cca86b76525454514277908d68499bf41%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C636842193210153461&sdata=24K5o9dUafSUxF4j2G0Rf%2F8LwMgfbkzyR3mMEIVBs8s%3D&reserved=0, or mute the threadhttps://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fnotifications%2Funsubscribe-auth%2FAEGN3sayc5mr6VQiW93qrgzs1cY0IHjUks5vHhI2gaJpZM4aU630&data=02%7C01%7Crfernand%40microsoft.com%7Cca86b76525454514277908d68499bf41%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C636842193210163470&sdata=jxLwZp3eDSw8KCYt7lmOSBEfHFVtwx0C3E7idkuUW3s%3D&reserved=0.

How about an example with read_json? Keep in mind that dictionaries aren’t necessarily ordered.


From: Roland Fernandez notifications@github.com
Sent: Sunday, January 27, 2019 18:10
To: pandas-dev/pandas
Cc: Tom Augspurger; Comment
Subject: Re: [pandas-dev/pandas] Need way to get original column order (read_csv, read_json) (#24962)

I just checked CSV and TXT – my bad – these work perfectly. The problem seems to only be with read_json(). Thanks!

From: Tom Augspurger notifications@github.com
Sent: Sunday, January 27, 2019 12:55 PM
To: pandas-dev/pandas pandas@noreply.github.com
Cc: Roland Fernandez rfernand@microsoft.com; Author author@noreply.github.com
Subject: Re: [pandas-dev/pandas] Need way to get original column order (read_csv, read_json) (#24962)

The pd.read_csf() and pd.read_json() don't seem to have a parameter that prevents Pandas from sorting the columns and losing the original column order.

Can you show an example where column order is lost with read_csv?

—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHubhttps://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fpandas-dev%2Fpandas%2Fissues%2F24962%23issuecomment-457953371&data=02%7C01%7Crfernand%40microsoft.com%7Cca86b76525454514277908d68499bf41%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C636842193210153461&sdata=24K5o9dUafSUxF4j2G0Rf%2F8LwMgfbkzyR3mMEIVBs8s%3D&reserved=0, or mute the threadhttps://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fnotifications%2Funsubscribe-auth%2FAEGN3sayc5mr6VQiW93qrgzs1cY0IHjUks5vHhI2gaJpZM4aU630&data=02%7C01%7Crfernand%40microsoft.com%7Cca86b76525454514277908d68499bf41%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C636842193210163470&sdata=jxLwZp3eDSw8KCYt7lmOSBEfHFVtwx0C3E7idkuUW3s%3D&reserved=0.

—
You are receiving this because you commented.
Reply to this email directly, view it on GitHubhttps://github.com/pandas-dev/pandas/issues/24962#issuecomment-457967158, or mute the threadhttps://github.com/notifications/unsubscribe-auth/ABQHIpwv68DkxwTzomTf94Q3r-rp6-Klks5vHj_8gaJpZM4aU630.

Sorry for the delay. Here is a failing test case:

`

pandasJsonLoader: test out ability of pandas JSON loader to preserve column order

import pandas

json_str = '[{"Name": "chevrolet chevelle malibu", "Miles_per_Gallon": 18, "Cylinders": 8,}]'
data = pandas.read_json(json_str)

print(list(data)) # out: ['Cylinders', 'Miles_per_Gallon', 'Name']
`

Not sure how I closed this issue - Reopening it again.

@rfernand2 what you are asking for is achievable with orient='table' for JSON as that by design produces a schema with an ordered list of field descriptors (columns)

Closing as I think this is already supported across formats requested

That doesn't seem to be a valid option for this. When I add "orient='table'" to the above test case, I get the following error:

C:\multiNLI\multinli_1.0>python test.py
Traceback (most recent call last):
File "test.py", line 4, in
data = pandas.read_json(json_str, orient="table")
File "c:\anaconda3\lib\site-packagespandas\io\json\json.py", line 422, in read_json
result = json_reader.read()
File "c:\anaconda3\lib\site-packagespandas\io\json\json.py", line 529, in read
obj = self._get_object_parser(self.data)
File "c:\anaconda3\lib\site-packagespandas\io\json\json.py", line 546, in _get_object_parser
obj = FrameParser(json, **kwargs).parse()
File "c:\anaconda3\lib\site-packagespandas\io\json\json.py", line 638, in parse
self._parse_no_numpy()
File "c:\anaconda3\lib\site-packagespandas\io\json\json.py", line 864, in _parse_no_numpy
precise_float=self.precise_float)
File "c:\anaconda3\lib\site-packagespandas\io\json\table_schema.py", line 298, in parse_table_schema
col_order = [field['name'] for field in table['schema']['fields']]
TypeError: list indices must be integers or slices, not str

Right your source data would need to be in the table format first.

pd.read_json(data[['Miles_per_Gallon', 'Cylinders', 'Name']].to_json(orient='table'), orient='table')

Hi William - thanks for the quick response.
So, if my data is in the format shown in my test case (which is was when I ran into this problem), there is no way to load it and preserve the column order using Pandas. Can you change to code to just preserve the column order for this common case? Maybe with a option for the behavior, if needed?
Thanks, Roland.

JSON objects are by definition unordered so I don't think there's a compliant way to get what you are asking for with your source data

https://json.org

OK, I see the problem now. Thanks for your help!

From: William Ayd notifications@github.com
Sent: Tuesday, March 26, 2019 9:02 AM
To: pandas-dev/pandas pandas@noreply.github.com
Cc: Roland Fernandez rfernand@microsoft.com; Mention mention@noreply.github.com
Subject: Re: [pandas-dev/pandas] Need way to get original column order (read_csv, read_json) (#24962)

JSON objects are by definition unordered so I don't think there's a compliant way to get what you are asking for with your source data

https://json.orghttps://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fjson.org&data=02%7C01%7Crfernand%40microsoft.com%7C2c194201d05347bd67d808d6b20472b3%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C636892129505225052&sdata=qENLj0TeimCKpl7zZxJQNou7okZjYKh5pLn%2FiUpdhd4%3D&reserved=0

—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHubhttps://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fpandas-dev%2Fpandas%2Fissues%2F24962%23issuecomment-476719735&data=02%7C01%7Crfernand%40microsoft.com%7C2c194201d05347bd67d808d6b20472b3%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C636892129505235061&sdata=ru%2BhAVEn5K%2BTkrN5chKLvUpOPEQGAh5I7UgCgs9xRiA%3D&reserved=0, or mute the threadhttps://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fnotifications%2Funsubscribe-auth%2FAEGN3pwm1n_JVfweE88OTc_MQmlcujQlks5vakSUgaJpZM4aU630&data=02%7C01%7Crfernand%40microsoft.com%7C2c194201d05347bd67d808d6b20472b3%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C636892129505235061&sdata=dp9Xyux4IBPiBQ%2F5YQdxTUSesD9O6rqgs4ZynOcb2LA%3D&reserved=0.

Hi,

Please provide a feature where we can turn off lexicographical sorting of columns in read_json() Currently it sorts lexicographically regardless of the original order which leads to loosing of format.
By this I mean - the dataframe should be reading columns in order as they are placed in the original JSON structure. In case of an unformated JSON file it should throw an error that it cannot replicate the default column order for every other field.

Thanks.

# expected order
In [28]: qwe  = [{"question": "Who are you ?", "answer": "I am bot"}, {"question": "Are you production ready ?", "answer": "Definately not"},]

# expected order
In [29]: json.dumps(qwe)
Out[29]: '[{"question": "Who are you ?", "answer": "I am bot"}, {"question": "Are you production ready ?", "answer": "Definately not"}]'

# wrong order
In [30]: pandas.read_json(json.dumps(qwe))
Out[30]:
           answer                    question
0        I am bot               Who are you ?
1  Definately not  Are you production ready ?

In [31]: pandas.read_json(json.dumps(qwe)).to_csv()
Out[31]: ',answer,question\n0,I am bot,Who are you ?\n1,Definately not,Are you production ready ?\n'

I am going to use this function for converting json to csv, but mentioned function returns absolutely another object.
@WillAyd

JSON objects are by definition unordered

but it doesn't mean that read_json should change the object. So again, please reopen the issue and provide a key where we can turn off lexicographical sorting of columns:

for example

pandas.read_json(json.dumps(qwe), keep_order=True)

Same question here.
It seems that right now pandas.read_json would sort the JSON (as a dictionary in Python?) and then print it.
Still looking for an alternative to control the order because I would like to make certain column being the first order.
Providing the function for keeping order would be really helpful.

Thank you so much.

def json_to_csv(self, json_name):
    with open(json_name) as json_file:  
        json_data = json.load(json_file, object_pairs_hook=OrderedDict) 
    csv_name = 'temp.csv'
    with open(csv_name, 'w') as f:
        writer = csv.writer(f)
        keys = []
        for key in json_data[0]:
            keys.append(key)
        writer.writerow(keys)
        for j in json_data:
            values = []
            for key in j:
                values.append(j[key])
            writer.writerow(values)
        return pd.read_csv(csv_name) 

Just use the codes above for an alternative.
Load JSON as OrderedDict to maintain the original order and then convert it into a csv.
Use read_csv instead of read_json.

@jacky82226 's approach worked for me with minor adjustment.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

jaradc picture jaradc  Â·  3Comments

matthiasroder picture matthiasroder  Â·  3Comments

amelio-vazquez-reina picture amelio-vazquez-reina  Â·  3Comments

hiiwave picture hiiwave  Â·  3Comments

andreas-thomik picture andreas-thomik  Â·  3Comments