Pandas read_excel: only read first few lines

Created on 9 Jun 2017 · 5 Comments · Source: pandas-dev/pandas

Code Sample, a copy-pastable example if possible

import pandas as pd

workbook_dataframe = pd.read_excel(workbook_filename, nrows=10)

Problem description

I'm using pandas read_excel on about 100 Excel files, some of them large, and for each one I want to read only the first few lines (the header and the first few rows of data).

The code above doesn't work, but it illustrates the goal (in this example, reading 10 data rows).

Labels: IO Excel, good first issue

All 5 comments

Sure, we could support this, although xlrd (the library we use to read Excel files) always reads the whole file into memory, so it wouldn't be as fast as you'd hope.

I'm looking into adding this functionality. Thanks.

You can add one line right after the line where you read the file. For example, to keep only the first 10 rows:

workbook_dataframe = pd.read_excel(workbook_filename)
workbook_dataframe = workbook_dataframe.iloc[:10]

or, more simply:
workbook_dataframe = pd.read_excel(workbook_filename).iloc[:10]

Either way, the DataFrame then contains only the first 10 data rows. Note that this still reads the whole sheet first and only trims the result afterwards.

To get nrows without reading the entire worksheet:

import pandas as pd

workbook = pd.ExcelFile(workbook_filename)

# get the total number of rows (assuming you're dealing with the first sheet);
# .book.sheet_by_index() is xlrd's API, so this requires the xlrd engine
rows = workbook.book.sheet_by_index(0).nrows

# define how many data rows to read
nrows = 10

# subtract the number of rows to read from the total number of rows (and another 1 for the header);
# newer pandas spells the keyword skipfooter (skip_footer is the deprecated form)
workbook_dataframe = pd.read_excel(workbook, skipfooter=(rows - nrows - 1))
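The footer arithmetic in the snippet above (total rows minus the rows to keep minus one header row) can be checked in isolation. This is a small sketch; `footer_rows_to_skip` is a hypothetical helper name, and clamping the result at 0 is an added assumption so that asking for more rows than the sheet holds keeps everything instead of producing a negative skip count.

```python
def footer_rows_to_skip(total_rows: int, nrows: int, header_rows: int = 1) -> int:
    """Number of trailing rows to skip so that only `nrows` data rows remain.

    total_rows counts every row in the sheet, header included (as xlrd's
    Sheet.nrows does). Clamped at 0 (an assumption, not pandas behaviour).
    """
    if nrows < 0 or header_rows < 0:
        raise ValueError("nrows and header_rows must be non-negative")
    return max(total_rows - header_rows - nrows, 0)


# A sheet with 1 header row and 250 data rows has 251 rows in total.
skip = footer_rows_to_skip(251, 10)
print(skip)  # 240, leaving 251 - 1 - 240 = 10 data rows
```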

Could I take a crack at this issue? I'm new to open source and would really like to start contributing. If the last line in @gmlander's code is valid, I'd just have to identify where to place the code, right? Thanks in advance for any suggestions!
