Teammates: Instructor result download: xlsx file taking longer than csv file

Created on 24 Aug 2016  ·  14 Comments  ·  Source: TEAMMATES/teammates

I have a case where the csv file downloads almost immediately but the xlsx file times out.

a-Performance p.High

All 14 comments

@rclakmal any reason xlsx takes significantly longer than the csv file?

@damithc any more information on this specific case?

There are some extra checks to apply different formatting for questions. Other than that, it is just distributing the file content to xlsx rows.

Here is some data. The 2nd of each pair is the csv file. As you can see, the xlsx files take way longer.
The top pair is for a session with 1 question and 20+ students.
The other two pairs are for a session with 80+ students and 8 questions.
image

Thanks @damithc ... Just to be clear, is this happening consistently for all sessions with a high number of students?

I will have a look at the code to see how we can improve slicing the rows.

20+ students is actually a small class. The difference in performance applies to both small and large classes. In large classes the download can exceed 60 seconds and then time out.

For large files, Apache POI recommends a streaming API for better performance and memory use. I'm trying this out to see whether there is any performance improvement.

Are the file sizes of the two types different? Does the difference match the
difference in latency observed?

If anything, the xlsx file is smaller, so the file size is not the problem.
image
We may have to remove the xlsx feature until we figure this out. Most people will choose the xlsx option, and it fails for mid-size and large courses.

Are the two options using two different methods to retrieve data? Or are they using the same code at the database level?

@damithc It's the same retrieval underneath. For xlsx, we split the csv records to fill in the cell contents of the Excel file. This is also needed for cell-level control over formatting. This split could be expensive for a large number of rows.
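For illustration only, here is a minimal sketch of splitting one csv record into cell values in a single pass, without any regex. It is not the TEAMMATES code; the class name `CsvLineSplitter` and the quoting rules (fields may be wrapped in double quotes, and `""` inside a quoted field stands for a literal quote) are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of TEAMMATES: splits one csv record into
// fields by scanning each character exactly once.
final class CsvLineSplitter {
    private CsvLineSplitter() {}

    static List<String> split(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    // A doubled quote inside a quoted field is a literal quote.
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        current.append('"');
                        i++;
                    } else {
                        inQuotes = false;
                    }
                } else {
                    current.append(c);
                }
            } else if (c == '"') {
                inQuotes = true;
            } else if (c == ',') {
                // An unquoted comma ends the current field.
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());
        return fields;
    }
}
```

Each returned string would then become one cell value, so the cost per row stays proportional to the row length.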

Shall I remove the option from the UI until we figure out the issue?

Yes, let's remove it from the UI until we find a faster way to build the xlsx file. I suspect the use of regex could be the culprit. Regex matching can take a long time.
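To make the suspicion concrete, here is a sketch of one kind of regex that can get expensive on long rows. The pattern is an assumption, not necessarily the one TEAMMATES uses: it splits on commas outside double quotes by using a lookahead that scans to the end of the line for every comma, so the work per row grows roughly quadratically with row length. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical sketch, not the TEAMMATES code.
final class RegexSplitSketch {
    private RegexSplitSketch() {}

    // Matches a comma that is followed by an even number of double quotes.
    // The lookahead rescans the rest of the line at every comma.
    // Compiled once here rather than once per row.
    static final Pattern COMMA_OUTSIDE_QUOTES =
            Pattern.compile(",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)");

    // Keeps the surrounding quotes on quoted fields; limit -1 keeps
    // trailing empty fields.
    static String[] splitRow(String line) {
        return COMMA_OUTSIDE_QUOTES.split(line, -1);
    }

    // Crude timing helper: total nanoseconds for repeated splits of one row.
    static long timeSplitNanos(String line, int repetitions) {
        long fieldCount = 0;
        long start = System.nanoTime();
        for (int i = 0; i < repetitions; i++) {
            fieldCount += splitRow(line).length;
        }
        long elapsed = System.nanoTime() - start;
        // Use the count so the loop is not trivially dead code.
        if (fieldCount < 0) {
            throw new IllegalStateException("unreachable");
        }
        return elapsed;
    }
}
```

Timing `timeSplitNanos` on a short row and on a row with many long fields would give a rough idea of whether the split dominates, though a profiler is the more reliable tool.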

xls1

@damithc, @rclakmal, I guess some of our regexes are responsible here, but some of Apache POI's methods seem a bit slow too.

Thx hong Jin. This shows performance bottlenecks can come from unlikely
places. :-p

Thanks @kanghj ... the split with regex seems to be the main culprit, though.
