I have an Excel file with over 800,000 records that needs to be imported.
I am using ShouldQueue, batch inserts of 1000, and chunk reading of 1000.
The import process seems to occupy more and more memory with each queued job until I run out of memory. I have a server with 40 GB of RAM.
Expected behavior:
After each import job is done, the memory it occupied should be released so that the following jobs can use it.
Actual behavior:
It seems that with each job, more and more memory is consumed until the whole server shuts down.
Any additional information, configuration or data that might be necessary to reproduce the issue.
I tried issuing the queue:restart command every time I saw the memory spike; it immediately released a lot of memory and the whole file could be uploaded successfully.
This is just a temporary workaround I am using.
Possibly related to https://github.com/Maatwebsite/Laravel-Excel/issues/1391 but the solution provided seems to be for an older version of the package.
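For context, a minimal sketch of an import wired up the way described above. The RecordsImport and Record class names are illustrative placeholders, not taken from the reporter's code:
use Illuminate\Contracts\Queue\ShouldQueue;
use Maatwebsite\Excel\Concerns\Importable;
use Maatwebsite\Excel\Concerns\ToModel;
use Maatwebsite\Excel\Concerns\WithBatchInserts;
use Maatwebsite\Excel\Concerns\WithChunkReading;

class RecordsImport implements ToModel, WithBatchInserts, WithChunkReading, ShouldQueue
{
    use Importable;

    public function model(array $row)
    {
        // Record is a placeholder Eloquent model for the rows being imported.
        return new Record([
            'name' => $row[0],
        ]);
    }

    // Rows are inserted 1000 at a time instead of one query per row.
    public function batchSize(): int
    {
        return 1000;
    }

    // The file is read in chunks of 1000 rows; each chunk becomes its own queued job.
    public function chunkSize(): int
    {
        return 1000;
    }
}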
Hey @MakamuEvans would you be willing to create a repo with a test setup that replicates this problem so I can have a look around? That would make this easier to debug, thanks!
Hi @patrickbrouwers
I have a similar problem when trying to import a CSV of about 80,000 lines. In my case it happens when the import is combined with the ShouldQueue contract.
This works:
abstract class BaseImport implements ToModel, WithCustomCsvSettings, WithBatchInserts, WithChunkReading, WithEvents
{
    use Importable, RegistersEventListeners;
    ....
This does not work:
abstract class BaseImport implements ToModel, WithCustomCsvSettings, WithBatchInserts, WithChunkReading, ShouldQueue, WithEvents
{
    use Importable, RegistersEventListeners;
    ....
thanks!
Hello @zarapico same question to you then. Can you provide me with an example repo that replicates your issue?
Hello @patrickbrouwers
Here is the example repo: https://github.com/zarapico/Laravel-Excel-MemoryLeak
It is a clean installation of Laravel 5.8 and the current version of this package.
There are 2 commands:
php artisan excel:importUserFileWithMemoryLeak
php artisan excel:importUserFileWithoutMemoryLeak
I think the most important thing is that the memory leak comes from the events: https://github.com/zarapico/Laravel-Excel-MemoryLeak/blob/master/app/Imports/UsersImportWithoutMemoryLeak.php#L63-L77
Thank you so much for everything
Thanks @zarapico
I had a queue with 100 workers, and since I had prioritised the default queue, which the Excel import uses, I noticed that around 50 to 60 workers were trying to process the huge file.
As a temporary solution, I limited the number of workers for the default queue to around 4~5.
That has worked for me, and my 800K-row Excel file doesn't take up more than 4 GB of RAM.
Maybe this info could help out.
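For example, a rough Supervisor entry that caps the default queue at 4 workers could look like this (the program name and paths are illustrative):
[program:laravel-default-worker]
process_name=%(program_name)s_%(process_num)02d
command=php /var/www/app/artisan queue:work --queue=default --sleep=3 --tries=1
; numprocs is the number of workers competing for the import jobs
numprocs=4
autostart=true
autorestart=true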
@zarapico it makes sense that the 2 events cause a memory issue, as they both seem to load the entire spreadsheet into memory. Will see how we can fix that.
@MakamuEvans hm, that's interesting. I wouldn't expect the number of workers to have an influence on it, as the jobs are chained. Are you using queue:work or horizon?
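(As background, a queued import ends up dispatched as a chain of jobs, roughly like this; the import and notification job class names below are placeholders:)
(new RecordsImport)->queue('records.xlsx')->chain([
    new NotifyUserOfCompletedImport(request()->user()),
]);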
Why, when reading a chunk and finishing operations on it, doesn't the importer release the memory instead of keeping it in a buffer and letting it grow?
Is anyone active on this channel?
Why does the ReadChunk job read the entire file from beginning to end, every time, for every chunk?
These are the lines:
$spreadsheet = $this->reader->load(
    $this->temporaryFile->sync()->getLocalPath()
);
This should read only the rows from the first row of the chunk to the last row of the chunk.
@mikizdr it doesn't load the file from the beginning to the end, we use a chunk read filter: https://github.com/Maatwebsite/Laravel-Excel/blob/3.1/src/Jobs/ReadChunk.php#L94
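(For anyone following along, a PhpSpreadsheet read filter is roughly this shape; a minimal sketch of the idea, not the package's actual class:)
use PhpOffice\PhpSpreadsheet\Reader\IReadFilter;

class ExampleChunkFilter implements IReadFilter
{
    private $startRow;
    private $chunkSize;

    public function __construct(int $startRow, int $chunkSize)
    {
        $this->startRow  = $startRow;
        $this->chunkSize = $chunkSize;
    }

    // PhpSpreadsheet calls this for every cell it encounters while parsing the file;
    // only cells for which it returns true are actually loaded into memory.
    public function readCell($columnAddress, $row, $worksheetName = '')
    {
        return $row >= $this->startRow && $row < $this->startRow + $this->chunkSize;
    }
}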
But why does ChunkReadFilter::readCell log every cell of the entire sheet for every chunk?
This is the log for an Excel sheet with 479 rows and a chunk size of 200. Pay attention to the NUMBER OF ROWS lines (there are 3 of them), which come from the importer's collection method with the chunk-size concern.
https://gist.github.com/mikizdr/d296c2131986e4d72af4e1a479689ffc
So the readCell method is called for every single cell in the entire sheet, for every chunk.
That's how read filters work in PhpSpreadsheet yes.
But it is not efficient. If you have a spreadsheet with 200K rows and 20 columns per row and a chunk size of 500, it will read the entire file 400 times and spend a lot of resources. In a word, it will exhaust the system. This should be optimized so that the whole file is read just once, with the cursor then moved from the start of each chunk to its end, the data processed, and the reader moving on. That's my opinion.
I can only use the functionality that PhpSpreadsheet offers me. This is the way they recommend it being used, there's no alternative currently.
As both packages are open source, you are free to PR a better solution.
I can do that, but I need some collaboration to explain parts of the existing code. It should be done for the sake of optimization and code efficiency. Running the code in this state is not good at all: it's not memory optimized, not as fast as it could be, and it unnecessarily repeats some steps.
@zarapico events no longer load the entire spreadsheet in memory.
Will be fixed in the next release.
My problem is:
I can read and insert into the database one file of 5.8 MB with about 140,000 records. But if I use a file of 6.1 MB with 150,000 records, I get no errors; the page just refreshes and nothing happens...
Where can I see an error?
class ClienteImportModel implements ToModel, WithBatchInserts, WithChunkReading, WithEvents
{
    use Importable, RegistersEventListeners;

    public function model(array $row)
    {
        try {
            if (!is_numeric($row[0])) {
                return null;
            }

            return new ClienteTemp([
                'numero'       => intval($row[0]),
                'nome'         => $row[1],
                'dataContrato' => date('Y-m-d', strtotime($row[2])),
                'cp'           => $row[3],
                'idUser'       => auth()->user()->id,
            ]);
        } catch (\Exception $e) {
            echo $e;
        }
    }

    /**
     * @return int
     */
    public function chunkSize(): int
    {
        return 5000;
    }

    /**
     * @return int
     */
    public function batchSize(): int
    {
        return 5000;
    }
}
And this is how I call the import:
Excel::import(new ClienteImportModel(),public_path('/import/excel/Book1b150.xlsx'));
It doesn't work here either .... :(
@gordett enclose the Excel::import(...) call in a try/catch block inside the controller, then implement error handling in the catch block. A try/catch has almost no meaning inside the model method of the ClienteImportModel importer.
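(A rough sketch of that suggestion; the ClienteImportController name and namespaces are illustrative, and the file path is taken from the snippet above:)
use App\Imports\ClienteImportModel;
use Illuminate\Support\Facades\Log;
use Maatwebsite\Excel\Facades\Excel;

class ClienteImportController extends Controller
{
    public function import()
    {
        try {
            Excel::import(new ClienteImportModel(), public_path('/import/excel/Book1b150.xlsx'));
        } catch (\Throwable $e) {
            // Surface the real failure instead of letting the page silently refresh.
            Log::error('Cliente import failed: ' . $e->getMessage());

            return back()->withErrors(['import' => $e->getMessage()]);
        }

        return back()->with('status', 'Import finished');
    }
}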
Hi,
I tried with try/catch...
My problem was resolved. It was my server: not enough memory for big files...
hihihi
@gordett
Hello, I'm facing the same issue. Can you please show me how you implemented the try/catch in the controller?
Can you please paste your controller function here?