I've come across a bug when trying to update a database in a migration using chunk where some records aren't being saved. In our case we're adding a nullable column, setting the values for that column and then making it nullable within a single migration. Using chunk though attempting to remove the nullable option from the column would throw an error because some records didn't have the values set.
I've created a minimal example to show this off - https://github.com/EspadaV8/laravel-chunking-non-persistance/tree/develop - cloning this and running php artisan migrate:refresh --seed should produce something like the following output
Empty example table
+----+------+----------+
| id | name | new-name |
+----+------+----------+
Populated example table with `name`
+----+------------+----------+
| id | name | new-name |
+----+------------+----------+
| 1 | mM4YJMR3i2 | |
| 2 | ZYNNY6DBqi | |
| 3 | okKt4kIARY | |
| 4 | VHJ6ni85c8 | |
+----+------------+----------+
Populated example table with `name-new` using `chuck`
+----+------------+----------------+
| id | name | new-name |
+----+------------+----------------+
| 1 | mM4YJMR3i2 | mM4YJMR3i2-new |
| 2 | ZYNNY6DBqi | ZYNNY6DBqi-new |
| 3 | okKt4kIARY | okKt4kIARY-new |
| 4 | VHJ6ni85c8 | |
+----+------------+----------------+
The one row without a new-name set should've been set in the second chunk batch.
Are you able to send a PR please?
I did try having a go at fixing this but I wasn't able to work out where the issue is. If I get a chance to have another look at it I'll see if I can submit something (if you have any pointers as to where to start looking that'd be great).
Made a slight change to the seeder so that it appends to the name-new column instead of just setting it to a known value and it now produces the following output
espadav8@dolores:~/Workspace/chunking-non-persistance (*)
> php artisan migrate:refresh --seed develop [7fd9127] modified
Rolled back: 2015_12_10_235952_create_tables
Migrated: 2015_12_10_235952_create_tables
Empty example table
+----+------+----------+
| id | name | name-new |
+----+------+----------+
Populated example table with `name`
+----+------------+------------+
| id | name | name-new |
+----+------------+------------+
| 1 | kVmjtZUXTH | ZoerIsEvoZ |
| 2 | 0VkvI2TSIF | DZ1R4Joz26 |
| 3 | TKaJTOiVzH | W3eOttRChc |
| 4 | bS6TeDg3xB | DJgjzlQBX5 |
+----+------------+------------+
Populated example table with `name-new` using `chuck`
+----+------------+--------------------+
| id | name | name-new |
+----+------------+--------------------+
| 1 | kVmjtZUXTH | ZoerIsEvoZ-new |
| 2 | 0VkvI2TSIF | DZ1R4Joz26-new |
| 3 | TKaJTOiVzH | W3eOttRChc-new-new |
| 4 | bS6TeDg3xB | DJgjzlQBX5 |
+----+------------+--------------------+
For some reason the second chunk call returns just 1 record, but it's a record that's already been returned (in this case record 3).
It looks like the queries that are being created are correct ($this->forPage($page, $count)->toSql())
select * from "example" limit 3 offset 0;
select * from "example" limit 3 offset 3;
select * from "example" limit 3 offset 6;
Looks like this might actually be an issue with the postgresql PDO driver. I've submitted it to PHP - https://bugs.php.net/bug.php?id=71176. There's a new script I've added that only uses PDO and shows the same kind of output
https://github.com/EspadaV8/laravel-chunking-non-persistance/blob/develop/public/pg-pdo-test.php
So, that bug was closed as Not a bug. It looks like it's expected behaviour, at least with postgresql - http://www.postgresql.org/docs/current/static/queries-limit.html
Because the chunk doesn't have any kind of ORDER BY the results can be returned in any order. The simple fix would be to ORDER BY the primary key (unless something else has been set). Does that sound like an okay fix for you @GrahamCampbell ?
Feel free to send a PR, and we'll review it.
I managed to lose a comment here somehow :confused:
To have chunk use an ORDER BY it would need to know what the primary key for a table is, however, there's no way of knowing that within the Builder class. It could check to see if an ORDER BY has been set and try to use id but that could fail, or it could throw an exception if no ORDER BY has been set.
Neither of those sound too nice to me. Is there another way that I might be missing that you can think of?
@EspadaV8 You can call $this->model->getKeyName() in the Builder to get the primary key column. Does that help?
As far as I know, the Builder doesn't have a reference to an particular model since it can be used by itself. Something like...
DB::table('users')->chunk(...);
Doesn't know what columns are in that table or what the tables primary key is. Currently it will work but in an unsupported way since is should require an orderBy.
Well, eloquent Builder does, query builder doesn't. Both have chunk method.
Ah, okay. I was talking about the query builder in this case. The issue could exist within the eloquent builder too though.
What actually needs fixing then?
I still think that the chunk method should require an order by clause as recommended by both the postgresql and MySQL docs. I wanted to do a quick test of what @taylorotwell mentioned in his comment https://github.com/laravel/framework/pull/11510#issuecomment-167334145 because I don't think you can do any kind of chunk when the where includes a field that you are updating because the second call would then skip a whole chunk that are now the records at offset 0 (so you would need to force the offset to 0 for each call).
If there isn't to be a change to force an order by then I could update to docs to add a warning recommending that an order by be included.
I still think that the chunk method should require an order by clause as recommended by both the postgresql and MySQL docs.
That's not a bug though. You just add an orderby...
EspadaV8, I'm not particularly familiar with how the query builder's internal work, but it appears that at least in MySQL, one can sort by _rowid, which I think might fix the issue and may be something that could automatically be appended to queries even when you don't know the primary key's.
I don't know if this is what you're looking for, though. I'm pretty new to the internals of Laravel, and don't know if doing special casing based on the database type is something that may be done in the query builder or not.
If it's not going to be changed, then it should be noted in the Laravel docs.
We're open to PRs to the docs. Thank you everyone! :heart:
I had submitted https://github.com/laravel/docs/pull/2227, but it wasn't committed in its entirety. Taylor committed my changes to the sample code for chunk() to include orderBy(), but not the explanation about _why_ one should use orderBy() with chunk(). I can write a more complete explanation if that would help. I just think leaving this issue out there without any explanation is creating confusion.
Most helpful comment
If it's not going to be changed, then it should be noted in the Laravel docs.