As discussed with @zuk3975, @eternoendless, @jolelievre, @MatShir
The combinations generator can produce thousands or even millions of combinations. This operation requires a huge amount of processing, and we expect it to crash a lot of servers.
So it must be done in a way that preserves server resources.
When a BO user decides, from the UI, to generate combinations:
1) A JavaScript script fires an ajax call that computes a list of all the combinations to be generated. Just a list, stored in JSON, and for each item we record whether it has been generated or not (so at the start, all items are set as "not generated").
2) We store the list in MySQL, in a new SQL table. The ajax call returns a response telling the browser how many items are in the list and that the backend is ready to start the generation.
3) The JavaScript script fires a first ajax request telling the backend "I want to generate X of these combinations". X can be chosen/configured; let's say 1000 as the default.
4) The backend processes 1000 items. When it's done, 1000 items are saved into the database and the list is updated. The backend notifies the browser through the ajax response.
5) The JavaScript script notifies the user (probably with a progress bar) and fires the next ajax request, asking to generate the next 1000 items. This continues until everything is generated.
The user can stop the process at some point and continue later.
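The steps above can be sketched in a few lines. This is only a minimal in-process simulation, not PrestaShop code: `build_pending_list` and `generate_batch` are hypothetical names, the list lives in memory instead of a MySQL table, and the `while` loop stands in for the browser firing successive ajax requests.

```python
from itertools import islice, product

BATCH_SIZE = 1000  # default chunk size from step 3; assumed to be configurable


def build_pending_list(attribute_groups):
    """Steps 1-2: list every combination, all flagged as not generated."""
    return [{"attributes": combo, "generated": False}
            for combo in product(*attribute_groups)]


def generate_batch(pending, batch_size=BATCH_SIZE):
    """Step 4: process the next `batch_size` items that are not generated yet.

    Returns how many items are still waiting, which the browser could use
    to update a progress bar and decide whether to fire another request.
    """
    todo = (item for item in pending if not item["generated"])
    for item in islice(todo, batch_size):
        item["generated"] = True  # stand-in for saving the real combination
    return sum(1 for item in pending if not item["generated"])


# Step 5: the client-side loop, simulated here with a tiny batch size.
pending = build_pending_list([[2, 3], [5, 6], [7, 8]])
remaining_after_each_call = []
while True:
    remaining = generate_batch(pending, batch_size=3)
    remaining_after_each_call.append(remaining)
    if remaining == 0:
        break
```

Because the progress is stored with the list itself, stopping after any call and resuming later only requires calling `generate_batch` again.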
We can notify user with
@matks the mysql storage could be considered a form of event sourcing :)
@matks, this is a really huge scope (bigger than it might seem at first), so to make it clear:
> Javascript script fires an ajax call that computes a list of all the combinations to be generated. Just a list, stored in JSON, and for each item we record if it's generated or not (so at start, all items are set as "not generated").

Having "just a list of combinations to generate" already means generating all the possible combinations (without any domain logic, just numbers). That number can grow to billions of combinations very fast, so unless that step is also split across separate ajax requests, it will be a killer. Generating all combinations across separate requests would require storing some index that identifies where the previous generation stopped. Moreover, it also requires us to save the generated combinations to a database, as you mentioned, and ONLY then can we fire separate ajax requests to retrieve the list in chunks and apply all the domain logic to save the real combinations.
By the way, there is a math term for this case: it is called the cartesian product.
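One possible shape for the "index that identifies where the previous generation stopped" is a plain integer position in the cartesian product. The sketch below is an assumption about how that could work, not something proposed in the thread: `combination_at` and `next_chunk` are hypothetical helpers that decode a position into a combination (mixed-radix decoding), so a request can resume from a stored number without the full list ever being built.

```python
from math import prod


def combination_at(index, attribute_groups):
    """Return the combination at position `index` without building the list.

    `index` is read as a mixed-radix number whose digits each select one
    attribute per group, in the same order as nested loops over the groups.
    """
    total = prod(len(group) for group in attribute_groups)
    if not 0 <= index < total:
        raise IndexError(f"index {index} out of range for {total} combinations")
    picked = []
    for group in reversed(attribute_groups):
        index, position = divmod(index, len(group))
        picked.append(group[position])
    return list(reversed(picked))


def next_chunk(start, size, attribute_groups):
    """Return combinations [start, start + size) and the index to resume from."""
    total = prod(len(group) for group in attribute_groups)
    stop = min(start + size, total)
    chunk = [combination_at(i, attribute_groups) for i in range(start, stop)]
    return chunk, stop
```

With this approach, the only state to persist between requests would be the integer returned as the resume position.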
Example to see the numbers in action:
1. Let's say we have this array (using group names instead of ids to make it clearer; each number represents a different attribute id).
[
Color => [2, 3],
Size => [5, 6],
Material => [7, 8],
],
It would generate the following combinations (if each group contains the same number of values, you can use attributeCount^groupCount to check how many combos there are, so here it would be 2^3=8):
[2,5,7], [2,5,8], [2,6,7], [2,6,8], [3,5,7], [3,5,8], [3,6,7], [3,6,8]
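The same list can be reproduced with Python's standard `itertools.product`. This is only an illustration of the cartesian product on the example above, not how PrestaShop generates combinations.

```python
from itertools import product

# Attribute ids per group, as in the example above.
groups = {
    "Color": [2, 3],
    "Size": [5, 6],
    "Material": [7, 8],
}

# One attribute is taken from each group, in group order.
combinations = [list(combo) for combo in product(*groups.values())]
```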
2. Now let's say we have 9 attribute groups where each group contains 7 attributes. It would already result in 7⁹ combinations (:warning: 40.353.607 combinations for a SINGLE PRODUCT).
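The count can be computed without enumerating anything, since it is just the product of the group sizes. The sketch below checks both examples; `MAX_COMBINATIONS` and `exceeds_limit` are hypothetical and only show where a pre-generation guard could sit.

```python
from math import prod

MAX_COMBINATIONS = 10_000  # hypothetical safety limit, not a PrestaShop setting


def count_combinations(group_sizes):
    """Number of combinations = product of the number of attributes per group."""
    return prod(group_sizes)


def exceeds_limit(group_sizes, limit=MAX_COMBINATIONS):
    """Cheap check that could run before any combination is generated."""
    return count_combinations(group_sizes) > limit


small = count_combinations([2, 2, 2])  # Color, Size, Material with 2 values each
large = count_combinations([7] * 9)    # 9 groups of 7 attributes
```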
Even if we manage to generate all those combinations, it's very likely that some other pages which use combinations might break.
I tried to generate all combinations on the current product page.
If we select all attributes from the fixtures (all sizes (4), colors (14), dimensions (3) and paper types (4)), it generates 672 combinations for a product. It does succeed in generating them and they appear in the database, but the page can't handle them anymore: once you come back to that product, it breaks. Of course that's logical, because there is currently no pagination. Thousands of records shouldn't be a problem with pagination, though.
All in all, if we want to improve this feature's performance to handle bigger numbers, we will have to use some advanced techniques, and it will take a lot of time and effort. We also need to explore the whole of PrestaShop for places which might break with many combos (including the Front Office). And it would still make sense to think about some reasonable limits.
Indeed, it seems it won't be easy ^^ So I think we should keep the simple case in mind for now (manageable through one request), plus the pagination. We'll tackle the millions of combinations later.
Now that you have defined the problem, the solution will be easy ;)
We must avoid the cartesian product! We should never generate a list of all the possible combinations (unless we want to stress test the system).
There is no need to paginate the creation of combinations, because we should not create all of them in the first place.
One way of dealing with it is to use an overriding system.
You have a default value for a property of a product, and you only write a new row in the db for the combinations which have a different value than the default. This means you only create a new row when the value changes for the first time.
Instead of reading values from one table, you read from two: the default value from one table, and then you check whether overridden values exist for the combinations you need.
My guess is that this kind of optimization could handle 80% of the cases. We should check with real catalogs, of course.
And it will still be a lot of work with the current code base, of course.
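A minimal sketch of that overriding idea, assuming a single property (price) and in-memory dictionaries in place of the two tables. `ProductPrices` is a hypothetical class and does not reflect the existing PrestaShop model.

```python
class ProductPrices:
    """Default value plus sparse per-combination overrides (hypothetical model)."""

    def __init__(self, default_price):
        self.default_price = default_price  # plays the role of the "default" table
        self.overrides = {}                 # plays the role of the "override" table

    def set_price(self, combination, price):
        key = tuple(combination)
        if price == self.default_price:
            self.overrides.pop(key, None)   # no row is needed for the default value
        else:
            self.overrides[key] = price     # row created only when the value differs

    def get_price(self, combination):
        # Read the override if one exists, otherwise fall back to the default.
        return self.overrides.get(tuple(combination), self.default_price)
```

In this sketch a product with millions of possible combinations stores nothing until one of them deviates from the default.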
> One way of dealing with it is to use an overriding system.
> ...

If most merchants don't edit combinations, it makes sense. And those who do would be able to create some manually(?) (I doubt anyone edits millions of combinations anyway).
However, it would mean massive breaking changes in FO and BO :exploding_head:
Yep, the proposal makes sense but won't work with the existing model.
@zuk3975 nice! That is called a combinatorial explosion!
Maybe break the process once it takes too long. Let's say you're 1 minute into adding combinations:
The problem might be that checking which combinations have been created might be a labour-intensive process for the server.
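A rough sketch of such a time-limited run follows. `generate_until_deadline` is a hypothetical function, `save` is a placeholder for whatever persists one combination, and returning the position reached is just one possible way to avoid re-checking which combinations already exist.

```python
import time
from itertools import islice, product


def generate_until_deadline(attribute_groups, start=0, budget_seconds=60.0,
                            clock=time.monotonic, save=lambda combo: None):
    """Create combinations from position `start` until the time budget runs out.

    Returns (next_position, finished). `clock` is injectable so the function
    can be tested without waiting for real time to pass.
    """
    deadline = clock() + budget_seconds
    position = start
    # islice skips the first `start` combinations by iterating over them,
    # which costs time proportional to `start`.
    for combo in islice(product(*attribute_groups), start, None):
        if clock() >= deadline:
            return position, False
        save(combo)
        position += 1
    return position, True
```

A later call with `start` set to the returned position would continue where the previous run stopped.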
In my opinion, having thousands of combinations for a single product is nonsense and means that the product has to be restructured.
@jf-viguier In my opinion, thousands of combinations should be possible.
I have one example of a printable product that can only have specific widths and heights, which need to be defined.
Widths (15): [20 cm, 30 cm, 40 cm, 50 cm, 60 cm, 70 cm, 80 cm, 90 cm, 100 cm, 110 cm, 120 cm, 140 cm, 150 cm, 160 cm, 200 cm]
Heights (15): [20 cm, 30 cm, 40 cm, 50 cm, 60 cm, 70 cm, 80 cm, 90 cm, 100 cm, 110 cm, 120 cm, 140 cm, 150 cm, 160 cm, 200 cm]
Material (4): [Standard, Artistic, Luxurious, Vintage]
Finish (3): [Gloss, Matt, None]
That makes 2.700.
Keep in mind that we use these attributes to create a sort of wizard where users can choose their options step by step. This is a rather pleasant UX.
Could you tell me how you would restructure this product?
Hi @matks
We can use batch generation by creating groups of combinations.
Consider the following case where you want to generate combinations:
_Size_: [S, M]
_Color_: [Orange, Blue, Green]
With the current system we get (number of elements in the Size group) x (number of elements in the Color group) = number of combinations to generate = 2x3 = 6:
[Size - S, Color - Orange]
[Size - S, Color - Blue]
[Size - S, Color - Green]
[Size - M, Color - Orange]
[Size - M, Color - Blue]
[Size - M, Color - Green]
If we consider that:
(Size group) = (Attribute S) + (Attribute M)
we get:
Nb combinations to generate = [(Attribute S) + (Attribute M)] x (Color group)
Nb combinations to generate = (Attribute S) x (Color group) + (Attribute M) x (Color group)
Nb combinations to generate = Batch1 + Batch2
where:
Batch1 = (Attribute S) x (number of elements in the Color group) = 1x3 = 3
[Size - S, Color - Orange]
[Size - S, Color - Blue]
[Size - S, Color - Green]
Batch2 = (Attribute M) x (number of elements in the Color group) = 1x3 = 3
[Size - M, Color - Orange]
[Size - M, Color - Blue]
[Size - M, Color - Green]
So we have two batches of three combinations to generate instead of one batch of six.
[Two images, not preserved: "Current system" and "Using batches"]
Of course, reality is much more complex. I took a basic example just to make my proposal clear.
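The batching above could look like the following sketch, where each batch fixes one attribute of the first group and combines it with all the other groups. `batches_by_first_group` is a hypothetical helper, and splitting on the first group is an arbitrary choice made for the illustration.

```python
from itertools import product


def batches_by_first_group(attribute_groups):
    """Yield one batch per attribute of the first group.

    Each batch holds that single attribute combined with every combination
    of the remaining groups, so batches can be processed one at a time.
    """
    first, *rest = attribute_groups
    for attribute in first:
        yield [(attribute, *combo) for combo in product(*rest)]


sizes = ["S", "M"]
colors = ["Orange", "Blue", "Green"]
batches = list(batches_by_first_group([sizes, colors]))
```

Since the function is a generator, only one batch needs to be held in memory at any moment when it is consumed lazily.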
> Now that you have defined the problem, the solution will be easy ;)
> We must avoid the cartesian product! We should never generate a list of all the possible combinations (unless we want to stress test the system).
> There is no need to paginate the creation of combinations, because we should not create all of them in the first place.
> One way of dealing with it is to use an overriding system.
> You have a default value for a property of a product, and you only write a new row in the db for the combinations which have a different value than the default. This means you only create a new row when the value changes for the first time.
> Instead of reading values from one table, you read from two: the default value from one table, and then you check whether overridden values exist for the combinations you need.
> My guess is that this kind of optimization could handle 80% of the cases. We should check with real catalogs, of course.
> And it will still be a lot of work with the current code base, of course.

It's a good idea, but how do you handle the stock? I mean, if a specific variant is sold, you have to decrement that specific stock, so in the end you need one row per combination.
The default stock value should be 0, and you only create a row when you have stock for a specific combination.
And if you really have a huge number of combinations with stock behind them, you probably have a tool like an ERP dedicated to managing this kind of complexity.
> Default stock value should be 0. And you only create a row when you have stock for a specific combination.
> And if you really have a huge number combinations with stocks behind you probably have a tool like an ERP dedicated to manage this kind of complexity.

Yes, but you will always need one row per combination in the DB to handle the stock, even if you have an ERP to manage these combinations. You cannot send a request to your ERP to get the stock of the chosen combination each time the user selects it.
@PululuK sorry, but I can't see the possible solution.
> @PululuK sorry but i cant see the possible solution

Precisely, we are all looking for the solution... Do not hesitate to participate 😅
If moving ahead with this as described by the OP, I think an important change would be for the generation to be triggered from the BO and then continue on the server until complete, without waiting for confirmation from the user every N records.
Here's why: Imagine needing to create 3001 combinations. You start the process, and it looks like everything is working, so you leave. A few minutes later, you would have been asked whether or not to generate records 1001-2000. But you've already moved on. Later, you come back to the product. Records 1-1000 are generated. How do you generate records 1001-3001? This would be a very frustrating experience for the user.
Here's what a more user-friendly workflow would be: Choose your attributes and click generate. The BO does the math to see how many combinations you'll be generating, and if the number of combinations is greater than N, it asks for confirmation. If confirmed (or if the number of combinations is not greater than N), the product is flagged and the combinations are added to a queue. Once the queue is processed, the product flag is cleared. If the product is viewed in the BO while it is flagged, there can be an alert showing the status, i.e., how many combinations have been generated and how many are left to process. There could also be an option in the BO, on the product combination page, to stop generating combinations.
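That workflow can be sketched as a small job object. Everything here is hypothetical (`CombinationJob`, `CONFIRM_THRESHOLD`, the in-memory `deque`); a real queue would be persisted and would likely avoid materializing every combination up front.

```python
from collections import deque
from itertools import product
from math import prod

CONFIRM_THRESHOLD = 1000  # the "N" above; the value is an arbitrary assumption


class CombinationJob:
    """One product's generation job: a queue plus a 'generating' flag."""

    def __init__(self, attribute_groups, confirmed=False):
        total = prod(len(group) for group in attribute_groups)
        if total > CONFIRM_THRESHOLD and not confirmed:
            raise PermissionError(f"{total} combinations need confirmation")
        self.queue = deque(product(*attribute_groups))
        self.generating = bool(self.queue)  # the product flag
        self.created = []

    def process(self, max_items):
        """Worker step: runs server-side and needs no user interaction."""
        for _ in range(min(max_items, len(self.queue))):
            self.created.append(self.queue.popleft())
        if not self.queue:
            self.generating = False         # flag cleared once the queue is empty

    def stop(self):
        """The 'stop generating combinations' option."""
        self.queue.clear()
        self.generating = False

    def status(self):
        """What a BO alert could display while the product is flagged."""
        return {"generated": len(self.created), "remaining": len(self.queue)}
```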
But I think even better than the above workflow, assuming there is a need for massive numbers of combinations, and assuming there is a requirement to be able to generate such massive numbers of combinations on low-powered servers, would be to increase the efficiency of combination generation by using SQL statements to generate combinations directly in the database, and skipping all the complexity of a queue. In other words, I think the complexity of working directly with SQL would be less than the complexity of using a queue for this task, and less complex code is easier to write and maintain.
Additional benefits of the SQL approach: Increased efficiency for everyone who uses combinations, not just those generating large numbers of combinations, and those generating just a few combinations would not be using a queue (with the associated overhead for queue processing) for generating just a handful of combinations. And the SQL approach would probably work with the existing code base, without refactoring for queue processing.
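To illustrate the SQL idea, here is a sketch using Python's built-in `sqlite3` with an in-memory database. The `attribute` and `combination` tables are invented for the example and are not PrestaShop's schema, SQLite stands in for MySQL, and the attribute ids are reused from the Color/Size/Material example earlier in the thread.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE attribute (id INTEGER PRIMARY KEY, group_name TEXT NOT NULL);
    CREATE TABLE combination (
        color_id INTEGER, size_id INTEGER, material_id INTEGER,
        PRIMARY KEY (color_id, size_id, material_id)
    );
    INSERT INTO attribute (id, group_name) VALUES
        (2, 'Color'), (3, 'Color'),
        (5, 'Size'), (6, 'Size'),
        (7, 'Material'), (8, 'Material');
""")

# A single INSERT ... SELECT builds every combination inside the database,
# so no combination list is ever assembled in application memory.
conn.execute("""
    INSERT INTO combination (color_id, size_id, material_id)
    SELECT c.id, s.id, m.id
    FROM attribute AS c
    CROSS JOIN attribute AS s
    CROSS JOIN attribute AS m
    WHERE c.group_name = 'Color'
      AND s.group_name = 'Size'
      AND m.group_name = 'Material'
""")
conn.commit()

rows = conn.execute(
    "SELECT color_id, size_id, material_id FROM combination "
    "ORDER BY color_id, size_id, material_id"
).fetchall()
```

The sketch hard-codes three groups; a variable number of groups would need the statement to be built dynamically, and any domain logic applied per combination would still have to be handled separately.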