When using recurring workflows triggered by the scheduler (cron), the aow_processed and aow_processed_aow_actions tables can quickly become huge and bog down the system.
Log tables like aow_processed and aow_processed_aow_actions should be pruned automatically before they become too large.
If you set up workflows to be fired by the scheduler, these tables can quickly grow very large.
I'd suggest creating a new pruning script that can be set up via Admin > Scheduled Tasks.
See e.g. my pruning query here:
https://suitecrm.com/forum/developer-help/11262-very-large-aow-processed-and-aow-processed-aow-actions-tables#56405
Agreed, this is necessary. But why not make it part of the existing Prune job?
Adding a simple admin panel in the Advanced OpenAdmin section wouldn't be a big deal, I hope?
I'd certainly vote for having this pruning script on by default, with sensible default settings such as deleting log entries older than one month.
This got me thinking: what is the use of all this information? In the forum posts, some people mentioned having 14 million records!
Maybe we should start thinking about this a little earlier: do all these records really need to be generated? By default? Or shouldn't it be some extreme-debugging-logging-option, off by default?
It is required for workflows that don't use repeated runs, especially ones that run on the scheduler against all records, since it prevents the workflow from running more than once on a record. The common problem is when "repeated runs" is used without understanding how it works; this can cause the tables to fill up rapidly. A solution would be to update the existing log row with a count, as opposed to inserting a new entry each time; this would help reduce the load. We can easily extend the pruning job to remove rows for deleted records, but doing so for old records may have unexpected results.
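The count-based idea could be sketched roughly like this — a minimal sketch using SQLite and a deliberately simplified stand-in schema (SuiteCRM's real tables and bean layer differ; the column names here are illustrative):

```python
import sqlite3

# Simplified stand-in for aow_processed: one row per (workflow, record),
# with a run counter instead of a fresh log row per execution.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE aow_processed (
        aow_workflow_id TEXT,
        parent_id TEXT,
        run_count INTEGER DEFAULT 1,
        PRIMARY KEY (aow_workflow_id, parent_id)
    )
""")

def log_run(workflow_id, record_id):
    # Upsert: bump the counter instead of inserting a new row each run.
    conn.execute("""
        INSERT INTO aow_processed (aow_workflow_id, parent_id, run_count)
        VALUES (?, ?, 1)
        ON CONFLICT(aow_workflow_id, parent_id)
        DO UPDATE SET run_count = run_count + 1
    """, (workflow_id, record_id))

# Three runs on the same record still produce exactly one row.
for _ in range(3):
    log_run("wf-1", "acct-42")

count = conn.execute(
    "SELECT run_count FROM aow_processed WHERE parent_id = 'acct-42'"
).fetchone()[0]
print(count)  # → 3
```

The "does this record already have a row?" check that blocks repeated runs still works, but the table stays at one row per workflow/record pair instead of one row per execution.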
We have now set up a new home for suggestions at Trello. All github issues that were labeled 'suggestion' have been moved and will be closed. Certain ones will be progressed within the new Suggestion Box and may be re-opened.
Announcement of moving Suggestions:
https://suitecrm.com/forum/suggestion-box/13691-moving-suggestions-from-github-to
New SuiteCRM Suggestion Box
https://trello.com/b/Ht7LbMqw/suitecrm-suggestion-box
@Dillon-Brown I would re-open this and label it Bug. I regularly (though not very frequently) see people in the forums whose database has Gigabytes of these rows. It slows the system down, and it's easy to run out of disk space (especially inside a VM), with critical consequences.
Like @pgorod, I also feel this should be labeled as a bug and added to a standard scheduled prune job in SuiteCRM.
Maybe you can build upon the SQL queries I've been using to keep my aow_processed tables cleaned up:
USE crm;
CREATE TABLE aow_processed_new LIKE aow_processed;
# retain all aow_processed rows that belong to WF that run only once per record
INSERT aow_processed_new SELECT wp.* FROM aow_processed AS wp JOIN aow_workflow AS w ON wp.aow_workflow_id = w.id AND w.multiple_runs = 0;
# prune aow_processed rows that belong to WF that run multiple times, based on date
INSERT aow_processed_new SELECT wp.* FROM aow_processed AS wp JOIN aow_workflow AS w ON wp.aow_workflow_id = w.id AND w.multiple_runs = 1 AND wp.date_entered >= '2017-10-15 00:00:00';
REPAIR TABLE aow_processed_new;
CREATE TABLE aow_processed_aow_actions_new LIKE aow_processed_aow_actions;
# retain all aow_processed_aow_actions rows that belong to WF that run only once per record
INSERT aow_processed_aow_actions_new SELECT wpa.* FROM aow_processed_aow_actions AS wpa JOIN (SELECT wp.* FROM aow_processed AS wp JOIN aow_workflow AS w ON wp.aow_workflow_id = w.id AND w.multiple_runs = 0) as wp ON wpa.aow_processed_id = wp.id;
# prune aow_processed_aow_actions rows that belong to WF that run multiple times, based on date
INSERT aow_processed_aow_actions_new SELECT wpa.* FROM aow_processed_aow_actions AS wpa JOIN (SELECT wp.* FROM aow_processed AS wp JOIN aow_workflow AS w ON wp.aow_workflow_id = w.id AND w.multiple_runs = 1) as wp ON wpa.aow_processed_id = wp.id AND wpa.date_modified >= '2017-10-15 00:00:00';
REPAIR TABLE aow_processed_aow_actions_new;
ALTER TABLE aow_processed RENAME aow_processed_old;
ALTER TABLE aow_processed_new RENAME aow_processed;
ALTER TABLE aow_processed_aow_actions RENAME aow_processed_aow_actions_old;
ALTER TABLE aow_processed_aow_actions_new RENAME aow_processed_aow_actions;
DROP TABLE aow_processed_old;
DROP TABLE aow_processed_aow_actions_old;
Naturally the cutoff dates would need to be set dynamically as in "more than a year old" or something like that.
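One way to make the cutoff dynamic is to compute it in the calling script and pass it as a query parameter instead of a hard-coded literal (a sketch; within MySQL itself, DATE_SUB(NOW(), INTERVAL 1 YEAR) would do the same job):

```python
from datetime import datetime, timedelta

def cutoff(days=365, now=None):
    """Return the pruning cutoff as a 'YYYY-MM-DD HH:MM:SS' string."""
    now = now or datetime.utcnow()
    return (now - timedelta(days=days)).strftime("%Y-%m-%d %H:%M:%S")

# Parameterized version of the date-based insert above, for use with a
# MySQL driver such as mysqlclient or PyMySQL (%s is the placeholder):
sql = ("INSERT INTO aow_processed_new SELECT wp.* FROM aow_processed AS wp "
       "JOIN aow_workflow AS w ON wp.aow_workflow_id = w.id "
       "AND w.multiple_runs = 1 AND wp.date_entered >= %s")

# e.g. cursor.execute(sql, (cutoff(days=365),))
print(cutoff(days=365, now=datetime(2018, 10, 15)))  # → 2017-10-15 00:00:00
```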
Nice-to-have: an admin setting to choose the criteria for pruning these tables (with a sensible default value).
Thanks for that John, very useful.
Another "nice-to-have", which isn't too much work and would be quite practical, is a button inside the workflow, next to the Repeated Runs checkbox, saying Reset runs history. It would show a warning message box explaining the consequences, and then delete all the aow_processed information for that specific workflow. Sometimes people are OK with just deleting everything.
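The delete behind such a Reset runs history button boils down to two statements, children first, inside one transaction. A sketch with SQLite and simplified columns (the real handler would of course go through SuiteCRM's DB abstraction layer):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE aow_processed (id TEXT, aow_workflow_id TEXT);
    CREATE TABLE aow_processed_aow_actions (id TEXT, aow_processed_id TEXT);
    INSERT INTO aow_processed VALUES ('p1', 'wf-1'), ('p2', 'wf-2');
    INSERT INTO aow_processed_aow_actions VALUES ('a1', 'p1'), ('a2', 'p2');
""")

def reset_runs_history(workflow_id):
    # Delete child action rows first, then the parent log rows,
    # in one transaction so a failure leaves no orphans.
    with conn:
        conn.execute("""
            DELETE FROM aow_processed_aow_actions
            WHERE aow_processed_id IN (
                SELECT id FROM aow_processed WHERE aow_workflow_id = ?
            )
        """, (workflow_id,))
        conn.execute(
            "DELETE FROM aow_processed WHERE aow_workflow_id = ?",
            (workflow_id,))

reset_runs_history("wf-1")
remaining = conn.execute("SELECT COUNT(*) FROM aow_processed").fetchone()[0]
print(remaining)  # → 1 (only the wf-2 row survives)
```

Scoping the delete to one aow_workflow_id is what makes this safe compared to truncating the tables: other workflows' run history is untouched.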
Dear authors,
Thank you very much for the posted information, it is about to save us a lot of trouble.
Regarding the technique for dealing with the aow_processed_aow_actions table: shouldn't the query that inserts into the new table take into account whether the parent row in the aow_processed table was itself copied into its own new table?
I'm not familiar enough with the internal application logic to determine whether it would allow a child aow_processed_aow_actions row whose date_modified value is newer than the date_entered value of its parent row. If this is possible, then it seems the insert query for the new aow_processed_aow_actions_new table should be as follows (using your same datetime cutoff as an example):
INSERT aow_processed_aow_actions_new SELECT wpa.* FROM aow_processed_aow_actions AS wpa JOIN (SELECT wp.* FROM aow_processed AS wp JOIN aow_workflow AS w ON wp.aow_workflow_id = w.id AND w.multiple_runs = 1 AND wp.date_entered >= '2017-10-15 00:00:00') AS wp ON wpa.aow_processed_id = wp.id AND wpa.date_modified >= '2017-10-15 00:00:00';
Thank you again for your help.
...or something like this might be a way to select only those aow_processed rows which are themselves older than the cutoff datetime and all of whose child rows in the aow_processed_aow_actions table are also older than the same cutoff. This may provide some extra consistency, since it ages out of the database only entire execution sessions of the workflow process.
SELECT wp.id
FROM aow_processed wp
WHERE wp.date_entered < '2017-10-15 00:00:00'
AND wp.id IN (
SELECT wpa.aow_processed_id
FROM aow_processed_aow_actions wpa
GROUP BY wpa.aow_processed_id
HAVING MAX(wpa.date_modified) < '2017-10-15 00:00:00'
);
Then the aow_processed_aow_actions rows and their related aow_processed rows can be deleted in a single atomic transaction, covering all related rows.
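That select-then-delete step could look like the following — a sketch with SQLite and simplified columns, using the HAVING MAX() query above to identify only fully-aged-out sessions:

```python
import sqlite3

CUTOFF = "2017-10-15 00:00:00"
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE aow_processed (id TEXT, date_entered TEXT);
    CREATE TABLE aow_processed_aow_actions (aow_processed_id TEXT, date_modified TEXT);
    -- p1: parent and child both older than the cutoff -> prunable
    INSERT INTO aow_processed VALUES ('p1', '2016-01-01 00:00:00');
    INSERT INTO aow_processed_aow_actions VALUES ('p1', '2016-01-01 00:00:00');
    -- p2: parent is old but one child is newer -> kept whole
    INSERT INTO aow_processed VALUES ('p2', '2016-01-01 00:00:00');
    INSERT INTO aow_processed_aow_actions VALUES ('p2', '2018-01-01 00:00:00');
""")

with conn:  # one transaction: children and parents go together, or not at all
    prunable = [row[0] for row in conn.execute("""
        SELECT wp.id FROM aow_processed wp
        WHERE wp.date_entered < ?
          AND wp.id IN (
              SELECT wpa.aow_processed_id
              FROM aow_processed_aow_actions wpa
              GROUP BY wpa.aow_processed_id
              HAVING MAX(wpa.date_modified) < ?
          )
    """, (CUTOFF, CUTOFF))]
    for pid in prunable:
        conn.execute("DELETE FROM aow_processed_aow_actions "
                     "WHERE aow_processed_id = ?", (pid,))
        conn.execute("DELETE FROM aow_processed WHERE id = ?", (pid,))

surviving = sorted(r[0] for r in conn.execute("SELECT id FROM aow_processed"))
print(surviving)  # → ['p2']
```

Because p2 has a child action newer than the cutoff, its whole session survives, which is the consistency property described above.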
Is this fixed now? I mean, does the Prune Database job delete aow_processed and aow_processed_aow_actions?
I don't think there were any changes to the code.
As I wrote in 2017, I believe this issue could be reopened and labeled a bug because of the nasty consequences it has for some people. With some care, the existing solutions could be made generic and applied in core.
I asked because, if it's not fixed, I think I will set up a cron job or an SQL routine to prune those two tables. I know that aow_processed only stores logs; I don't know about aow_processed_aow_actions, but it grows really quickly in size too.
Remember that those "logs" are relevant to the Repeated runs option. So messing with those tables alters the behavior of Workflows.
That's why only a case-by-case evaluation is possible.
Just for my reference, this issue #3744 is also about database clean-ups.
@Dillon-Brown can we consider a re-opening of this issue here, as I asked above?
A huge aow table is definitely a problem if you run many workflows daily.
But if you delete the rows in it, the workflow may run for the record again, and then you end up with many new entries being created in your CRM again and again.
To some extent this is unavoidable, since SuiteCRM needs to track each record. If the problem hits you very hard, I think you should consider moving the information into the record itself: add fields (or custom fields) so that the record stores its own state, in a manner that is relevant to your business. Then your workflows can use conditions based on that data in order to be more selective.
I hope this helps.