fix-duplicate-log-actions found about 21.000 entries out of 50.000.000 to fix. Each fix takes about 1 hour in a 100GB database. This means it would take about 2.5 years to fix all duplicate entries.
Is there any way to make things faster?
Hi @iMarkus could you please send the command output with the times so we can see how slow it is?
Hi @mattab - here is the command output. I just let it run for 90 entries so far to see if it may speed up, but it takes always ~1hour to finish.
Found 21137 actions with duplicates.
DEBUG [2017-10-13 02:23:15] idaction = 8465009, duplicate idactions = [array]
Fixed duplicates in piwik_log_link_visit_action. Time elapsed: 3564.746s.
Fixed duplicates in piwik_log_conversion. Time elapsed: 1.103s.
Fixed duplicates in piwik_log_conversion_item. Time elapsed: 0.041s.
DEBUG [2017-10-13 03:22:41] idaction = 8335013, duplicate idactions = [array]
Fixed duplicates in piwik_log_link_visit_action. Time elapsed: 3618.117s.
Fixed duplicates in piwik_log_conversion. Time elapsed: 1.204s.
Fixed duplicates in piwik_log_conversion_item. Time elapsed: 0.030s.
DEBUG [2017-10-13 04:23:00] idaction = 8363875, duplicate idactions = [array]
Fixed duplicates in piwik_log_link_visit_action. Time elapsed: 3583.670s.
Fixed duplicates in piwik_log_conversion. Time elapsed: 1.268s.
Fixed duplicates in piwik_log_conversion_item. Time elapsed: 0.025s.
DEBUG [2017-10-13 05:22:45] idaction = 8271979, duplicate idactions = [array]
Fixed duplicates in piwik_log_link_visit_action. Time elapsed: 3633.765s.
Fixed duplicates in piwik_log_conversion. Time elapsed: 1.266s.
Fixed duplicates in piwik_log_conversion_item. Time elapsed: 0.044s.
@iMarkus would you please enable MySQL Slow query log, and then run the command again, and post here the Slow queries from the slow query logs?
@mattab enclosed the update query:
UPDATE piwik_log_link_visit_action SET
idaction_url_ref = IF((idaction_url_ref IN (9787130)), 9787126, idaction_url_ref),
idaction_name_ref = IF((idaction_name_ref IN (9787130)), 9787126, idaction_name_ref),
idaction_name = IF((idaction_name IN (9787130)), 9787126, idaction_name),
idaction_url = IF((idaction_url IN (9787130)), 9787126, idaction_url),
idaction_event_action = IF((idaction_event_action IN (9787130)), 9787126, idaction_event_action),
idaction_event_category = IF((idaction_event_category IN (9787130)), 9787126, idaction_event_category),
idaction_content_interaction = IF((idaction_content_interaction IN (9787130)), 9787126, idaction_content_interaction),
idaction_content_name = IF((idaction_content_name IN (9787130)), 9787126, idaction_content_name),
idaction_content_piece = IF((idaction_content_piece IN (9787130)), 9787126, idaction_content_piece),
idaction_content_target = IF((idaction_content_target IN (9787130)), 9787126, idaction_content_target)WHERE idaction_url_ref IN (9787130) OR idaction_name_ref IN (9787130) OR idaction_name IN (9787130) OR idaction_url IN (9787130) OR idaction_event_action IN (9787130) OR idaction_event_category IN (9787130) OR idaction_content_interaction IN (9787130) OR idaction_content_name IN (9787130) OR idaction_content_piece IN (9787130) OR idaction_content_target IN (9787130);
Thanks for the query @iMarkus :+1:
this SQL query looks nasty! Hopefully we can find a better way. unfortunately we likely won't be able to work on this for a little while as we don't have much time available, but we'll get to it.
Hey, I'm experiencing a similar problem; Would you mind if I took a shot at trying to improve the speed of the query.
Most helpful comment
Hey, I'm experiencing a similar problem; Would you mind if I took a shot at trying to improve the speed of the query.