Since about 3 weeks the archive data in one of our Matomo instances is growing at a much larger rate than usual. There was no change on the server or the traffic that could explain this:
The log of the archiving showed the following error, that started at around the same time than the database started to increase in size:
Error: Got invalid response from API request: ?module=API&method=API.get&idSite=11&period=year&date=last7&format=php&trigger=archivephp. Response was 'a:2:{s:6:"result";s:5:"error";s:7:"message";s:63:"SQLSTATE[HY000]: General error: 2006 MySQL server has gone away";}'
Following the recommendations for that error from the FAQ #183, we tried the following:
max_allowed_packet to 1GBwait_timeout was already set to 8 hours (and the archiving script only runs for a few minutes)This didn't change anything, the same error is still happening.
Other measures we took:
The DBStats plugin confirmed that the yearly archive is bloating the DB. Although also the monthly archive table seems to have way too many rows compared to previous months. Some cleanup seems to happen on some tables, because at least the blob_2018_10 table shrunk drastically in the last days. Other weird stuff seems to happen as there are big changes in tables that should no longer be touched, e.g. blob_2017_12 (if I'm understanding the archiving process correctly).
Here's are two screenshots of the report tables sizes:
5th of November:

8th of November:

Checking the DB directly, we can see the many duplicate archive entries (again, if I'm understanding the process correctly, this query should return no results):
SELECT idsite, date1, date2, period, name, COUNT(*) as count FROM piwik_archive_numeric_2018_01 GROUP BY idsite, date1, date2, period, name HAVING count > 1 ORDER BY count DESC;
Here's the first 10 results (the result set has 1550 rows):
| idsite | date1 | date2 | period | name | count |
+--------+------------+------------+--------+--------------------------------------+-------+
| 11 | 2018-01-01 | 2018-12-31 | 4 | bounce_count | 2010 |
| 11 | 2018-01-01 | 2018-12-31 | 4 | max_actions | 2010 |
| 11 | 2018-01-01 | 2018-12-31 | 4 | nb_actions | 2010 |
| 11 | 2018-01-01 | 2018-12-31 | 4 | nb_visits | 2010 |
| 11 | 2018-01-01 | 2018-12-31 | 4 | sum_visit_length | 2010 |
| 11 | 2018-01-01 | 2018-12-31 | 4 | UserCountry_distinctCountries | 1959 |
| 11 | 2018-01-01 | 2018-12-31 | 4 | Actions_sum_time_generation | 1925 |
| 11 | 2018-01-01 | 2018-12-31 | 4 | Actions_nb_hits_with_time_generation | 1925 |
| 11 | 2018-01-01 | 2018-12-31 | 4 | Actions_nb_pageviews | 1925 |
| 11 | 2018-01-01 | 2018-12-31 | 4 | Actions_nb_uniq_pageviews | 1925 |
+--------+------------+------------+--------+--------------------------------------+-------+
Any hints what might be happening here and how we could go about fixing it? At the current rate of increased disk usage, we have about 2 weeks left before the disk is full. So first thing would be to find a way to delete duplicate archiving data manually to buy us more time. But we of course have to find a real solution that gets the yearly archiving working correctly again (which will hopefully also get the archive purging working again).
(This issue is related to #11322, #10439 and #7181)
Is browser archiving disabled? Have you thought about running optimize table on egpiwik_archive_blob_2018_01`? If there is lots of duplicate data, it may not change much though (I know InnoDB doesn't support optimize table but it'll recreate the table basically).
I'm not 100% sure but I think this query should give us some information on how many invalidated archives there are etc:
select `value`, period, count(*) from piwik_archive_numeric_2018_01 where `name` like "%done%" group by `value`, period
Do you know if the DB was already this size before updating to 3.6.1? Just checking to find out if there's a problem with archive deletion.
Also just to double check after changing the MySQL settings to fix MySQL server has gone away" MySQL was restarted or you double checked that the settings were applied?
Thanks for the fast answer!
optimize table. I might try that during during off-hours. But I doubt that it will help.| value | period | count(*) |
+-------+--------+----------+
| 1 | 1 | 2499 |
| 1 | 2 | 383 |
| 1 | 3 | 65 |
| 2 | 4 | 42 |
| 3 | 4 | 3572 |
| 4 | 1 | 4 |
| 4 | 2 | 9 |
| 4 | 3 | 9 |
| 4 | 4 | 2099 |
+-------+--------+----------+
max_allowed_packet is not big enough. Do you have any ideas how one might check this?Another question I didn't yet mention: in #11322 the command ./console core:purge-old-archive-data all is used to clean up the db. Should I try that? Is this save to run (no risk of deleting too much data) and is there a risk of tables locking or the db otherwise being stressed so that is should be done during off-hours?
I restarted MySQL after changing the db settings, yep. I was wondering if it's possible that the packet that Matomo is trying to insert is bigger than 1 GB and so even the maximum possible max_allowed_packet is not big enough. Do you have any ideas how one might check this?
nope, not really unfortunately.
fyi it seems there are 3572 temporary archives, and 2099 invalidated archives. I'm not quite sure but this could be reasonable as I suppose many archives for the year might be temporary. The invalidated archives could be likely removed I suppose but hard to tell without knowing all the content.
There shouldn't be any risk of deleting too much data with this command, and least I haven't heard of anything. This plus optimize table could reduce the DB size.
Dropping in since I am a colleague of @toebu.
I did run the table optimisation via mysqlcheck:
$ mysqlcheck -o piwik -u root -p
This ended up gaining a little bit, around 2 or 3 GB. The screenshot provided in the first comment was done after the optimisation.
I recommend you try running the other command to purge old archives and maybe this helps. It's hard to say why more archives exist in January than the others without looking at the data. Could be possible there were maybe some other websites tracked that were then deleted etc.
. It's hard to say why more archives exist in January than the others
fyi likely it's because January tables also store yearly archives.
I compared it with other DBs and there shouldn't be that many more for the yearly. Except when maybe some sites were deleted for example etc.
So, I have run the archive purging. It finished almost immediately and didn't change anything. So I'm guessing this task is normally executed through the cron job.
I have also found the command diagnostics:analyze-archive-table that is provided by the Matomo CLI. It gives the following summary for the 2018_01 period:
Total # Archives: 9987
Total # Invalidated Archives: 2497
Total # Temporary Archives: 4476
Total # Error Archives: 67
Total # Segment Archives: 6739
Total Size of Blobs: 11.8 G
For comparison, some other time period:
2017_01:
Total # Archives: 2255
Total # Invalidated Archives: 1023
Total # Temporary Archives: 0
Total # Error Archives: 0
Total # Segment Archives: 2175
Total Size of Blobs: 71.9 M
2018_08:
Total # Archives: 1934
Total # Invalidated Archives: 91
Total # Temporary Archives: 0
Total # Error Archives: 0
Total # Segment Archives: 1718
Total Size of Blobs: 51.3 M
2018_09:
Total # Archives: 1506
Total # Invalidated Archives: 0
Total # Temporary Archives: 0
Total # Error Archives: 0
Total # Segment Archives: 1296
Total Size of Blobs: 46.8 M
2018_10:
Total # Archives: 8932
Total # Invalidated Archives: 12
Total # Temporary Archives: 7342
Total # Error Archives: 0
Total # Segment Archives: 5492
Total Size of Blobs: 146.1 M
2018_11:
Total # Archives: 29017
Total # Invalidated Archives: 32
Total # Temporary Archives: 28396
Total # Error Archives: 0
Total # Segment Archives: 21090
Total Size of Blobs: 1.2 G
I also ran table optimisation again (mysqlcheck -o piwik -u root -p). This didn't change anything significant either.
You may wanna check if there were more idsites in that time range or so maybe by executing
select distinct idsite from piwik_archive_blob_2018_01 group by idsite;
select distinct idsite from piwik_archive_blob_2017_01 group by idsite;
etc.
It's hard to say what is happening there otherwise and whether it is a bug or legit without knowing the exact contents.
If I was you, I would probably make a backup of the table, manually delete all temporary and invalid archives (or even better the whole piwik_archive_blob_2018_01 and piwik_archive_numeric_2018_01), run the archiver again, and see what happens.
Those archives can be identified through the done flags that are stored in the value column in the numeric table. Eg value 3 identifies a temporary table, 4 an invalidated table.
Sorry for not getting back earlier.
We have finally found the source of the issue. After an embarrassingly long time, we found the following crucial message in the syslog:
[ERROR] InnoDB: The total blob data length (11161224) is greater than 10% of the total redo log size (100663296). Please increase total redo log size.
We therefore changed the MySQL config innodb_log_file_size = 128M (it previously was at the default, which is 48MB). After the server was restarted and the new log file size was in effect, Matomo archiving finally finished successfully again.
It now took a couple of days for Matomo to clean up the tables, but the DB is finally down to a more normal size again. The diagnostics:analyze-archive-table command for the period 2018_01 now shows:
Total # Archives: 3221
Total # Invalidated Archives: 82
Total # Temporary Archives: 191
Total # Error Archives: 1
Total # Segment Archives: 2971
Total Size of Blobs: 326.8 M
All in all, this was a DB issue, not an issue of Matomo. If you agree, this additional scenario for the cause of the Mysql Server has gone away error should be added to this FAQ question. Are these FAQ stores in a repo? If yes, I could make a PR with an updated answer.
Updated the FAQ 馃憤 That's an awesome find and be for sure super valuable. I'll check if I find related open issues. Cheers for that and glad the size reduces now 馃殌
Awesome, thanks again for your support!
Most helpful comment
Updated the FAQ 馃憤 That's an awesome find and be for sure super valuable. I'll check if I find related open issues. Cheers for that and glad the size reduces now 馃殌