Weblate: Performance issues (with 66320 source strings)

Created on 11 May 2019  Â·  19Comments  Â·  Source: WeblateOrg/weblate

Describe the bug
On my own Weblate install (my stats), saving a new translation takes a lot of time (28 seconds).

I'm unsure this is related to Weblate itselft, it could be related to sizing of the server.

To Reproduce
Steps to reproduce the behavior:

  • open any component and translate a new string.

Expected behavior

Next screen should come in less than a few seconds (less than 5 would be fine).

Screenshots

Here is a screenshot while waiting for weblate to save my new translation:
Capture d’écran de 2019-05-11 14-54-56

Still while running the same action, here is the request that appeared in postgres' log, I asked postgres to log every statement lasting for more than 1000 ms.

2019-05-11 14:07:07 CEST [11960-1] weblate@weblate LOG: durée : 6037.215 ms, instruction : SELECT DISTINCT "trans_unit"."id", "trans_unit"."translation_id", "trans_unit"."id_hash", "trans_unit"."content_hash", "trans_unit"."location", "trans_unit"."context", "trans_unit"."comment", "trans_unit"."flags", "trans_unit"."source", "trans_unit"."previous_source", "trans_unit"."target", "trans_unit"."state", "trans_unit"."position", "trans_unit"."has_suggestion", "trans_unit"."has_comment", "trans_unit"."has_failing_check", "trans_unit"."num_words", "trans_unit"."priority", "trans_unit"."pending" FROM "trans_unit" INNER JOIN "trans_translation" ON ("trans_unit"."translation_id" = "trans_translation"."id") INNER JOIN "trans_component" ON ("trans_translation"."component_id" = "trans_component"."id") WHERE (("trans_unit"."content_hash" = -4464890554333545647 AND "trans_component"."project_id" = 3 AND "trans_translation"."language_id" = 58) OR ("trans_unit"."id_hash" = -4464890554333545647 AND "trans_component"."project_id" = 3 AND "trans_translation"."language_id" = 58) OR ("trans_unit"."source" = 'Additional differences appear in certain screens; notably the custom partitioning process is very different from other Linux distributions. These differences are described in each screen''s subsection.' AND "trans_component"."project_id" = 3 AND "trans_translation"."language_id" = 58)) ORDER BY "trans_unit"."priority" ASC, "trans_unit"."position" ASC

Here are some statistics about postgresql:

datid | datname | numbackends | xact_commit | xact_rollback | blks_read | blks_hit | tup_returned | tup_fetched | tup_inserted | tup_updated | tup_deleted | conflicts | temp_files | temp_bytes | deadlocks | blk_read_time | blk_write_time | stats_reset
-- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --
207325 | weblate | 0 | 8147375 | 3 | 75364357 | 1784885178 | 2295252681 | 975238813 | 294444 | 2087262 | 1159 | 0 | 58 | 217874432 | 1 | 0 | 0 | 2019-04-16 07:10:43.419484+0

Server configuration and status
Same info as in https://github.com/WeblateOrg/weblate/issues/2730

Thanks a lot for your help

bug

All 19 comments

Most likely this is the guilty code:

https://github.com/WeblateOrg/weblate/blob/3753f01e09f880078936b034f19eb70e622acbfe/weblate/trans/views/edit.py#L70-L81

As there is no index on source field, it can really take long. I wonder why I've never noticed this, maybe having enough memory will make the database cache such lookups more effectively.

I've tried to make the query slightly better in 2207bbc690f3ddf2681e130e9b77cd37d69623aa. It would be great if you can see if that improves performance in your case. I guess it will be still quite slow, and further tweaks will be needed.

@Jibec Added to https://github.com/WeblateOrg/weblate/wiki/References#fedora
Hope that brings some stress-testing your way :)

@Jibec Added to https://github.com/WeblateOrg/weblate/wiki/References#fedora
Hope that brings some stress-testing your way :)

That's very nice of you, but please revert this reference, my hosting is only a demonstrator aiming to help the fedora community to test documentation internationalization, and to show we can use Weblate to replace Zanata. We are working on a migration plan and Weblate isn't validated yet. I'm pretty sure we'll use it, but I don't want the community to be forced, or external people to think we already migrated.

I've tried to make the query slightly better in 2207bbc. It would be great if you can see if that improves performance in your case. I guess it will be still quite slow, and further tweaks will be needed.

Thank you nijel, the process I used was as follow:

I applied your patch and restarted the weblate daemon.
Delay is exactly the same, 28 seconds.

The following request took: 13888 ms

URL de la requête :https://translate.holcroft.fr/translate/fedora-l10nfedora-release-notes/rawhidepagesdevelopersdevelopment_go/fr/?type=check:inconsistent&offset=1&checksum=2138717e29b4e6fa&merge=15859
Méthode de la requête :GET
Adresse distante : %
Code d’état :
302
Version :HTTP/2.0
Politique de référent :origin

The following request took: 26822 ms

URL de la requête :https://translate.holcroft.fr/projects/fedora-l10nfedora-release-notes/rawhidepagesdevelopersdevelopment_go/fr/
Méthode de la requête :GET
Adresse distante : %
Code d’état :
200
Version :HTTP/2.0
Politique de référent :origin

the file I edited manually was: venv/lib/python3.5/site-packages/weblate/trans/views/edit.py

I kind of expected that my change will not make situation much better. In the end the generated query is same, just better ordered (what should be handled by the database optimizer) and removing distinct (which should be cheap on few results this generates). So we really need indexing on the source string as well.

PS: I still wonder why it takes so long in your case, 1k strings (which are in the project and language) should be way faster to scan on decent system than in 10-20 seconds. This query is there for several releases and nobody complained so far and I'm pretty confident people would complain about such big response times.

You're probably right, I bought an additional 8GB of RAM to see if that's better. But I also think it is related to the way I use weblate: 1 repo for multiple components with one single repo.

One extreme example is in attachment, here is the summary of a 28 minutes and 16 seconds execution time. I noticed the repo was behing master of 3 commits, I manually pulled the repository using the following link: https://translate.holcroft.fr/projects/fedora-l10nquick-docs/#repository

From | To | Duration | Total sec | Sec per component (240) | Description
-- | -- | -- | -- | -- | --
12:44:18 | 12:44:34 | 00:00:16 | 16 | n/a | it updates the repository (with update taking 3.13 seconds, it means rebase took 13 seconds to weblate)
12:44:34 | 12:49:08 | 00:04:34 | 274 | 1.14 | it triggers add-ons, discovery is slow but launched only once but consistency.languages is launched on every components, NOTE : numbering looks odd, it stats at 0/240 and ends at 239/240
12:49:10 | 12:49:10 | 00:00:00 | 0 | 0 | it detect a change and update weblate (there is only one language, it’s really fast
12:49:10 | 13:12:20 | 00:23:10 | 1390 | 5.79 | it updates linked project
13:12:25 | 13:12:34 | 00:00:09 | 9 | n/a | runs batch check and updates unit flag

During this execution, only three postgresql requests exceeded 500 ms:

2019-05-12 13:12:27 CEST [598-1] weblate@weblate LOG:  durée : 1290.157 ms, instruction : UPDATE "trans_unit" SET "has_failing_check" = false WHERE "trans_unit"."id" IN (SELECT U0."id" FROM "trans_unit" U0 INNER JOIN "trans_translation" U1 ON (U0."translation_id" = U1."id") INNER JOIN "trans_component" U2 ON (U1."component_id" = U2."id") WHERE U2."project_id" = 29)
2019-05-12 13:12:27 CEST [598-2] weblate@weblate LOG:  durée : 743.017 ms, instruction : UPDATE "trans_unit" SET "has_comment" = false WHERE "trans_unit"."id" IN (SELECT U0."id" FROM "trans_unit" U0 INNER JOIN "trans_translation" U1 ON (U0."translation_id" = U1."id") INNER JOIN "trans_component" U2 ON (U1."component_id" = U2."id") WHERE U2."project_id" = 29)
2019-05-12 13:12:28 CEST [598-3] weblate@weblate LOG:  durée : 723.296 ms, instruction : UPDATE "trans_unit" SET "has_suggestion" = false WHERE "trans_unit"."id" IN (SELECT U0."id" FROM "trans_unit" U0 INNER JOIN "trans_translation" U1 ON (U0."translation_id" = U1."id") INNER JOIN "trans_component" U2 ON (U1."component_id" = U2."id") WHERE U2."project_id" = 29)

Note: I wonder if the checks and flag updates shouldn't be ran for each component.
quick-docs update.txt

Looking at the log, the triggering post/pre update for linked components is taking most time and it donsn't make much sense - those events should be per repository so they should be triggered just once. Also the language consistency addon should not use this event as it's triggered before parsing files, so before any new translations appear in Weblate...

The checks/flags updates are done once for all linked component, it's just the log doesn't show that. It's expected that those queries take long, but it's still way faster than calculating unit states while updating.

Okay, those can not be per repository as they are used also for things like msgmerge, which definitely needs to be per component. I've at least optimized fetching list of addons in https://github.com/WeblateOrg/weblate/commit/56ab94115a923c2da624298bcd6ce27f59596a68, what should make situation better (and execute way less queries in your case).

The 6 second for no-op updates is too much, unfortunately the logging is not detailed enough to figure out where the bottleneck might be. I've added some more logging in https://github.com/WeblateOrg/weblate/commit/5c43f145794ab894cf106766604549c34914d751

Looking again at the log, I think there is problem in stats invalidation:

mai 12 13:12:28 ynh weblate[32526]: DEBUG fedora-l10nquick-docs/nav: all unit flags updated
mai 12 13:12:34 ynh weblate[32526]: INFO fedora-l10nquick-docs/nav: updating completed

What is executed between these two is:

https://github.com/WeblateOrg/weblate/blob/5c43f145794ab894cf106766604549c34914d751/weblate/trans/models/component.py#L1361-L1372

The cleanup was not triggered (it's triggered only in case something gets removed). That really keeps just notifications (which was none as it would be logged) and cache invalidation. So apparently the this consumes those nearly 5 seconds.

Looking at the source, the problem is obvious - it always invalidates cache for all 240 components:

https://github.com/WeblateOrg/weblate/blob/5c43f145794ab894cf106766604549c34914d751/weblate/trans/models/translation.py#L852-L865

Can you try following patch:

diff --git a/weblate/trans/models/component.py b/weblate/trans/models/component.py
index 1c8d72a5ce..e3419aa325 100644
--- a/weblate/trans/models/component.py
+++ b/weblate/trans/models/component.py
@@ -1365,7 +1365,7 @@ class Component(models.Model, URLMixin, PathMixin):
         # First invalidate all caches
         self.log_info('updating stats caches')
         for translation in translations.values():
-            translation.invalidate_cache()
+            translation.invalidate_cache(recurse=not skip_checks)
             translation.notify_new(request)

         self.checks_cache = None

It should improve situation in your case. Unfortunately it's not generally applicable (it would work only on projects with language consistency enabled), so more complex solution is needed.

thanks for these improvements.

I applied: the log improvement and the patch you asked me to apply. Note, the number of component changed.

From | To | Duration | Total sec | Sec per component (245) | Description
-- | -- | -- | -- | -- | --
21:15:04 | 21:15:18 | 00:00:14 | 14 | n/a | it updates the repository (with update taking 4.78 seconds, it means rebase took 10 seconds to weblate)
21:15:18 | 21:20:06 | 00:04:48 | 288 | 1.17551020408163 | it triggers add-ons, discovery is slow but launched only once but consistency.languages is launched on every components, NOTE : numbering looks odd, it stats at 0/240 and ends at 239/240
21:20:08 | 21:20:08 | 00:00:00 | 0 | 0 | it detect a change and update weblate (there is only one language, it’s really fast
21:20:08 | 21:44:36 | 00:24:28 | 1468 | 5.99183673469388 | it updates linked project
21:44:36 | 21:44:49 | 00:00:13 | 13 | n/a | runs batch check and updates unit flag

Hope it helps!
quick-docs update-2.txt

if this may help somehow, here is the commit that changed on the source repository: https://pagure.io/fedora-l10n/quick-docs/c/3d5c2d9e2c530a4d633f9f95bb358a5882fadc96?branch=master

Thanks, the log helped me to spot things I've missed before (that the master component is in different project). Now I see several problems with this and I'm afraid that backporting fixes will no longer work for you...

Le 2019-05-13 09:44, Michal Čihař a écrit :

Thanks, the log helped me to spot things I've missed before (that the
master component is in different project). Now I see several problems
with this and I'm afraid that backporting fixes will no longer work
for you...

Well, I think I can upgrade to Weblate 3.6.1, if you can build a patch
from this, I'll be able to test.
I assume this means you creating a specific branch from 3.6.1 and cherry
pick commits did for this ticket. But only you knows if this is doable
or not (it may depends on other commits and making everything messy)

If this isn't possible, I'll wait for next release. Either way, my new
RAM is coming, which will hopefully help.

Thank you for your report, the issue you have reported has just been fixed.

  • In case you see problem with the fix, please comment on this issue.
  • In case you see similar problem, please open separate issue.
  • If you are happy with the outcome, consider supporting Weblate by donating.

As I don't know where to ask this, I try here: do you have plans for the release 3.7? I'm wondering if I should wait for it to continue digging other performance issues I see. My upgrade to 3.6 looks to works quite fine in testing, I'll migrate my production server to it soon. What do you think about it?

Just look at the 3.7 milestone, it's scheduled on June 15, 2019.

OK, thanks! I didn't knew about the milestone view... Only about the projects view

Was this page helpful?
0 / 5 - 0 ratings