Semanticmediawiki: `ApprovedRevs`, self-reference queries, and delayed storage update

Created on 20 Jul 2018  Â·  20Comments  Â·  Source: SemanticMediaWiki/SemanticMediaWiki

Setup and configuration

  • SMW version: 2.5.6
  • SESP version: 2.0.0-alpha (6060a7d)
  • Approved Revs: 0.8 (237bbbd)
  • MW version: 1.31.0 (176ea60)
  • PHP version:7.0.30-0+deb9u1 (apache2handler)
  • MySQL version: 10.1.26-MariaDB-0+deb9u1

    Issue

SESP properties do not seem to be available to templates until the page they originate from has been purged and the page that uses them has been purged.

Steps to reproduce

This problem seems especially fiddly, so I set up a VM with a pristine setup to reproduce the problem. The source of the pages is given after the steps to reproduce.

There seem to be a few problems here other than the SESP problem, but I think I've definitely shown that SESP properties show up on Special:Browse but then are not reflected in inline ask queries.

  • [ ] Multiple edits to both ApproveTestShowRevs and ApproveTestRevsNotShown
  • [ ] Verify job queue is clear
  • [ ] Verify ApproveTestShowRevs does not display “Approved rev” but does show Revision Id. and that Category is “Pages with no approved revisions”
  • [ ] Verify ApproveTestRevsNotShown does shows Revision Id. and that Category is “Pages with no approved revisions”
  • [ ] Verify “Category:Pages with no approved revisions” contains both pages
  • [ ] Approve last rev of ApproveTestShowRevs
  • [ ] Make sure most recent rev is approved and “Approved pages” category shows up on the page (it does, but after two reloads)
  • [ ] Visit “Category:Pages with approval process”
  • [ ] note that it shows in the header from the template as “NOT Approved” and does not show on “Category:Approved pages” page
  • [ ] verify the job queue is empty
  • [ ] verify Special:Browse for ApproveTestShowRevs shows “Approved revision” and “Revision id” are the same. Also, that “Approval status” shows “approve”.
  • [ ] Visit “Category:Pages with approval process” again to veify no change
  • [ ] purge “Category:Pages with approval process” and verify no change (2x)
  • [ ] Purge ShowRevs page, verify that it shows both AR and RID
  • [x] Visit “Category:Pages with approval process” again to veify no change. Reload several times to be sure.
  • [x] Purge “Category:Pages with approval process”
  • [x] Verify that “ApproveTestShowRevs” has moved to “Completely reviewed” in the header.
  • [x] Edit both subject pages
  • [x] Verify “ApproveTestShowRevs” has changed, RID != AR and category has changed from “Approved pages” to “Pages pending approval”
  • [x] Visit “Category:Pages with approval process” and verify that it is _STILL_ in the “Completely reviewed” slot.
  • [x] Visit “Category:Pages pending approval” and verify it _IS NOT_ listed.
  • [x] Visit “Category:Approved pages” and verify it is _IS_ listed.
  • [x] Visit “Special:Browse/ApproveTestShowRevs” and verify that it shows the Approved revision != Revision ID
  • [x] Purge “Category:Pages with approval process”
  • [x] Verify position of pages has not changed
  • [x] Purge “ApproveTestShowRevs”
  • [x] Visit “Special:Browse/ApproveTestShowRevs” again and verify “ApproveTestShowRevs” is _STILL_ in “Completely reviewed” slot
  • [x] Purge “Category:Pages with approval process”
  • [x] Verify “ApproveTestShowRevs” is _FINALLY_ in “Modifications NOT approved” slot
  • [x] Verify that “Category:Pages pending approval” _DOES NOT YET_ show the page
  • [x] Visit “Category:Approved pages” and verify it is _IS STILL_ listed.

LocalSettings.php _after the lines that say # Add more configuration options below_

require __DIR__ . "/Debug.php";
require_once "$IP/extensions/SemanticMediaWiki/SemanticMediaWiki.php";
enableSemantics( 'smw-ar-sesp' );
wfLoadExtension( 'ApprovedRevs' );
wfLoadExtension( 'ParserFunctions' );
wfLoadExtension( 'SemanticExtraSpecialProperties' );
$sespSpecialProperties = [  "_EUSER", "_CUSER", "_REVID", "_PAGEID", "_PAGELGTH", "_NREV", "_NTREV", "_SUBP", "_USERREG", "_USEREDITCNT", "_USERBLOCK", "_USERRIGHT", "_USERGROUP", "_EXIFDATA", "_APPROVED", "_APPROVEDBY", "_APPROVEDDATE", "_APPROVEDSTATUS" ];
$egApprovedRevsAutomaticApprovals = false;
$egApprovedRevsNamespaces = [];
$wgGroupPermissions["*"]["approverevisions"] = true;
$wgGroupPermissions["*"]["viewlinktolatest"] = true;
$wgGroupPermissions["*"]["viewapprover"] = true;
$wgJobRunRate = 0;

Pages used:

Template:Approve

   <noinclude>This template is used to add the APPROVEDREVS to articles.</noinclude>

   <includeonly>
   [[Special:Browse/{{FULLPAGENAME}}|Examine SMW properties for this page]]
   __APPROVEDREVS__
   {{#if:{{#show:{{FULLPAGENAME}}|?Approved revision}}
        |{{#ifeq:{{#show:{{FULLPAGENAME}}|?Approved revision}}|{{#show:{{FULLPAGENAME}}|?Revision ID}}
                |[[Category:Approved pages]]|[[Category:Pages pending approval]]
          }}
        |[[Category:Pages with no approved revisions]]
        }}[[Category:Pages with approval process]]
   </includeonly>

Template:Category_action_items

   <noinclude>This template serves as the header on relevant categories.</noinclude>
   <includeonly>
   '''NOT approved'''
   {{#ask:
   [[Category:{{PAGENAME}}]]
   [[Category:Pages with no approved revisions]]
   |format=json
   }}
   {{#ask:
   [[Category:{{PAGENAME}}]]
   [[Category:Pages with no approved revisions]]
   |format=ol
   |userparam=notreviewed
   |template=Format
   }}
   '''Modification NOT approved'''
   {{#ask:
   [[Category:{{PAGENAME}}]]
   [[Category:Pages pending approval]]
   |?Modification date
   |format=json
   }}
   {{#ask:
   [[Category:{{PAGENAME}}]]
   [[Category:Pages pending approval]]
   |?Modification date
   |format=ol
   |userparam=mod
   |template=Format
   }}
   '''Completely reviewed'''
   {{#ask:
   [[Category:{{PAGENAME}}]]
   [[Category:Approved pages]]
   |format=json
   }}
   {{#ask:
   [[Category:{{PAGENAME}}]]
   [[Category:Approved pages]]
   |format=ol
   |columns=2
   |userparam=reviewed
   |template=Format
   }}
   </includeonly>

ApproveTestShowRevs

{{Approve}}
This page tests AR when the AR revision is queried on the page

Approved rev from SMW: {{#show:{{FULLPAGENAME}}|?Approved revision}}

This revision: {{#show:{{FULLPAGENAME}}|?Revision ID}} 

ApproveTestRevsNotShown

{{Approve}}
This page tests AR when the AR revision is not shown

This revision: {{#show:{{FULLPAGENAME}}|?Revision ID}} 

Category:Pages_with_approval_process

{{Category_action_items}}

Most helpful comment

@hexmode BTW, I merged a related PR last week on the proposed changes to master.

All 20 comments

I just had a quick peek as I'm fairly busy now (and the next couple of weekend sprints) to finish up the 3.0 release.

Skimming the example I could see the following constraints:

  • {{#show:{{FULLPAGENAME}}|?Approved revision}} uses self-reference queries
  • ApprovedRevs extension which may or may not influence the result beyond the reach of SMW core (see #3032)
  • Relying on SESP to store ApprovedRevs specific annotations

@kghbln Can we establish the fact the using of a template and SMW works as expect without any self-reference and ApprovedRevs.

{{#if:{{#show:{{FULLPAGENAME}}|?Approved revision}}

This uses a self-reference which has restrictions as to when results are available. (see also #3171)

As outlined in [0] MW moved the storage execution point further down [1, 2] in the stack call to after what is now called MediaWiki::preOutputCommit hereby making all DB storage activities to be executed after an output has been build which is the reason why you can effectively only query data from an embedding page after the storage has been completed. In older MW versions the execution was closer to the parsing process and might have appeared immediately. Changes to the refreshLinks [3] job execution adds another factor as to when the parser cache is invalidated.

We do sent a Title::invalidateCache (responsible for the invalidation of the parser cache) at the end of a storage process to MW but also here the execution point has changed considerably in recent versions of MW.

With the execution of #ask happening before [0] SMW gets the signal to store the data from a page (== altered or new annotations including those defined using subobjects) it should be apparent that the query cannot display something it has no "knowledge" about.

SMW cannot influence the timing of when #ask is called from the internal parser call stack nor can it influence when it gets the signal [2] to store its data to the backend.

SMW 3.0 introduced @annotation to mark self-referencing queries to use a post-processing the force an update of the page to circumvent the situation of query > storage > query and @deferred to support some formats to execute queries in a so called deferred mode involving the MW API and JS where a placeholder is created at the time MW is building the HTML. The actual query execution is deferred to after a page has been loaded by the browser (the point where a user actually sees the page content and data storage has happened) where the query output is no longer part of the standard parser cache therefore results are independent of the static HTML and are generated dynamically on each page view using the MW API.

[0] https://github.com/SemanticMediaWiki/SemanticMediaWiki/issues/2375
[1] https://www.semantic-mediawiki.org/wiki/Help:Deferred_updates
[2] https://github.com/SemanticMediaWiki/SemanticMediaWiki/blob/master/src/MediaWiki/Hooks/LinksUpdateConstructed.php
[3] https://github.com/SemanticMediaWiki/SemanticMediaWiki/blob/master/src/MediaWiki/Hooks/LinksUpdateConstructed.php#L111-L114

It looks like you've got a pretty good handle on this. @cicalese should be made aware of this since you think part of the issue is recent changes in MW.

Reproduction of the issue without AR would be good, agreed.

I have noticed starting a version or two back (maybe 1.29?) that when I save a new page, often the property values are not available until after a refresh. You can see this on mwstake.org with Template:Event if you create a new event - the date field will be empty until you refresh. It is a simplified version of the same use case you describe: a #show in a template. You do not need ApprovedRevs or SESP to reproduce.

create a new event - the date field will be empty until you refresh. It is a simplified version of the same use case you describe: a #show in a template. You do not need ApprovedRevs or SESP to reproduce.

As outlined above, if you use a self-referencing query then you cannot query something that has been stored by the time the #ask is executed. [0] has a schematic that shows the processing flow for a query and storage context.

You can follow (or debug) processing flow using the LinksUpdateConstructed and MediaWiki::preOutputCommit event in combination with the DeferredUpdates execution.

In some rare occasions where the refreshLinks job is enqueued and executed late, the Title::invalidateCache will invalidate the parser cache for a viewed page and hereby allows embedded parser functions (including #ask) to be called during a page view.

[0] https://github.com/SemanticMediaWiki/SemanticMediaWiki/issues/2375#issue-218690124

Right. I was just confirming that this is indeed observed in the simpler situation without ApprovedRevs and SESP and that, to the best of my knowledge, it used to work.

to the best of my knowledge, it used to work.

To the best of my knowledge that was before updates where posted in the DeferredUpdates and LinksUpate was made an EnqueueableDataUpdate.

Unless I'm mistaken, changes to the update process were introduced to discourage active master writes before the MediaWiki::preOutputCommit event which came as a result of a multi datacenter strategy.

Generally, we want the parser cache to be active otherwise each page view would cause a new parse and potentially floods the DB with SELECT requests during a view.

For updates of queries that are not classified as self-referencing, the querylinkstore tracks the entity usage of queries allowing an update to queue a purge job to invalidate the cache of those connected pages and hereby enable a new parse on the next page view to fetch fresh results.

Can the situation be improved? From the top of my head, I'd say:

  • Detect that a #ask request (on edit) is self-referencing
  • For those self-referenced queries, mark the ParserOutput/ParserOption to neglect the parser cache for the upcoming storage/HTML build
  • With a missing parser cache, the next page view (GET request) will force the execution of all embedded parser functions incl. #ask/#show but this time (on view) make sure that any #ask (those that carry the self-reference) does not disable the parser cache because the next page view should rely on the parser cache (see above)

To help those that want to try implement something the following contains a starting point for the simple form of #ask: [[{{FULLPAGENAME}}]] ... and #show: {{FULLPAGENAME}} ....

Based on SMW 3.0!

Detect that a #ask request (on edit) is self-referencing

@@ -273,10 +273,17 @@ class AskParserFunction {

+       // [[ ... ]] construct
+       if ( $query->getDescription() instanceof \SMW\Query\Language\ValueDescription ) {
+
+           $action = $this->parserData->getOption( 'request.action' );
+           wfDebugLog( 'smw', $action );
+
+           // Self-referencing query? 
+           if ( $query->getDescription()->getDataItem()->equals( $contextPage ) && ( $action === 'submit' || $action === 'stashedit' ) ) {
+               // Do something here because the query contained [[{{FULLPAGENAME}}]]
+               wfDebugLog( 'smw', 'SETTING EXTRA KEY' );
+               $this->parserData->addExtraParserKey( 'smw-update' );
+           }
+       }

On my test system the debug would look like:

...
2.8996  12.0M  MediaWiki::preOutputCommit: LBFactory shutdown completed
2.9265  12.0M  Request ended normally
[session] Saving all sessions on shutdown
[DBConnection] Closing connection to database 'localhost'.
[smw] [Connection] 'mw.db.queryengine': [read:-1, write:-2]
[smw-elastic] Connection: ["elastic : [{"host":"192.168.99.100","port":9200,"scheme":"http"}]"]
[smw-elastic] Indexer: ["Data index completed (SelfRefQ#0##)","procTime (in sec): 0.05209","Response: {"took":25,"errors":false}"]
[smw] ElasticStore: ["Data update completed","procTime in sec: 0.1624"]
[smw-elastic] Text indexer: ["Upsert completed (SelfRefQ#0##)","procTime (in sec): 0.07574","Response: {"took":57,"errors":false}"]
[smw] ElasticStore: ["Text update completed","procTime in sec: 0.09412"]
[smw] [Store] Update completed: SelfRefQ#0## (procTime in sec: 0.58807)
[smw] [TransactionalCallableUpdate] Added: ["SMW\\MediaWiki\\PageUpdater::pushUpdate","SMW\\Store::updateData"] (onTransactionIdle)
[smw] [CallableUpdate] Update completed: ["SMW\\DataUpdater::doUpdate","LinksUpdateConstructed","SelfRefQ#0##"] (fingerprint:[Null])
[DBQuery] ["SMW\\DataUpdater::doUpdate","LinksUpdateConstructed","SelfRefQ#0##"]: committing on behalf of SMW\MediaWiki\Deferred\TransactionalCallableUpdate::doUpdate.
[smw] [CallableUpdate] Added: {
    "origin": [
        "SMW\\MediaWiki\\PageUpdater::pushUpdate",
        "SMW\\Store::updateData"
    ],
    "fingerprint": "d41d8cd98f00b204e9800998ecf8427e",
    "stage": "post",
    "transactionTicket": null
}
[smw] [CallableUpdate] Update completed: ["SMW\\MediaWiki\\PageUpdater::pushUpdate","SMW\\Store::updateData"] (fingerprint:d41d8cd98f00b204e9800998ecf8427e)

...

0.4292   2.0M  Start request GET /mw-30-00-elastic/index.php/SelfRefQ
HTTP HEADERS:
HOST: 127.0.0.1:8080
CONNECTION: keep-alive
CACHE-CONTROL: max-age=0
UPGRADE-INSECURE-REQUESTS: 1
...
[caches] parser: RedisBagOStuff
1.2849   2.0M  Article::view using parser cache: yes
[ParserCache] Parser options key expired, touched 20180721210509, epoch 20180721205645, cached 20180721210507
1.3439   2.0M  Article::view: doing uncached parse
1.3644   2.0M  Parser cache options found.
1.3644   2.0M  Parser cache options found.
[smw-elastic] Connection: ["elastic : [{"host":"192.168.99.100","port":9200,"scheme":"http"}]"]
[smw] [Connection] 'mw.db': [read:-1, write:-2]
[smw-elastic] SEARCH: {"index":"smw-data-mw-30-00-elastic","type":"data","body":{"_source":false,"from":0,"size":51,"query":{"constant_score":{"filter":{"bool":{"must":{"terms":{"_id":[334068]}}}}}},"sort":[{"subject.sortkey.sort":{"order":"asc"},"subject.title.sort":{"order":"asc"}}]}}, queryTime: 0.04082
...

The important part is that the storage happens before the GET where Article::view: doing uncached parse indicates that a new parse is requested instead of a Article::view: showing parser cache contents.

@hexmode Please provide feedback on the comments/suggestions I made in https://github.com/SemanticMediaWiki/SemanticMediaWiki/issues/3230#issuecomment-406824138.

@mwjames I didn't respond because you have a much better grasp of this than I do and I was trusting your judgement.
However, I went and looked at the code you were talking about and confirmed that the messages indicate, as you said, that output indicates that your the parser can skip the cache if needed.

Thanks!

Hi,

@mwjames I didn't respond because you have a much better grasp of this than
I do and I was trusting your judgement.

Well, the feedback I was hoping for was less in regards to "my
judgement" and more about the usage of the example where you go about
and modify (given you use SMW 3.0) your environment to see if it makes
a difference.

As I pointed out, I very thin on providing volunteer time so I rely on
people to make an effort and provide sufficient feedback. I scripted
the example so that experienced users can apply the proposal and
gather results on whether it goes in the right direction or not.

MW version: 1.31.0

I anticipated users ask for MW 1.31 support and I explained in #3163
the issue with it which is the reason why locally I do not run MW 1.31
and I don't see myself working with it anytime soon because I'm unable
to run our tests on it.

Cheers

On 7/31/18, Mark A. Hershberger notifications@github.com wrote:

@mwjames I didn't respond because you have a much better grasp of this than
I do and I was trusting your judgement.
However, I went and looked at the code you were talking
about

and confirmed that the messages indicate, as you said, that output indicates
that your the parser can skip the cache if needed.

Thanks!

--
You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub:
https://github.com/SemanticMediaWiki/SemanticMediaWiki/issues/3230#issuecomment-409028919

Thank you for clarifying what you want.

I can appreciate that you don't have a lot of time for this, and I'm willing to test your proposed solution.

I would test it now, but I don't see any of the code you mentioned in AskParserFuction.php.

I would test it now, but I don't see any of the code you mentioned in
[AskParserFuction.php]

Well, [0] contains the code snippet that should be added at around
line 273 to the AskParserFunction.php.

The extra wfDebugLog is just for a controlling purpose to monitor
the debug log and see where and when something in regards to the added
code path happens.

[0] https://github.com/SemanticMediaWiki/SemanticMediaWiki/issues/3230#issuecomment-406824138

Cheers

On 8/1/18, Mark A. Hershberger notifications@github.com wrote:

Thank you for clarifying what you want.

I can appreciate that you don't have a lot of time for this, and I'm willing
to test your proposed solution.

I would test it now, but I don't see any of the code you mentioned in
AskParserFuction.php.

--
You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub:
https://github.com/SemanticMediaWiki/SemanticMediaWiki/issues/3230#issuecomment-409360459

First, Here is the patch I used:

diff --git a/src/ParserFunctions/AskParserFunction.php b/src/ParserFunctions/AskParserFunction.php
index 48681ff6a..64cb79219 100644
--- a/src/ParserFunctions/AskParserFunction.php
+++ b/src/ParserFunctions/AskParserFunction.php
@@ -293,6 +293,20 @@ class AskParserFunction {
            return '';
        }

+       // [[ ... ]] construct
+       if ( $query->getDescription() instanceof \SMW\Query\Language\ValueDescription ) {
+
+           $action = $this->parserData->getOption( 'request.action' );
+           wfDebugLog( 'smw', $action );
+
+           // Self-referencing query? 
+           if ( $query->getDescription()->getDataItem()->equals( $contextPage ) && ( $action === 'submit' || $action === 'stashedit' ) ) {
+               // Do something here because the query contained [[{{FULLPAGENAME}}]]
+               wfDebugLog( 'smw', 'SETTING EXTRA KEY' );
+               $this->parserData->addExtraParserKey( 'smw-update' );
+           }
+       }
+
        QueryProcessor::setRecursiveTextProcessor(
            $this->recursiveTextProcessor
        );

Next, I walked through my steps above down to this step:

 purge “Category:Pages with approval process” and verify no change (2x)

Only one purge was necessary and I saw the change then.

This is much better than the double purge (first on the subject, then on the category page) that I had to do before.

Still, I wonder if I could eliminate even that one.

I would say it works! Thanks, @mwjames !

Only one purge was necessary and I saw the change then.

This is much better than the double purge (first on the subject, then on the
category page) that I had to do before.

We have to distinguish the scenario where a page contains a self-reference and query results to display altered values after the storage (static pages like the one cited (category) where results are to be updated with results from dependent pages) from linked entities.

In both cases, the parser cache is key but:

  • For the first case the invalidation of the parser cache is to force an immediate reparse on a GET request so that newly stored values can be fetched (see explanation above the reason)
  • The second requires a different approach where linked entities (those included in a query result) are tracked and purged based on its graph dependency [0] in relation to the conditional use of those entities in a query result.

[0] https://www.semantic-mediawiki.org/wiki/Help:Embedded_query_update

Thanks for your help and this explanation. I'm assuming the changes you've proposed for AskParserFunctions will be in 3.0?

(I looked at the embedded query update earlier, hoping it would clear up my problems, but the bug persisted until your fix.)

Thanks for your help and this explanation. I'm assuming the changes you've
proposed for AskParserFunctions will be in 3.0?

I added this to my list of things that should be done before 3.0.

On 8/12/18, Mark A. Hershberger notifications@github.com wrote:

Thanks for your help and this explanation. I'm assuming the changes you've
proposed for AskParserFunctions will be in 3.0?

(I looked at the embedded query
update

earlier, hoping it would clear up my problems, but the bug persisted until
your fix.)

--
You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub:
https://github.com/SemanticMediaWiki/SemanticMediaWiki/issues/3230#issuecomment-412281448

@hexmode BTW, I merged a related PR last week on the proposed changes to master.

I merged a related PR last week on the proposed changes to master.

Closing this issue on the assumption that the related PR did address the issue.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

simontaurus picture simontaurus  Â·  3Comments

mwjames picture mwjames  Â·  3Comments

akuckartz picture akuckartz  Â·  3Comments

SteveRMann picture SteveRMann  Â·  4Comments

alex-mashin picture alex-mashin  Â·  4Comments