Semanticmediawiki: "rebuildData.php" emitting page content rather than page title

Created on 8 Nov 2016 · 12 Comments · Source: SemanticMediaWiki/SemanticMediaWiki

Setup and configuration

  • MediaWiki 1.28.0-rc.0 (7625c75) 15:14, 3. Nov. 2016
  • PHP 5.6.27-0+deb8u1 (apache2handler)
  • MariaDB 10.0.28-MariaDB-1~jessie
  • Semantic MediaWiki 2.4.1 (20996e2) 00:54, 7. Nov. 2016

Issue

When running "rebuildData.php" on the sandbox, the "page content" rather than the page title is sometimes emitted:

Example one

Script output:

(3460/1640)     Finished processing ID 3461 (Utilisateur:Lalquier)
(3462/1640)     Finished processing ID 3463 (This_page_supports_semantic_in-text_annotations_(e.g._"Is_specified_asWorld_Heritage_Site")_to_build_structured_and_queryable_content_provided_by_Semantic_MediaWiki._For_a_comprehensive_description_on_how_to_use_annotations_or_the#0#ask_parser_function,_please_have_a_look_at_the_getting_started,_in-text_annotation,_or_inline_queries_help_page.This_page_supports_semantic_in-text_annotations_(e.g._"Is_specified_asWorld_Heritage_Site")_to_build_structured_and_queryable_content_provided_)
(3463/1640)     Finished processing ID 3464 (Lalquier#2#_QUERYf2efbfaa4ebf54f9a38b0e5c7b2c76dd)

ID lookup:

[
    3463,
    {
        "smw_title": "Lalquier",
        "smw_namespace": "2",
        "smw_iw": "",
        "smw_subobject": "_QUERYf2efbfaa4ebf54f9a38b0e5c7b2c76dd",
        "smw_sortkey": "Lalquier"
    }
]

Wiki page: see here

Example two

Script output:

(3842/1640)     Finished processing ID 3843 (SEA:Internal#0#_QUERYc62ef87cee27cd0afd3f6a2ad07efd95)
(3843/1640)     Finished processing ID 3844 (This_page_supports_semantic_in-text_annotations_(e.g._"Is_specified_asWorld_Heritage_Site")_to_build_structured_and_queryable_content_provided_by_Semantic_MediaWiki._For_a_comprehensive_description_on_how_to_use_annotations_or_the#0#ask_parser_function,_please_have_a_look_at_the_getting_started,_in-text_annotation,_or_inline_queries_help_page.This_page_supports_semantic_in-text_annotations_(e.g._"Is_specified_asWorld_Heritage_Site")_to_build_structured_and_queryable_content_provided_)
(3844/1640)     Finished processing ID 3845 (SubobjectTemplateLinkNone#0#_QUERY18ce294ba9208058a3e2d35798d8c299)

ID lookup:

[
    3844,
    {
        "smw_title": "SubobjectTemplateLinkNone",
        "smw_namespace": "0",
        "smw_iw": "",
        "smw_subobject": "_QUERY18ce294ba9208058a3e2d35798d8c299",
        "smw_sortkey": "SubobjectTemplateLinkNone"
    }
]

Wiki page: see here

This may be related to https://github.com/SemanticMediaWiki/SemanticMediaWiki/issues/1963, although links in values are not configured for this wiki.

bug

All 12 comments

I cannot really place this content. Does the following query return any results?

SELECT * 
FROM  `smw_object_ids` 
WHERE  `smw_title` LIKE  '%This_page_supports_semantic_in-text_annotations%'
ORDER BY  `smw_object_ids`.`smw_iw` DESC 
LIMIT 0 , 30

See this Gist: https://gist.github.com/kghbln/d342edf390b9f48355992b72dba8d862

This is good, because it means there are no ghost pages. These are left-overs from #1963, which the PropertyTableIdReferenceDisposer should catch. I'm guessing that because they have a subobject attached, [0] comes into play.

[0] https://github.com/SemanticMediaWiki/SemanticMediaWiki/blob/master/src/SQLStore/EntityRebuildDispatcher.php#L269-L271

I trust your assessment here. Note that these left-overs are also created when rebuilding data from scratch. Because I switched back and forth to 2.4.x, I always deleted the semantic backend and recreated all the data, including the left-overs.

@mwjames Is this an issue that could or should be addressed with a pull request?

I would have to find a way to replicate the issue locally, repeatedly and consistently, to see what's causing it. During general testing I didn't come across such an issue, so it would be rather difficult for me to make time for an investigation from scratch.

Thanks for the info. Indeed, the wikis showing this behaviour do not fall apart, and the processes are not interrupted in any way. So there is something in the water, but it is not a top priority. Fair enough, I believe. :)

This is still happening, but it has not done any harm for quite some time. This one may be reopened in the future if it becomes a concern.

[0] contains an analysis of what is happening for this particular case.

[0] https://github.com/SemanticMediaWiki/SemanticMediaWiki/pull/2932#issuecomment-357508051

(Email reply to Karsten Hoffmeyer's comment of 8 Aug 2018: https://github.com/SemanticMediaWiki/SemanticMediaWiki/issues/2001#issuecomment-411102212)

Thanks for the pointer. Great! This issue can no longer be observed!

{{#set:Description=This page supports semantic in-text annotations (e.g. "Is specified asWorld Heritage Site") to build structured and queryable content provided by Semantic MediaWiki. For a comprehensive description on how to use annotations or the #ask parser function, please have a look at the getting started, in-text annotation, or inline queries help page.This page supports semantic in-text annotations (e.g. "Is specified asWorld Heritage Site") to build structured and queryable content provided by Semantic MediaWiki. For a comprehensive description on how to use annotations or the #ask parser function, please have a look at the getting started, in-text annotation, or inline queries help page.This page supports semantic in-text annotations (e.g. "Is specified asWorld Heritage Site") to build structured and queryable content provided by Semantic MediaWiki. For a comprehensive description on how to use annotations or the #ask parser function, please have a look at the getting started, in-text annotation, or inline queries help page.}}

@kghbln I was a bit unsatisfied with the analysis of the real cause of the issue, so let me reiterate.

The issue is twofold. First, an annotation to Description (a property of type Page) whose value contains # is parsed as title text with a fragment, which SMW identifies as a fake subobject. The parsing itself does no harm. While # should be avoided in a page title (or value annotation), using it within SMW doesn't cause any issues, even though the fragment then occupies the smw_subobject ID field.
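Below is a simplified sketch of the parsing described above. It assumes that only the first `#` separates the title from the fragment, and that spaces become underscores as in MediaWiki DB keys. `split_page_value` is a hypothetical helper for illustration, not SMW's actual (PHP) code. It ignores namespace prefixes and other title normalisation:

```python
def split_page_value(value: str) -> tuple[str, str]:
    """Split a Page-type annotation value into (title, fragment).

    Hypothetical illustration: only the first '#' is treated as the
    separator, so any later '#' characters stay inside the fragment.
    """
    title, sep, fragment = value.partition("#")

    def to_dbkey(text: str) -> str:
        # MediaWiki DB keys use underscores instead of spaces
        return text.strip().replace(" ", "_")

    return to_dbkey(title), (to_dbkey(fragment) if sep else "")


title, subobject = split_page_value("use annotations or the #ask parser function")
print(title)      # use_annotations_or_the
print(subobject)  # ask_parser_function
```

Applied to the full Description value, everything after the first `#ask` ends up in the fragment, including later `#` characters. This matches the smw_title/smw_subobject split visible in the SELECT query in this comment.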

Second, when SMW tries to find an ID for such an entity, it uses the entire string to match the smw_subobject field. In the above case, that string exceeds the 255-character length restriction of MySQL/MariaDB.
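The following is a quick check of the length claim. It assumes the stored value is the Description text from the `#set` call above, repeated three times. It also assumes the subobject is everything after the first `#`, with spaces replaced by underscores, which simplifies SMW's title handling. The variable names are illustrative only:

```python
# One copy of the Description text from the #set call above
SENTENCE = (
    'This page supports semantic in-text annotations (e.g. "Is specified '
    'asWorld Heritage Site") to build structured and queryable content '
    "provided by Semantic MediaWiki. For a comprehensive description on how "
    "to use annotations or the #ask parser function, please have a look at "
    "the getting started, in-text annotation, or inline queries help page."
)
value = SENTENCE * 3  # the annotation repeats the text three times

title, _, fragment = value.partition("#")
title = title.strip().replace(" ", "_")
subobject = fragment.strip().replace(" ", "_")

FIELD_LIMIT = 255  # length restriction on MySQL/MariaDB, as stated in this thread
print(len(title.encode("utf-8")))      # fits within the limit
print(len(subobject.encode("utf-8")))  # far beyond the limit
print(len(subobject.encode("utf-8")) > FIELD_LIMIT)  # True
```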

SELECT /* SMWSql3SmwIds::getDatabaseIdAndSort */  smw_id,smw_sortkey,smw_sort 
FROM `smw_object_ids`
WHERE smw_title = 'This_page_supports_semantic_in-text_annotations_(e.g._\"Is_specified_asWorld_Heritage_Site\")_to_build_structured_and_queryable_content_provided_by_Semantic_MediaWiki._For_a_comprehensive_description_on_how_to_use_annotations_or_the' AND smw_namespace = '0' AND smw_iw = '' AND smw_subobject = 'ask_parser_function,_please_have_a_look_at_the_getting_started,_in-text_annotation,_or_inline_queries_help_page.This_page_supports_semantic_in-text_annotations_(e.g._\"Is_specified_asWorld_Heritage_Site\")_to_build_structured_and_queryable_content_provided_by_Semantic_MediaWiki._For_a_comprehensive_description_on_how_to_use_annotations_or_the_#ask_parser_function,_please_have_a_look_at_the_getting_started,_in-text_annotation,_or_inline_queries_help_page.This_page_supports_semantic_in-text_annotations_(e.g._\"Is_specified_asWorld_Heritage_Site\")_to_build_structured_and_queryable_content_provided_by_Semantic_MediaWiki._For_a_comprehensive_description_on_how_to_use_annotations_or_the_#ask_parser_function,_please_have_a_look_at_the_getting_started,_in-text_annotation,_or_inline_queries_help_page.'  LIMIT 1

Unfortunately, the above query always returns false on MySQL/MariaDB. As a result, the ID request creates a new ID whenever it tries to match the string value for that annotation. So the observed duplicates are caused by matching long content against length-restricted fields in MySQL/MariaDB.

Reproducing the same use case on Postgres and executing the same query doesn't create any duplicates for that entity, because Postgres has no length restriction on the involved fields.

MySQL/MariaDB should find the ID even with a truncated content value on a restricted field but it doesn't, so we have to account for this.
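The lookup behaviour described above can be modelled with a toy in-memory table. This is a hypothetical sketch, not SMW code, and it uses the following stand-ins:

- `ToyIdTable` stands in for `smw_object_ids`.
- `max_len` emulates a length-restricted column whose stored value is truncated, as this thread describes for MySQL/MariaDB.
- `max_len=None` stands in for Postgres.
- `truncate_lookup` is one possible (assumed) way to account for the truncation: it truncates the lookup key the same way before matching.

```python
from typing import Dict, Optional, Tuple

# (smw_title, smw_namespace, smw_iw, smw_subobject)
Key = Tuple[str, int, str, str]


class ToyIdTable:
    """Hypothetical stand-in for smw_object_ids (illustration only)."""

    def __init__(
        self, max_len: Optional[int] = 255, truncate_lookup: bool = False
    ) -> None:
        self.max_len = max_len                  # None = no length restriction
        self.truncate_lookup = truncate_lookup  # apply the column limit to lookups too
        self.rows: Dict[int, Key] = {}
        self.next_id = 1

    def _fit(self, text: str) -> str:
        # Emulate the column truncating values that are too long
        # (counted in characters here; a binary column would count bytes)
        return text if self.max_len is None else text[: self.max_len]

    def _stored_form(self, key: Key) -> Key:
        title, namespace, iw, subobject = key
        return (self._fit(title), namespace, iw, self._fit(subobject))

    def find(self, key: Key) -> Optional[int]:
        # Exact match on all four fields, like the SELECT ... LIMIT 1 above
        wanted = self._stored_form(key) if self.truncate_lookup else key
        for smw_id, row in self.rows.items():
            if row == wanted:
                return smw_id
        return None

    def get_or_create(self, key: Key) -> int:
        found = self.find(key)
        if found is not None:
            return found
        smw_id = self.next_id
        self.next_id += 1
        self.rows[smw_id] = self._stored_form(key)
        return smw_id


key = ("Some_page", 0, "", "x" * 800)  # subobject longer than 255 characters

for label, table in [
    ("MySQL-like", ToyIdTable(max_len=255)),
    ("Postgres-like", ToyIdTable(max_len=None)),
    ("MySQL-like, truncated lookup", ToyIdTable(max_len=255, truncate_lookup=True)),
]:
    ids = [table.get_or_create(key) for _ in range(3)]
    print(label, ids)
```

The three tables behave differently:

- The MySQL-like table hands out a new ID on every call (`[1, 2, 3]`), mirroring the duplicates seen with rebuildData.php.
- The Postgres-like table finds the same row each time.
- The truncated-lookup variant also finds the row again. The trade-off is that two different long values sharing the same first 255 characters would collapse onto one ID.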

Very interesting information. Thanks a lot for the elaboration which will help others too.
