Semanticmediawiki: Programmatically storing structured data

Created on 22 Feb 2020 · 6 Comments · Source: SemanticMediaWiki/SemanticMediaWiki

The following is a response to an email I received. I'm answering it here so that the information is appropriately disseminated and can be referenced.

Requiring the user to define all properties is not nice. Properties could be created automatically, though they would need to be checked on every write, since new properties might be required. Creating property pages might also override existing properties, and the pages would allow users to change the data type (hello, silent save failure) or add restrictions that make no sense.

There is a reason why we have both predefined and user-defined properties: predefined properties cannot be manipulated by users. If a user changes the type of a user-defined property, change propagation is triggered to cast values to the new type, update all pending entities, and make the necessary changes to the appearance of the data associated with that property.

The general concept is that a property has a type (when none is given, the page type becomes the default), and that type dictates how data assigned to the property is stored or serialized. This is an invariant, independent of the storage system selected (SPARQL, SQL, or Elastic).

So, yes, a property needs a type, which means you have to find a way to persist that type before any data is assigned to the property.

The SMW code just was not designed for this. I'm thinking we probably need a new (PHP) API to add such data. But first we need to figure out what makes sense on a high level, such as how to deal with properties. Thoughts?

It might be, but as I said above, the property you assign values to requires a type, and that fact remains immutable. So when importing data, the data have to be mapped to a specific type such as page, text, number, or geo, and prior to the import you have to register/create/define properties with those types to ensure that values are stored in the correct table (or, for SPARQL, exported with the correct XSD type; the same goes for Elastic, to ensure the correct field is used when matching values later during querying).
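To make the dependency on types concrete, here is a minimal Python sketch (not SMW code) of how a storage layer might route a value once the property's type is known. The type IDs `_wpg`, `_txt`, `_num`, and `_geo` follow SMW's naming; the table names and the `route_value` helper are hypothetical placeholders, not SMW's actual schema or API.

```python
# Hypothetical routing table: property type ID -> storage table.
# The table names are placeholders, not SMW's actual schema.
STORAGE_BY_TYPE = {
    "_wpg": "page_table",
    "_txt": "text_table",
    "_num": "number_table",
    "_geo": "coordinate_table",
}


def route_value(property_types, prop, value):
    """Return (table, value) for a property whose type is already known."""
    if prop not in property_types:
        # Without a stored type there is no way to pick a table.
        raise LookupError(f"No type registered for property {prop!r}")
    return STORAGE_BY_TYPE[property_types[prop]], value


# Types must exist before any value is assigned.
property_types = {"Population": "_num", "Located in": "_wpg"}
print(route_value(property_types, "Population", 3500000))
```

The point of the sketch is only that the lookup fails when no type has been stored for a property, which is why the types have to be registered before the import starts.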

Before starting to import data, you have to look up the required properties, verify their types, and register them with the predefined property system (using a special __xxx prefix), and only then continue the import. If you find that your import changes an existing __xxx property, you have to trigger a change propagation accordingly so that "old" items assigned to the property are updated.
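The pre-import step described above can be sketched as follows. This is an illustrative Python model under stated assumptions, not SMW's PHP API: the registry is a plain dict, and `prepare_properties` and the `__imp_*` keys are hypothetical names. In SMW itself, the actual registration would go through the predefined property system and the change propagation mechanism.

```python
def prepare_properties(registry, required):
    """Register missing properties and report type changes.

    registry: dict mapping property key -> type ID (mutated in place)
    required: dict mapping property key -> type ID needed by the import
    Returns the sorted keys whose type changed and which therefore
    need change propagation before the import continues.
    """
    needs_propagation = []
    for key, type_id in required.items():
        if not key.startswith("__"):
            raise ValueError(f"{key!r} lacks the reserved '__' prefix")
        current = registry.get(key)
        if current is not None and current != type_id:
            # Existing property with a different type: old values
            # must be updated to match the new type.
            needs_propagation.append(key)
        registry[key] = type_id
    return sorted(needs_propagation)


# Hypothetical registry state before the import.
registry = {"__imp_population": "_txt"}
changed = prepare_properties(
    registry,
    {"__imp_population": "_num", "__imp_city": "_wpg"},
)
print(changed)  # keys to run change propagation for
```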

documentation question

All 6 comments

Is there any documentation on predefined properties? Good to hear they cannot be modified by the user. Do they have their own property pages at all?


What would a good interface for storing structured data be?

What about: it takes a list of properties (name and type info) and a list of subjects, each with their own property-value pairs. Property creation is taken care of automatically. Property-value pairs that do not have a matching property or are of the wrong type get recorded as errors but do not cause the save to fail. In other words: users of the interface just need to provide a list of properties and a list of subjects with property-value pairs.
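As a rough illustration of this proposal, the following Python sketch models the interface shape only. It is not an existing SMW API: `store_structured_data`, `StoreResult`, and the validators are hypothetical, and only two SMW-style type IDs are covered.

```python
from dataclasses import dataclass, field

# Hypothetical type checks keyed by SMW-style type IDs.
VALIDATORS = {
    "_txt": lambda v: isinstance(v, str),
    "_num": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
}


@dataclass
class StoreResult:
    stored: dict = field(default_factory=dict)  # subject -> {property: value}
    errors: list = field(default_factory=list)  # (subject, property, reason)


def store_structured_data(properties, subjects):
    """properties: {name: type_id}; subjects: {subject: {name: value}}."""
    result = StoreResult()
    for subject, pairs in subjects.items():
        accepted = result.stored.setdefault(subject, {})
        for name, value in pairs.items():
            type_id = properties.get(name)
            validator = VALIDATORS.get(type_id)
            if type_id is None:
                result.errors.append((subject, name, "unknown property"))
            elif validator is None:
                result.errors.append((subject, name, "unsupported type"))
            elif not validator(value):
                result.errors.append((subject, name, "wrong type"))
            else:
                accepted[name] = value
    return result


result = store_structured_data(
    {"Name": "_txt", "Population": "_num"},
    {"Berlin": {"Name": "Berlin", "Population": "many", "Mayor": "unknown"}},
)
print(result.stored)
print(result.errors)
```

Invalid pairs end up in `errors` while the valid ones are still stored, which mirrors the "record errors but do not fail the save" behavior suggested above.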

Is there any documentation on predefined properties?

Basically all extensions maintained in https://github.com/SemanticMediaWiki make use of predefined properties and while each extension uses a different approach to define their properties [0] vs. [1] they ultimately rely on the SMW::Property::initProperties hook [2] to register them before any object is able to make a reference to one of them.

Good to hear they cannot be modified by the user.

To be precise, users are unable to modify the type, but of course they can add a local property description, an import statement, or a category.

Do they have their own property pages at all?

Yes, they are handled in the same way as any other property and hence are also displayed in the same way [3] (the example refers to the Modification date, which is registered as _MDAT in SMW). Unless a user has created a page for this property, there is no real page in MediaWiki for something like [3], but of course it exists in the context of Semantic MediaWiki.

What would a good interface for storing structured data be?

I'll defer the answer to this question to a later point.

[0] https://github.com/SemanticMediaWiki/SemanticExtraSpecialProperties/blob/master/data/definitions.json
[1] https://github.com/SemanticMediaWiki/SemanticCite/blob/master/src/PropertyRegistry.php#L45-L130
[2] https://github.com/SemanticMediaWiki/SemanticMediaWiki/blob/master/docs/examples/hook.property.initproperties.md
[3] https://sandbox.semantic-mediawiki.org/wiki/Attribut:Modification_date

Basically all extensions maintained in https://github.com/SemanticMediaWiki make use of predefined properties and while each extension uses a different approach to define their properties [0] vs. [1] they ultimately rely on the SMW::Property::initProperties hook [2] to register them before any object is able to make a reference to one of them.

There is an important difference with the use cases I am thinking about. These extensions all have a fixed set of predefined properties. That would not be the case for what I have in mind. So it is not possible to create the properties on extension installation or extension initialization. Properties might also get added or removed.

There is an important difference with the use cases I am thinking about.

I'm aware of this, but I only tried to provide some answers to the questions raised. Now, if predefined properties seem too rigid in their application, then maybe something like "locked properties" has to be introduced (... I'm deflecting here, as this should be part of "What would a good interface for storing structured data be?").

Again, the types of properties have to be stored in SMW to make them available outside the context of any imported content; otherwise, any query that tries to build a value representation from an imported property (non user-defined, non predefined) using a DataValue instance will fail.

Properties might also get added or removed.

As for predefined properties, they are never removed; they remain part of the system once registered with an ID.

Preserved at "General" --> "Documentation".

There still is no good way to store data with properties that are not known at extension installation time.

What would a good interface for storing such data be?

(Repeat of my earlier suggestion, which has not been replied to yet.) What about: it takes a list of properties (name and type info) and a list of subjects, each with their own property-value pairs. Property creation is taken care of automatically. Property-value pairs that do not have a matching property or are of the wrong type get recorded as errors but do not cause the save to fail. In other words: users of the interface just need to provide a list of properties and a list of subjects with property-value pairs.


@mwjames I expect to work on this soon. It'd be nice to get input from you beforehand.
