The XML parser reports an error when parsing:
<property:Datenqualität/Herkunft rdf:resource="&wiki;Der_Datensatz_wurde_basierend_auf_der_ÖK50-2C_Stand_2011_digitalisiert._Es_wurden_alle_Waldbestände_für_die_Gemeinde_Kopfing_erfasst."/>
Should we escape "/" as &#47;, &#x2F;, or &sol;? [0]
[0] https://en.wikipedia.org/wiki/Slash_(punctuation)#Encoding
As far as I can recall that should happen here [0].
Well, considering the age of the code, it is at least not a regression in 3.0.0.
Maybe it is not a blocker, but given that the sandbox is running 3.0.0-rc.1, this issue is not resolved yet. See:
https://sandbox.semantic-mediawiki.org/wiki/Sp%C3%A9cial:Export_RDF/Lorem_ipsum_Export
The issue here is with [[Datenverantwortliche Stelle – E-Mailkontakt::[email protected]]], where the first – is not a normal hyphen but a Unicode en dash (U+2013), so it is not recognized as a hyphen in ASCII.
– as an HTML entity is &ndash; and should be banned from being part of a property name. Adding it to smwgPropertyInvalidCharacterList should prevent the creation of properties that look alike but are in fact different, such as Foo-Bar vs. Foo–Bar.
It should be noted that this is about the property name and not about any value representation, so adding some restrictions should help users and administrators instead of the "you can do all" motto.
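A minimal sketch of what that could look like in LocalSettings.php. It assumes that smwgPropertyInvalidCharacterList is a plain array of characters and that it is already populated with its defaults at this point (i.e. after Semantic MediaWiki has been loaded). Neither assumption is confirmed in this thread, so check the setting's documentation for your version first.

```php
// LocalSettings.php, placed after Semantic MediaWiki has been enabled.
// Assumption: the setting is an array that already holds the default characters.
// "\u{2013}" is the en dash (–); the escape syntax requires PHP 7.0 or later.
$smwgPropertyInvalidCharacterList[] = "\u{2013}";
```

With this in place, Foo–Bar (en dash) should be rejected as a property name, while Foo-Bar (ASCII hyphen) remains valid.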
You could try using htmlentities($uri, ENT_COMPAT, "UTF-8") in Escaper::encodeUri to filter invalid entities, but I'm not keen on doing that unless there is a very good reason for it.
Thanks for the explanation @mwjames
Added at different spots to make sure this does not get overlooked:
Maybe create a PR and add – to smwgPropertyInvalidCharacterList, because normal users cannot distinguish Foo-Bar from Foo–Bar just by looking at it (it took me some time and tools to figure out the true nature of the issue).
Doing something like this could be endless, since there are many characters that look alike. However, since I consider this a pretty popular pitfall, at least for Germans, I will do a pull request tomorrow.