iD: Presets: Separate terms into synonyms and keywords

Created on 4 Apr 2019 · 13 Comments · Source: openstreetmap/iD

I recently had a closer look at the data in /data/presets/ and I must say, it's pretty awesome. Such a wealth of information and so well structured, great job guys! I hope this would be used more in other OSM projects. (As a matter of fact, I am currently working on a (Java-)library that should serve as a tags<->names dictionary which will exclusively scrape its data from here)

So, to the point: I have a suggestion that could improve the searchability of the presets even more. Currently, there is the name and there are the terms. I suggest splitting terms into "further names" (synonyms) and keywords.

Advantages I see

  1. search matches with synonyms can get a higher priority than for keywords - possibly just below name and even before suggestions (brand names)
  2. for items for which the search matches a synonym, that synonym could be displayed in the result list instead of the primary translation/most-well-known one (which might, depending on the area, not actually be the most-well-known one). Currently, only the primary translation is shown (Spielbank is a synonym for Kasino in German): [screenshot: spielbank]
    The current behavior is understandable because iD can't know whether a term in terms is actually a synonym or just a keyword (and a keyword shouldn't be displayed in the result list).
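
A minimal sketch of the first advantage, assuming a hypothetical `Preset` record with separate `synonyms` and `keywords` fields and simple prefix matching. The tier constants, the field names, and the "Wettbüro" entry are made up for illustration; this is not how iD's search is implemented, and brand suggestions are left out.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical match tiers; a lower tier sorts first. Ranking synonyms
# directly below the name is the proposal here, not iD's current behavior.
TIER_NAME, TIER_SYNONYM, TIER_KEYWORD = 0, 1, 2


@dataclass
class Preset:
    name: str
    synonyms: List[str] = field(default_factory=list)  # proposed new field
    keywords: List[str] = field(default_factory=list)  # today's "terms"


def match_tier(preset: Preset, query: str) -> Optional[int]:
    """Return the best (lowest) tier in which the query prefix-matches, or None."""
    q = query.casefold()
    if preset.name.casefold().startswith(q):
        return TIER_NAME
    if any(s.casefold().startswith(q) for s in preset.synonyms):
        return TIER_SYNONYM
    if any(k.casefold().startswith(q) for k in preset.keywords):
        return TIER_KEYWORD
    return None


def search(presets: List[Preset], query: str) -> List[Preset]:
    """Return matching presets, best tier first; ties keep their input order."""
    scored = [(match_tier(p, query), p) for p in presets]
    hits = [(tier, p) for tier, p in scored if tier is not None]
    # sorted() is stable, so presets within one tier are not reordered.
    return [p for _, p in sorted(hits, key=lambda pair: pair[0])]


# Illustrative data only: the keyword on "Wettbüro" is invented for the example.
presets = [
    Preset("Wettbüro", keywords=["Spielen"]),
    Preset("Kasino", synonyms=["Spielbank"]),
    Preset("Spielwarengeschäft"),
]
names = [p.name for p in search(presets, "spiel")]
# -> ['Spielwarengeschäft', 'Kasino', 'Wettbüro']
```

With a single undifferentiated terms list, "Kasino" and "Wettbüro" would land in the same tier; the split is what lets the synonym match outrank the keyword match.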

Dealing with current translations

An unproblematic migration with respect to the current translations could work like this: a new translatable field named synonyms is introduced (or the name field is turned into names) and all translations in terms (perhaps renamed to keywords for clarity) just stay where they are. The behavior does not change. Then, translators can gradually move terms that are actually synonyms into the other field to profit from the features the separation brings.

What do you think?


All 13 comments

That would be great step towards improving the searchability of presets. +1

I recently had a closer look at the data in /data/presets/ and I must say, it's pretty awesome. Such a wealth of information and so well structured, great job guys!

@westnordost On behalf of the project, thank you! Our presets are especially great because so many individuals have contributed to them. A new preset makes for an excellent first PR.

I hope this would be used more in other OSM projects. (As a matter of fact, I am currently working on a (Java-)library that should serve as a tags<->names dictionary which will exclusively scrape its data from here)

Sounds terrific! Be aware that the preset schema can change fairly often to suit the needs of iD.

I suggest splitting terms into "further names" (synonyms) and keywords.

This is an interesting idea. We'll need to think about this more but here are my initial thoughts:

  • "Synonyms" as you outline them remind me of Wikidata aliases.
  • I think if a preset showed up with a different label I would assume it was a different preset altogether. There are already confusingly similar presets in iD (e.g. Casino vs. Adult Gaming Center) because OSM tagging is so complex. I'd like for preset names to remain stable in the UI.
  • We're planning on adding an optional subtitle property to show below the preset name (see #6137). It will also be used for search results and ranking. Perhaps this alone is an okay alternative?
  • Can you think of any other advantages? I personally find that the presets are already pretty searchable, do you disagree? I want to make sure we have a really compelling reason to do this since it could make a lot of work for translators.

Can you think of any other advantages? I personally find that the presets are already pretty searchable, do you disagree?

I don't disagree, it is pretty well searchable and I am impressed, I just think that it may be improved if this is done.

Perhaps both the advantages I mentioned are more apparent with (at least the) German locale. The majority of terms on this locale are actually synonyms, very few keywords there. At least for me, it felt odd that for the thing I am searching for, the best match is not the text I entered and also not highlighted in any way.

Of course, the fact that the search word matches a preset exactly because it is a synonym rather than a keyword could be shown in a different way, for example with subtitles as you suggest. (Title: [Primary translation], Subtitle: a.k.a. [matched synonym], or the other way around.)

Another cumulative idea is to highlight the section of the matched word that matches the input text (in bold or underlined), so when searching for "Spiel", the results that actually contain the word are highlighted, i.e.

  • Spielbank
  • Spielwarengeschäft
  etc., like in Google: [screenshot, 2019-04-04]

This cumulative idea (shall I create a separate ticket?) would work better together with this idea because it would be confusing if the best match is the only match that is not highlighted.
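
The highlighting idea could be sketched roughly as follows. The `highlight` function and its bracket markers are hypothetical stand-ins for bold or underlined text, and only the first occurrence is marked.

```python
def highlight(label: str, query: str, start: str = "[", end: str = "]") -> str:
    """Wrap the first case-insensitive occurrence of query in label with markers.

    Assumption: str.lower() keeps the string length unchanged for the text
    involved, so positions found in the lowered copy are valid in the original.
    casefold() is avoided on purpose because it turns "ß" into "ss" and would
    shift the indices.
    """
    if not query:
        return label
    pos = label.lower().find(query.lower())
    if pos < 0:
        return label  # no match: show the label unchanged
    stop = pos + len(query)
    return label[:pos] + start + label[pos:stop] + end + label[stop:]


highlight("Spielbank", "spiel")           # -> '[Spiel]bank'
highlight("Spielwarengeschäft", "Spiel")  # -> '[Spiel]warengeschäft'
highlight("Kasino", "spiel")              # -> 'Kasino'
```

The last call shows the concern raised above: if "Kasino" is returned only because "Spielbank" sits in its terms, it is the one result with nothing to highlight.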

I've been thinking more about this and I think we should do it. So far I've been using the subtitle property as a mixture of synonyms and short descriptions, but it'd be better to keep them separate.

I suggest we limit subtitle to descriptions, use terms for arbitrary internal search phrases, and add an aliases property as an array of display-ready strings in order of priority, like so:

"name": "Events Venue",
"subtitle": "Rentable facility for events like weddings and banquets",
"aliases": [
    "Event Space",
    "Wedding Venue",
    "Banquet Hall"
],
"terms": [
    "celebration",
    "party"
]

When a search only matches an alias, we could show the alias alongside the name and subtitle rather than as a replacement. (We may need an alternative design for long text.)

[Screenshot, 2019-04-09]

The main benefit I see to this addition is that users can feel more confident with their choice of preset rather than wondering "Is this the same as the thing I'm trying to add?"
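
As a rough sketch of that display rule, using the example preset above: the name is always shown, a matched alias is appended only when the name itself did not match, and terms match silently. The `result_label` function and the "Name (Alias)" layout are assumptions for illustration; the subtitle is left out of matching to keep the example short.

```python
from typing import Optional

preset = {
    "name": "Events Venue",
    "subtitle": "Rentable facility for events like weddings and banquets",
    "aliases": ["Event Space", "Wedding Venue", "Banquet Hall"],
    "terms": ["celebration", "party"],
}


def result_label(preset: dict, query: str) -> Optional[str]:
    """Return the label to show for a search hit, or None if nothing matches."""
    q = query.lower()
    name = preset["name"]
    if q in name.lower():
        return name  # the name matched, so no alias is needed
    for alias in preset.get("aliases", []):  # aliases are in priority order
        if q in alias.lower():
            return f"{name} ({alias})"  # alias shown alongside, not instead
    if any(q in term.lower() for term in preset.get("terms", [])):
        return name  # terms are internal and never displayed
    return None


result_label(preset, "wedding")  # -> 'Events Venue (Wedding Venue)'
result_label(preset, "party")    # -> 'Events Venue'
result_label(preset, "zoo")      # -> None
```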

Another cumulative idea is to highlight the section of the matched word that matches the input text

@westnordost This could be good, we'd have to see how it looks in practice. Feel free to create a separate issue!

Cool, that sounds great! I have a load of German aliases to dump into Transifex from an earlier attempt to localize primary features :-)

Where does the subtitle come from? taginfo displays a similar description, I think it comes from the wiki. An option also for iD?

Where does the subtitle come from? taginfo displays a similar description, I think it comes from the wiki. An option also for iD?

@westnordost We'll be adding the subtitle as a preset property native to iD in #6137. The wiki descriptions are unsuitable for iD since:

  • They require API calls to get them so we can't really display them instantly for every feature in the search results.
  • They don't all map directly to iD presets (which can be defined by multiple tags).
  • Many are longer and more detailed than we need here.
  • They are of varying quality.
  • They may not reflect how iD interprets a feature.

By the way, terms is currently a comma-separated string; I think this is for Transifex reasons. So aliases should probably follow the same scheme, unless it is an option to convert the string into a list for consumption when generating the final presets.json.
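
Such a conversion step could look roughly like this. `split_terms` is a hypothetical helper, not part of iD's build scripts, and the cleanup rules (trimming, dropping empties and case-insensitive duplicates) are assumptions.

```python
from typing import List


def split_terms(raw: str) -> List[str]:
    """Turn a comma-separated translation string into a clean list.

    Whitespace is trimmed, empty entries are dropped, and duplicates that
    differ only in case are removed while the original order is kept.
    """
    seen = set()
    result = []
    for part in raw.split(","):
        term = part.strip()
        key = term.casefold()
        if term and key not in seen:
            seen.add(key)
            result.append(term)
    return result


split_terms("Event Space, Wedding Venue,Banquet Hall,, ")
# -> ['Event Space', 'Wedding Venue', 'Banquet Hall']
```

Translators would keep editing a single string, while consumers of the generated file get a proper array.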

Also, maybe it would be better not to add an aliases translation key, but instead rename name to names. The first item in that list would then be the one that is shown as the name; the others are the aliases.

Why?

Because Transifex, like most other translation portals, flags an empty translation key as a missing translation, it encourages users to add something to every key.
So even if there is no real alias for a map feature, translators are pushed towards adding aliases that do not fit 100%. This is already a problem with the terms key. In the German translation, I often see the workaround of copying the name into the terms field just to avoid an empty translation.
This would be avoided by using the names field.

Additionally, what should be the primary name (instead of just an alias) could become the topic of smaller edit wars after the introduction of aliases. When the name+aliases are kept in one translation key, this could be defused a bit because then it just becomes a matter of rearranging the names (by importance/common-ness). Otherwise, it would involve deleting and replacing the name and hopefully adding that replaced name to the alias key in the same breath.

Also, maybe it would be better not to add an aliases translation key, but instead rename name to names. The first item in that list would then be the one that is shown as the name; the others are the aliases.

An interesting idea, but I prefer keeping the name and the aliases separate. Using names would complicate a lot of the code in iD, plus it would entail much more work for translators compared to a purely additive change. The displayed completion percentage of a translation is ultimately arbitrary, but if it's really an issue we can consider a smarter solution like allowing an explicit "null" translation that iD will ignore.

Using names would complicate a lot of the code in iD, plus it would entail much more work for translators compared to a purely additive change.

iD’s build process already has to transform the strings it exports from Transifex into a suitable format at runtime, so could that transformation also partition “name” into “name” and “aliases”? (Leaving it as “name” despite the possibility of multiple lines of translations would avoid makework for translators.) The main challenge is communicating to translators that multiple names are allowed, but Transifex does show any instructions in comments in a prominent position.

The displayed completion percentage of a translation is ultimately arbitrary, but if it's really an issue we can consider a smarter solution like allowing an explicit "null" translation that iD will ignore.

The percentage may seem like a trifle from the perspective of the project’s developers, but it’s pretty much the only thing that motivates many translators. Seeing the percentage get stuck due to untranslatable strings can be discouraging.

The percentage may seem like a trifle from the perspective of the project’s developers, but it’s pretty much the only thing that motivates many translators. Seeing the percentage get stuck due to untranslatable strings can be discouraging.

@1ec5 Thanks for noting this, it's useful to hear. We'll come up with some way to allow 100% translation without encouraging bad translations.

I am now working on implementing this.

iD's presets have been spun out to id-tagging-schema, so it seems time to revisit this feature.

I like the idea of allowing multiple values in the name field, and then putting everything after the first value into a separate aliases property, since it makes aliases optional for translators. Example: "name": "Indoor Corridor;Hallway;Passageway"
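
A minimal sketch of that split, assuming ";" as the separator as in the example; `split_name` is a hypothetical helper, not an existing function in iD or id-tagging-schema.

```python
from typing import List, Tuple


def split_name(raw: str, sep: str = ";") -> Tuple[str, List[str]]:
    """Split a multi-value name into (display name, aliases).

    The first non-empty value becomes the name; everything after it becomes
    the aliases. A plain single-value name simply yields an empty alias list,
    which is what keeps aliases optional for translators.
    """
    parts = [part.strip() for part in raw.split(sep)]
    parts = [part for part in parts if part]
    if not parts:
        raise ValueError("name must contain at least one non-empty value")
    return parts[0], parts[1:]


split_name("Indoor Corridor;Hallway;Passageway")
# -> ('Indoor Corridor', ['Hallway', 'Passageway'])
split_name("Kasino")
# -> ('Kasino', [])
```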
