Dataverse: Add metadata field for describing language of metadata record

Created on 30 Apr 2018 · 3 Comments · Source: IQSS/dataverse

This would ideally support users who use Dataverse in a different language, or who enter metadata in a different language and would like that language to be tracked independently of the language of the data or software.

From Julian: From what I can tell so far, the DataCite 3.1 schema lets you specify the language of Title, Subject and Description with the xml:lang attribute (4.1 adds the xml:lang attribute to Rights) - https://schema.datacite.org/meta/kernel-4.1/doc/DataCite-MetadataKernel_v4.1.pdf. The schema says it accepts only IETF BCP 47 and ISO 639-1 language codes. But I don't think Dataverse knows the ISO language codes for the languages it displays in the Citation block (I vaguely remember a comment about this in a GitHub issue or maybe a Google Group post but can't find it). The Consorcio Madroño Dataverse includes the xml:lang attribute in the DataCite metadata they publish for each dataset. Here's an example: https://edatos.consorciomadrono.es/api/datasets/export?exporter=oai_datacite&persistentId=doi%3A10.21950/O53TLR
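As a rough illustration of the xml:lang idea above, here is a minimal Python sketch that attaches a language code to each title element. The function name `build_titles` and the sample titles are hypothetical, and the element names are unqualified for brevity; a real DataCite export carries the DataCite namespace and many more elements, so this is not how Dataverse produces its export.

```python
import xml.etree.ElementTree as ET

# The reserved XML namespace; ElementTree serializes it with the "xml" prefix.
XML_NS = "http://www.w3.org/XML/1998/namespace"


def build_titles(titles):
    """Build a <titles> element from (text, language_code) pairs.

    Hypothetical helper: element names follow the DataCite kernel loosely,
    without the DataCite namespace.
    """
    root = ET.Element("titles")
    for text, lang in titles:
        title = ET.SubElement(root, "title")
        # Setting the namespaced attribute yields xml:lang="..." on output.
        title.set(f"{{{XML_NS}}}lang", lang)
        title.text = text
    return root


# Made-up sample data: the same title in Spanish and English.
titles_element = build_titles([
    ("Encuesta de hogares", "es"),
    ("Household survey", "en"),
])
xml_text = ET.tostring(titles_element, encoding="unicode")
print(xml_text)
```

The language code is stored per element rather than per record, which is what allows a single record to carry the same field in several languages.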

And most or all of the DDI elements that Dataverse uses can include a lang attribute (http://www.ddialliance.org/Specification/DDI-Codebook/2.5/XMLSchema/field_level_documentation_files/schemas/xml_xsd/attributes/lang.html). Looks like it accepts any value for now.

See the related ticket https://github.com/IQSS/dataverse/issues/4633 about adding additional languages/translations for the title, subject, and abstract fields in the Citation block.

Internationalization Metadata UX & UI

All 3 comments

Thanks Amber! In our emails I was very focused on how to include this new language information in the metadata standards Dataverse uses now. A few other things to consider:

  • How best to add the field to the create and edit dataset form, so depositors can say "The metadata I'm entering is in this language"
  • Whether to add the specified languages to the HTML of each webpage where dataset metadata is displayed, like the search results page and the dataset page (see W3C's guide on the language attribute)
  • How to make Dataverse know the ISO language codes for the languages that depositors choose (if a depositor chooses French, Dataverse would add the value "fr" to the xml:lang attribute)
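The last point, translating a depositor's language choice into a code, could be sketched in Python as a simple lookup. The table `LANGUAGE_CODES` and the function `to_language_code` are hypothetical names with only a few sample entries; a real implementation would need the full list of languages Dataverse offers, which this thread does not enumerate.

```python
# Hypothetical lookup from display names to ISO 639-1 codes.
# Only a few sample entries are shown; keys are stored casefolded.
LANGUAGE_CODES = {
    "english": "en",
    "french": "fr",
    "spanish": "es",
    "german": "de",
}


def to_language_code(display_name):
    """Return the ISO 639-1 code for a display name, or None if unknown.

    Returning None lets a caller omit the language attribute entirely
    instead of emitting a wrong or empty value.
    """
    return LANGUAGE_CODES.get(display_name.strip().casefold())


print(to_language_code("French"))
```

Returning `None` for an unknown language is one possible design choice; it keeps an export valid when a language has no known code.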

Maybe a "default language for input" variable could be added to the account information for the user.

When specified in the account profile, this language could be pre-selected in drop-down menus displayed next to the target fields only (Title, Description, Subject, Keyword, Notes) when entering metadata in the Citation metadata block.
The user could then change the value if necessary, or add additional metadata in other languages.
If no default input language is specified in the user profile, no value would be pre-selected in the drop-down.
(I guess this means a value_lang column should be added to the datasetfieldvalue table, holding an ISO 639-1/2/3(?) value.)
(Example from the DSpace interface: https://jira.duraspace.org/secure/attachment/17500/language-tag.png )
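To make the value_lang suggestion concrete, here is a minimal Python sketch using an in-memory SQLite database. The table below is a heavily simplified stand-in for datasetfieldvalue (only `id` and `value` are assumed), and `add_value` with its parameters is a hypothetical helper. It only shows the defaulting rule described above: an explicit choice wins, otherwise the profile default applies, otherwise the language stays empty.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Simplified stand-in for the datasetfieldvalue table (assumed columns).
conn.execute(
    "CREATE TABLE datasetfieldvalue (id INTEGER PRIMARY KEY, value TEXT NOT NULL)"
)
# The proposed addition: a nullable language code per stored value.
conn.execute("ALTER TABLE datasetfieldvalue ADD COLUMN value_lang TEXT")


def add_value(conn, value, profile_default_lang=None, chosen_lang=None):
    """Store a field value with its language code.

    The language chosen in the form takes precedence over the profile
    default; if neither is set, value_lang is stored as NULL.
    """
    lang = chosen_lang or profile_default_lang
    cur = conn.execute(
        "INSERT INTO datasetfieldvalue (value, value_lang) VALUES (?, ?)",
        (value, lang),
    )
    return cur.lastrowid


# Made-up sample rows covering the three cases.
add_value(conn, "Encuesta de hogares", profile_default_lang="es")
add_value(conn, "Household survey", profile_default_lang="es", chosen_lang="en")
add_value(conn, "Untitled")

rows = conn.execute(
    "SELECT value, value_lang FROM datasetfieldvalue ORDER BY id"
).fetchall()
print(rows)
```

Keeping the column nullable means existing rows need no backfill, and an exporter can skip the language attribute whenever the value is NULL.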

I second this request, as it is now even more relevant than it used to be for a great number of DDI producers, namely members of the Consortium of European Social Science Data Archives (CESSDA). The CESSDA Metadata Management (CMM) working group produced guidelines for harmonizing the metadata produced by CESSDA members, the Core Metadata Model (20191115_Core_Metadata_Model_v1_0.pdf), and specifying the language of the content of various metadata fields is mandatory in this DTD.
