When adding a custom metadata block, there is a manual step involved: adding Solr fields to schema.xml. Solr provides an API (the Schema API) that would make this manual step unnecessary, but using it involves some changes in Dataverse.
Note: this ticket is about investigation rather than implementation. First we have to understand every aspect of this change to make sure the existing technologies are reliable and fully support the request.
Solr has a Schema API, which lets you modify the Solr schema (the list of fields and their properties). Solr can handle the schema in two different ways, and the choice is controlled in the solrconfig.xml file. There is a "classic" way, which is based on the schema.xml file, and a newer way, called managed schema (it is materialized as the "managed-schema" file, which is editable via the Solr user interface or via the API; editing this file manually is not advised).
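To see which of the two modes a core is configured for, one can look for the schemaFactory element in its solrconfig.xml. The following is a minimal sketch, not part of Dataverse or Solr: the function name `schema_factory_class` is made up for illustration, and the pattern is a plain text match, so it would also pick up a commented-out element.

```shell
# Print the class of the first <schemaFactory> element found in a
# solrconfig.xml file (path given as $1). Prints nothing if none is found.
# Note: this is a text match, not an XML parse, so an element inside
# an XML comment is matched as well.
schema_factory_class() {
  sed -n 's/.*<schemaFactory[^>]*class="\([^"]*\)".*/\1/p' "$1" | head -n 1
}
```

For example, `schema_factory_class /usr/local/solr/server/solr/collection1/conf/solrconfig.xml` could be run before and after editing the file to confirm the change took effect on disk.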
In the Dataverse provided solrconfig.xml you have this:
The schema API doesn't work with the ClassicIndexSchemaFactory. If you try, Solr returns an error message: "schema is not editable". To enable Schema API, we have to change this setting:
Set ManagedIndexSchemaFactory in solrconfig.xml:
<schemaFactory class="ManagedIndexSchemaFactory"/>
After this you have to restart Solr, and the Schema API will work this way:
curl -X POST -H 'Content-type:application/json' \
http://localhost:8983/api/cores/collection1/schema --data-binary '{
"add-field":{
"name":"title", "type":"text_en", "multiValued":false,
"stored":true, "indexed":true
},
"add-copy-field":{"source":"title", "dest":"_text_", "maxChars":"3000"}
}'
The details of the Schema API can be found here:
https://lucene.apache.org/solr/guide/7_3/schema-api.html
The details of switching from the classic schema to the managed schema:
https://lucene.apache.org/solr/guide/7_3/schema-factory-definition-in-solrconfig.html#SchemaFactoryDefinitioninSolrConfig-Switchingfromschema.xmltoManagedSchema
The problems:
The documentation says: "Once Solr is restarted and it detects that a schema.xml file exists, but the managedSchemaResourceName file (i.e., “managed-schema”) does not exist, the existing schema.xml file will be renamed to schema.xml.bak and the contents are re-written to the managed schema file." When I tried it, schema.xml was neither copied nor renamed. However, the same searches, even fielded searches, are still working.
When I use Schema API to retrieve fields, it contains only the default Solr fields, and not those Dataverse added via schema.xml.
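One way to check the second problem for a single field is to ask the Schema API for that field by name. This is only a sketch under assumptions: the helper names `schema_field_url` and `field_is_defined` are made up for illustration, the `/solr/<core>/schema/fields/<name>` path is the same endpoint family used elsewhere in this thread, and the sketch assumes Solr answers with an HTTP error status for a field it does not know.

```shell
# Build the Schema API URL for a single field.
# Arguments: Solr base URL, core name, field name.
schema_field_url() {
  # ${1%/} drops one trailing slash from the base URL, if present.
  printf '%s/solr/%s/schema/fields/%s\n' "${1%/}" "$2" "$3"
}

# Exit 0 when Solr returns the field definition, non-zero otherwise.
# -s silences progress output; -f makes curl fail on HTTP error statuses.
field_is_defined() {
  curl -sf -o /dev/null "$(schema_field_url "$1" "$2" "$3")"
}
```

For example, `field_is_defined http://localhost:8983 collection1 title && echo present` would print "present" only if the managed schema actually contains a field named title.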
I have asked a Solr expert for help.
(I added @4tikhonov as watcher)
@pkiraly thanks for opening this issue! Creating fields in Solr programmatically via API would be a huge improvement over what we do now, which is to manually update schema.xml from time to time.
As you say, it would be especially useful when custom metadata blocks are created. The documented procedure at http://guides.dataverse.org/en/4.15/admin/metadatacustomization.html#updating-the-solr-schema is quite manual.

Please let me know if I can help at all. Thanks again.
Hi @pdurbin,
Yes, it's on the Roadmap of SSHOC DataverseEU project and we'll try to find a solution together. But probably someone already knows "how to unlock the closed door".
@pdurbin
It would be a great help to figure out whether Solr generally doesn't do what is described in the Solr manual, or whether it is just me (due to some confounding factors of my system configuration).
To do the experiment, do the following steps (on a test machine).
<schemaFactory class="ClassicIndexSchemaFactory"/>
and replace it with this:
<schemaFactory class="ManagedIndexSchemaFactory"/>
schema.xml.bak file was created and if the Dataverse-related fields have been copied to the managed-schema file.
@pkiraly one observation is that even before I do anything there's a managed-schema file at the following location:
/usr/local/solr/server/solr/collection1/conf/managed-schema
I made a copy of it with this:
cp -a /usr/local/solr/server/solr/collection1/conf/managed-schema /usr/local/solr/server/solr/collection1/conf/managed-schema.backup.pdurbin
I'm confirming if Solr is up or down with these:
curl http://localhost:8983/solr/collection1/schema/fields
systemctl status solr
Stopping Solr with this:
systemctl stop solr
Backing up the file before editing:
cp -a /usr/local/solr/server/solr/collection1/conf/solrconfig.xml /usr/local/solr/server/solr/collection1/conf/solrconfig.xml.backup.pdurbin
Start Solr again:
systemctl start solr
A file named schema.xml.bak was not created. Nothing from this:
find /usr/local/solr | grep bak
No change to /usr/local/solr/server/solr/collection1/conf/managed-schema. This diff shows no changes:
diff /usr/local/solr/server/solr/collection1/conf/managed-schema /usr/local/solr/server/solr/collection1/conf/managed-schema.backup.pdurbin
I asked @erikhatcher, a well-known Lucene/Solr contributor, author, and speaker.
He suggested that if this issue occurs, run the following procedure:
I tried it, and it works.
@joelmarkanderson would benefit from a solution. He recently reported the following at https://groups.google.com/d/msg/dataverse-community/lr26VTP8lhs/5JoZ-IdnBQAJ
"I have successfully populated a controlled vocabulary metadata block, and the list of 38 Values correctly shows under the "Add + Edit Metadata" configuration screen. However, selecting and saving a tag results in an webpage error message: "Error – The metadata could not be updated. If you believe this is an error, please contact Support for assistance."
...
Caused by: org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://localhost:8983/solr/collection1: ERROR: [doc=dataset_758_draft] unknown field 'tag' "
@pkiraly I have not yet tried your latest suggestion above. Mostly I'm just posting the error above so people can find this issue in the future.
@pkiraly heads up that as a stop gap measure @poikilotherm and I have cooked up a new plan in a new issue: Make Solr schema.xml configuration more flexible, still using Classic Schema Factory #6142
You (and others) are also welcome to read our conversation about it at http://irclog.iq.harvard.edu/dataverse/2019-09-05#i_104531
To be clear, Oliver and I and others still want the solution you are proposing in this issue. We just think the proposal in the other issue will be less effort. It's a short term solution. Your idea (this issue) is the longer term solution. :smile:
Thanks @pdurbin for cross-referencing :+1: As you already outlined, I'm totally with you @pkiraly that it makes sense to switch to managed schema factory in the long run.
Is there anybody who could reproduce the process I suggested (see my comments https://github.com/IQSS/dataverse/issues/5989#issuecomment-508365438 and before that)?
@pdurbin Do you have a label for "help needed"? I do not have the right to add labels.
@pkiraly I have not tried playing with Managed Schema. I can add some "help wanted" labels.
Maybe @poikilotherm can help? I believe that the future pull request will be a doc change and some scripts that add fields to the Solr schema dynamically, based on the fields of the metadata blocks that have been loaded into Dataverse.
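To make the idea of such a script a bit more concrete, here is a rough sketch of how a list of fields could be turned into a Schema API request body. Everything about the input is an assumption for illustration: the tab-separated "name, Solr type, multiValued" layout is hypothetical and is not the Dataverse metadata block TSV format, the function name `fields_to_add_field_json` is made up, and mapping Dataverse field types to Solr types is left out entirely. Names and types are not JSON-escaped, so the sketch only suits simple identifiers.

```shell
# Read lines of "name<TAB>type<TAB>multiValued" on stdin and print one
# JSON request body with an "add-field" array, one entry per input line.
# Lines with fewer than three tab-separated columns are skipped.
fields_to_add_field_json() {
  awk -F '\t' '
    BEGIN { printf "{\"add-field\":[" }
    NF >= 3 {
      if (n++) printf ","   # comma before every entry except the first
      printf "{\"name\":\"%s\",\"type\":\"%s\",\"multiValued\":%s,\"stored\":true,\"indexed\":true}", $1, $2, $3
    }
    END { print "]}" }
  '
}

# Possible usage (not executed here), posting to the endpoint shown
# earlier in this issue:
#   printf 'tag\ttext_en\ttrue\n' | fields_to_add_field_json |
#     curl -X POST -H 'Content-type:application/json' \
#       --data-binary @- http://localhost:8983/api/cores/collection1/schema
```

Keeping the JSON generation separate from the curl call makes it possible to inspect the request body before anything is sent to Solr.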
By the way, if anyone wants a real custom metadata block to play with, a new one called "codemeta.tsv" is attached to the "CodeMeta-Metadata for Software and displayFormat for controlledVocabularies" thread at https://groups.google.com/d/msg/dataverse-community/nDMbMv4fKf4/P5YxHJzDBgAJ