Jabref: BibTeXML vs. bibteXMP

Created on 11 Mar 2016 · 20 Comments · Source: JabRef/jabref

JabRef 3.2

It seems that JabRef offers a second kind of XML serialization of BibTeX:

xmlns:bibtex='http://jabref.sourceforge.net/bibteXMP/'

IMHO, it is not worth keeping two different XML Schemas for an XML serialization of BibTeX. AFAIK, there isn't even one for JabRef's XML. Therefore, I propose that we use BibTeXML only and migrate old XMP metadata to the BibTeXML format.

XMP examples can be found at https://github.com/JabRef/jabref/blob/fc82796f1f99c00a01ccdb9dd4fff792a40b7e75/src/test/java/net/sf/jabref/logic/xmp/XMPUtilTest.java#L139.
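For orientation, here is a minimal sketch of reading such a serialization. It assumes that each BibTeX field is stored in the bibteXMP namespace on an `rdf:Description` element, either as an attribute or as a child element. The sample packet, its field values, and the function name `read_bibtexmp` are made up for illustration and are not taken from JabRef.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
BIBTEX_NS = "http://jabref.sourceforge.net/bibteXMP/"

# Hypothetical sample packet; the layout and values are assumptions.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about=""
      xmlns:bibtex="http://jabref.sourceforge.net/bibteXMP/"
      bibtex:bibtexkey="Doe2016"
      bibtex:entrytype="Article">
    <bibtex:title>An Example Title</bibtex:title>
    <bibtex:year>2016</bibtex:year>
  </rdf:Description>
</rdf:RDF>"""


def read_bibtexmp(xml_text):
    """Collect every property in the bibteXMP namespace, whether it is
    written as an attribute or as a child element of rdf:Description."""
    prefix = "{" + BIBTEX_NS + "}"
    root = ET.fromstring(xml_text)
    fields = {}
    for desc in root.iter("{" + RDF_NS + "}Description"):
        # Attribute form: bibtex:year="2016"
        for name, value in desc.attrib.items():
            if name.startswith(prefix):
                fields[name[len(prefix):]] = value
        # Element form: <bibtex:year>2016</bibtex:year>
        for child in desc:
            if child.tag.startswith(prefix):
                fields[child.tag[len(prefix):]] = (child.text or "").strip()
    return fields


fields = read_bibtexmp(SAMPLE)
```

Any consumer outside JabRef would need to know this namespace and its field names in advance, which is the interoperability concern raised here.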

enhancement

All 20 comments

Refs #898

I'm not really following the argumentation. One may argue about different export formats, but how is it relevant that they are both XML? Isn't it more of an issue whether it is a relevant format in itself?

Context: The format is used for storing BibTeX data in XML files using the XMP functionality (follow net.sf.jabref.logic.xmp.XMPSchemaBibtex). This PDF meta data is used by other people to exchange PDFs with the correct bibliographic data without being forced to send the bib entry along with the PDF in two files.

I am arguing that JabRef uses a proprietary format which is not used elsewhere. Thus, our XMP data cannot be processed by other software. I see the point, that the last commit at the current BibTeXML repository is from 2011. Nevertheless, I vote for joining forces. These formats are too similar to go into different directions.

I see the following alternatives:

  1. Replace JabRef's bibteXMP by the canonical bibtex representation
  2. Completely use RDF. There seem to be multiple BibTeX2RDF converters available: https://www.w3.org/wiki/ConverterToRdf#BibTex
  3. Maybe, OWL is also an option: http://zeitkunst.org/bibtex/0.1/
  4. Move to BibTeXML (as outlined in the original issue)
  5. Use MODS
  6. Keep everything as is

Somehow, the current code seems to use "Dublin Core", which reads well. Maybe that code can just be used and the other serialization using {http://jabref.sourceforge.net/bibteXMP/}bibtex can be removed completely. Needs to be investigated further.

In case everything is replaced by Dublin Core, one can update PDFBox - see https://github.com/JabRef/jabref/pull/1096.
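As a rough illustration of the Dublin Core route, the sketch below serializes a few BibTeX fields as Dublin Core properties in RDF/XML. The field mapping (title to `dc:title`, author to `dc:creator`, year to `dc:date`) and the function name `entry_to_dublin_core` are assumptions made for this example; they are not JabRef's actual mapping. The container types follow the XMP convention that `dc:title` is a language alternative and `dc:creator` and `dc:date` are ordered arrays.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"
XML = "http://www.w3.org/XML/1998/namespace"

# Use the conventional prefixes when serializing.
ET.register_namespace("rdf", RDF)
ET.register_namespace("dc", DC)


def q(ns, tag):
    """Build a namespace-qualified name in ElementTree's {uri}tag notation."""
    return "{%s}%s" % (ns, tag)


def entry_to_dublin_core(entry):
    """Serialize a dict of BibTeX fields as Dublin Core RDF/XML.
    The mapping used here is an illustrative assumption."""
    rdf = ET.Element(q(RDF, "RDF"))
    desc = ET.SubElement(rdf, q(RDF, "Description"), {q(RDF, "about"): ""})
    if "title" in entry:
        # dc:title is a language alternative (rdf:Alt).
        alt = ET.SubElement(ET.SubElement(desc, q(DC, "title")), q(RDF, "Alt"))
        li = ET.SubElement(alt, q(RDF, "li"), {q(XML, "lang"): "x-default"})
        li.text = entry["title"]
    if "author" in entry:
        # dc:creator is an ordered array (rdf:Seq), one item per author.
        seq = ET.SubElement(ET.SubElement(desc, q(DC, "creator")), q(RDF, "Seq"))
        for name in entry["author"].split(" and "):
            ET.SubElement(seq, q(RDF, "li")).text = name.strip()
    if "year" in entry:
        # dc:date is an ordered array of dates.
        seq = ET.SubElement(ET.SubElement(desc, q(DC, "date")), q(RDF, "Seq"))
        ET.SubElement(seq, q(RDF, "li")).text = entry["year"]
    return ET.tostring(rdf, encoding="unicode")


xml_out = entry_to_dublin_core(
    {"title": "An Example Title", "author": "Doe, Jane and Roe, Richard", "year": "2016"}
)
```

BibTeX fields without an obvious Dublin Core counterpart (for example `journal` or `pages`) are left out here on purpose; how to encode them is the open question in this thread.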

Ah, OK, so bibteXMP is JabRef's own format? Then it clearly makes more
sense to not support exporting in that.

The question would be: How many people actually use the XMP feature?
From my point of view, I would suggest supporting the BibTeXML format and maybe adding the RDF/OWL stuff as an addition.
Interestingly, there is also a paper about BibTeXML:
https://www.researchgate.net/publication/2564256_BIBTEXML_An_XML_Representation_of_BIBTEX

From a quick look at the code you referenced, I saw that it uses RDF tags... :confused:

The XMP feature is _the_ central tool to distribute PDFs with bibliographic information. I learned it from Adrian Daerr (possibly @adriandaerr?).

I am also confused by the code and also had a strange feeling about nesting JabRef's bibtexml into rdf tags. Therefore, I proposed to focus on Dublin Core (see above).

thanks for inviting me to the discussion! the BibTeXML we developed and implemented (http://dret.net/netdret/publications#wil01e) is a different one than the sourceforge repo. the paper is from 15 years ago, and while we used the language in a later project (http://dret.net/projects/sharef/), the software produced by that project is not really used anywhere, as far as i can tell. i did hand the sources to some people who liked it and wanted to have a bibtex-xml converter, but i don't think anybody ever made their versions public. i think our XML schema was pretty well-designed, but it's something i haven't looked at in quite a while.

Whichever format you prefer to embed in the PDF, it would be great if it were compatible with PDF/A compliance checks.
The metadata embedded by JabRef 2.x caused errors like:

XMP metadata property used, which is not predefined in the XMP specification of January 2004. There is no XMP extension schema present in the PDF defining the use and contents of this property. Some PDF-based ISO standards require that all XMP metadata properties are either predefined or defined in an embedded extension schema.

If it is going to be a format like BibTeXML that can be exported as XML, it would also be great to have a minimal example of embedding it correctly through LaTeX with the xmpincl or hyperxmp packages. Use case: compiling a thesis with embedded metadata precomposed with JabRef.
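The PDF/A error quoted above is about properties from a namespace that is neither predefined nor declared in an embedded extension schema. A minimal sketch of that idea follows: it lists the namespaces used on `rdf:Description` that are not in a small allow-list. The allow-list is partial and illustrative, the sample packet is made up, and the function name `undeclared_namespaces` is hypothetical. A real PDF/A validator checks far more, and this sketch does not look for extension schema declarations at all.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
XML_NS = "http://www.w3.org/XML/1998/namespace"

# Partial, illustrative allow-list of namespaces predefined by XMP.
PREDEFINED = {
    "http://purl.org/dc/elements/1.1/",     # Dublin Core
    "http://ns.adobe.com/xap/1.0/",         # XMP basic
    "http://ns.adobe.com/xap/1.0/rights/",  # XMP rights management
    "http://ns.adobe.com/xap/1.0/mm/",      # XMP media management
    "http://ns.adobe.com/pdf/1.3/",         # Adobe PDF
}

# Hypothetical packet mixing a Dublin Core property with a bibteXMP one.
SAMPLE_XMP = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about=""
      xmlns:dc="http://purl.org/dc/elements/1.1/"
      xmlns:bibtex="http://jabref.sourceforge.net/bibteXMP/">
    <dc:format>application/pdf</dc:format>
    <bibtex:year>2016</bibtex:year>
  </rdf:Description>
</rdf:RDF>"""


def undeclared_namespaces(xmp_text):
    """Return the sorted namespaces of properties on rdf:Description
    (attributes or child elements) that are not in PREDEFINED."""
    root = ET.fromstring(xmp_text)
    found = set()
    for desc in root.iter("{" + RDF_NS + "}Description"):
        names = list(desc.attrib) + [child.tag for child in desc]
        for name in names:
            if not name.startswith("{"):
                continue  # unqualified name, no namespace to check
            ns = name[1:].split("}", 1)[0]
            if ns not in PREDEFINED and ns not in (RDF_NS, XML_NS):
                found.add(ns)
    return sorted(found)


flagged = undeclared_namespaces(SAMPLE_XMP)
```

On the sample packet, only the bibteXMP namespace is flagged; the Dublin Core property passes.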

After dealing with this in #1096 I think the most portable solution would be to drop the JabRef bibteXMP and to encode everything into Dublin core (which we already do on top of our custom serialization).

That is, if we do not decide to drop the XMP functionality completely.

Some info about correctly storing XMP inside a PDF (to be compatible with PDF/A, for example) can be found, with samples, at http://www.pdflib.com/knowledge-base/xmp-metadata/xmp-in-pdfa/
Here is a free XMP validator: http://www.pdflib.com/knowledge-base/xmp-metadata/free-xmp-validator/
Some java code samples can also be found at that website.

Idea (as discussed with @hummelriegel): Add the BibTeX entries of cited works to the PDF. This is especially useful for a self-written paper.

Further options include bibtexml and MODS. I think, dublin core is still the way to go as it is standards-based. We should go in this direction.

Hi guys. I am not a developer, just another user. I really hope that you maintain the XML feature. This is one of the most important unique features of JabRef that keeps me coming back time and time again (after using great reference managers like Bookends). The XML is useful not just for sharing PDF files. Embedding the information into the PDF is very useful for powerful search tools like DEVONthink [Mac], Spotlight [Mac], dtSearch [Windows]. With the embedded data, it is possible to search PDF files by their author, title and similar data. In addition, re-generating the JabRef library from the PDF files (in case the library is corrupted or deleted) is possible with the embedded data. I had a couple of cases where my PDF files got dissociated from the reference. I dragged them back. Voilà, I have the whole reference. This is just so great.

Hi dellu. Thanks for the praise! And no worries, we have no intentions of removing support for this feature. Quite the contrary, we would like to update and improve it. Unfortunately, this has so far failed due to issues in the libraries that we use for this functionality. As a result, I assume that there will be no significant changes here in the near future.

Thank you @lenhard. I am glad you are going to keep the feature.

What do you guys think of this?
They also write the metadata into the file using ExifTool. They use the standard bibtex tags. The standard Bibtex is nice.

@Dellu

Interesting link, thanks! Unfortunately, it will not be easy to interact with that tool or with ExifTool. The former is written in C++ and the latter in Perl, whereas JabRef is written in Java. There is always a way around the language differences, but from my point of view we should stick to the Java ecosystem and build a JabRef where everything is closely integrated and without language-related friction.

Other developers might have a different opinion, though.

Together with @snisnisniksonah I am investigating whether we can use Dublin Core.

Current steps:

  1. Read/write PDF annotations using Dublin Core using PDFBox 2.x (refs https://github.com/JabRef/jabref/pull/1096)
  2. Extract a command line tool to convert old PDF annotations to the new format (Refs https://github.com/JabRef/jabref/pull/266#issuecomment-151827312) -> XMPUtil will be released separately.

Results:

  • JabRef 4.x depending on PDFBox 2.x
  • XMPUtils depending on PDFBox 1.x

Nice! I think the XMPUtil is not that important since in most cases you can just write the information again to the PDF using Dublin Core and thus overwriting / "converting" the old XMP data.

Note to self: Do not forget https://github.com/JabRef/jabref/issues/938#issuecomment-232652848. pdflatex can easily do that: authorarchive. Check the example PDF.
