JabRef: Markdown comments disappear on new start when having # chars

Created on 13 Oct 2020 · 13 Comments · Source: JabRef/jabref

JabRef version 5.1--2020-08-30--e023aa0 on Ubuntu 20.04

Hi all!
I am trying to add comments in the comment tab of different entries to keep my own notes on what I am reading. I've read that Markdown is supported from version 5.1 on. What I am trying to do is create sections in the comment (goal, achievement, method, failure, ...) like this:

Goal (#### Goal)

Lorem ipsum dolor sit amet, consectetur adipiscing elit,

Achievement (#### Achievement)

Lorem ipsum dolor sit amet, consectetur adipiscing elit,

Method (#### Method)

Lorem ipsum dolor sit amet, consectetur adipiscing elit,

I save the changes and everything seems to work correctly, but every time I reopen JabRef the # are gone and so the sections are no longer highlighted.

Goal Lorem ipsum dolor sit amet, consectetur adipiscing elit,
Achievement Lorem ipsum dolor sit amet, consectetur adipiscing elit,
Method Lorem ipsum dolor sit amet, consectetur adipiscing elit,

It is a pain to add the hashes every time I reopen the software. Do you have any clue how to fix this?

entry-editor good first issue bug 🐛

manurare

All 13 comments

Indeed, basic Markdown support was added (https://github.com/JabRef/jabref/pull/6232).
But maybe it is too basic: while markdown syntax such as **bold** is preserved, using the pound (#) character affects JabRef behavior. I believe this is because JabRef uses # for BibTeX strings.

mlep on 13 Oct 2020

Yep, @mlep is right. The # is a special character which indicates a bibtex string.
See https://docs.jabref.org/advanced/strings

I am thinking if it would be possible to escape this somehow for the comment field. Related problem in #7012

Siedlerchr on 13 Oct 2020

Maybe the internal handling of # takes place at the wrong layer. Currently, it happens between the data model and writing/reading, not between the data model and the presentation. The decision back then was that BibTeX data is presented as key/value pairs and the value is a simple string (sequence of characters). When BibTeX strings are used, the value becomes a list of plain strings (character sequences) and BibTeX strings.

See the comment on org.jabref.model.entry.BibEntry#toString:

    /**
     * This returns a canonical BibTeX serialization. Special characters such as "{" or "&" are NOT escaped, but written
     * as is. In case the JabRef "hack" for distinguishing "field = value" and "field = {value}" (in .bib files) is
     * used, it is output as "field = {#value#}", which may cause headaches in debugging. We nevertheless do it this way
     * to a) enable debugging the internal representation and b) save time at this method.
     * <p>
     * Serializes all fields, even the JabRef internal ones. Does NOT serialize "KEY_FIELD" as field, but as key.
     */
    @Override
    public String toString() {
        return CanonicalBibEntry.getCanonicalRepresentation(this);
    }

The handling of # during writing is described at org.jabref.logic.bibtex.FieldWriter#formatAndResolveStrings:

    /**
     * This method handles # in the field content to get valid bibtex strings
     * <p>
     * For instance, <code>#jan# - #feb#</code> gets  <code>jan #{ - } # feb</code> (see @link{org.jabref.logic.bibtex.LatexFieldFormatterTests#makeHashEnclosedWordsRealStringsInMonthField()})
     */

I see the following solutions:

Solution 1: Add tests for the markdown case and try to fix org.jabref.logic.bibtex.FieldWriter#formatAndResolveStrings.
Solution 2: A quick solution is to replace the internal # in JabRef with an unused character such as §. This can be done seamlessly, as the .bib file is not affected by this change; only our internal code and all documentation are.
Solution 3: Change the data type of a field value from String to FieldValue. That is, org.jabref.model.entry.BibEntry#setField(org.jabref.model.entry.field.Field, java.lang.String) changes to org.jabref.model.entry.BibEntry#setField(org.jabref.model.entry.field.Field, org.jabref.model.entry.field.Value). This would bring JabRef's internal data model even closer to BibTeX.
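
To make Solution 3 a bit more concrete, here is a minimal sketch of what a structured field value could look like. All names (`FieldValueSketch`, `Part`, `toBibtex`, `toEditorForm`) are hypothetical and not JabRef's actual API. The only point is that literal text and BibTeX string references are kept apart in the model, so a literal # is never confused with a reference. The spacing around # in the serialized output is my choice; JabRef's real writer may format it differently.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch, not JabRef code: a field value as a list of typed parts.
final class FieldValueSketch {

    /** One part of a value: either literal text or a reference to a BibTeX string. */
    static final class Part {
        final String text;
        final boolean stringReference;

        private Part(String text, boolean stringReference) {
            this.text = text;
            this.stringReference = stringReference;
        }

        static Part literal(String text) {
            return new Part(text, false);
        }

        static Part reference(String name) {
            return new Part(name, true);
        }
    }

    /** Serializes for the .bib file: references stay bare, literals are wrapped in braces. */
    static String toBibtex(List<Part> parts) {
        return parts.stream()
                    .map(p -> p.stringReference ? p.text : "{" + p.text + "}")
                    .collect(Collectors.joining(" # "));
    }

    /** Renders the current entry-editor notation, where references are enclosed in '#'. */
    static String toEditorForm(List<Part> parts) {
        return parts.stream()
                    .map(p -> p.stringReference ? "#" + p.text + "#" : p.text)
                    .collect(Collectors.joining());
    }

    public static void main(String[] args) {
        List<Part> month = Arrays.asList(
                Part.reference("jan"), Part.literal(" - "), Part.reference("feb"));
        System.out.println(toBibtex(month));     // jan # { - } # feb
        System.out.println(toEditorForm(month)); // #jan# - #feb#

        // A Markdown heading is just one literal part; nothing is parsed out of it.
        List<Part> comment = Arrays.asList(Part.literal("#### Goal"));
        System.out.println(toBibtex(comment));   // {#### Goal}
    }
}
```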

koppor on 14 Oct 2020

@Siedlerchr I am thinking if it would be possible to escape this somehow for the comment field. Related problem in #7012

I was just thinking: users may apply the Markdown formatter to other fields (e.g., abstract) to use the new Markdown function. So escaping only the comment field may not be a real solution.

teertinker on 17 Oct 2020

Quick recap:

left: BibTeX; right: Entry Editor

     field1 = value     -> #value#
     field2 = {value}   -> value

     field1 = value # value --> #value# # #value#

Solution 1

# test

## test

Just check for # at the beginning of a line followed by a space. If yes, do not do any replacement magic.

Regex: $#+

Good, because no documentation needs to be changed
Bad, because "quick handling" is introduced
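
The check described for Solution 1 could look roughly like this. It is a sketch under assumptions: the class and method names are made up, and the pattern uses `^` together with the MULTILINE flag to mean "beginning of a line". Where exactly such a check would be called in JabRef's writer is not shown.

```java
import java.util.regex.Pattern;

// Hypothetical sketch, not JabRef code: detect Markdown headings in a field value.
final class MarkdownHeadingCheck {

    // One or more '#' at the beginning of a line, followed by a space.
    private static final Pattern HEADING = Pattern.compile("^#+ ", Pattern.MULTILINE);

    /** Returns true if any line of the content starts like a Markdown heading. */
    static boolean containsMarkdownHeading(String fieldContent) {
        return HEADING.matcher(fieldContent).find();
    }

    public static void main(String[] args) {
        System.out.println(containsMarkdownHeading("# test\n\n## test")); // true
        System.out.println(containsMarkdownHeading("#jan# - #feb#"));     // false
    }
}
```

If the method returns true, the "replacement magic" would be skipped for that value.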

Solution 2

 field1 = value     -> %value%
 field2 = {value}   -> value

 field1 = value # value --> %value% # %value%

Using % is bad, because % is a LaTeX command for comments. However, a user does not write LaTeX comments inside a BibTeX string
Using & is bad, because & is a LaTeX command for tables
Using $ is bad, because $ is a LaTeX command for math mode
Using § is bad, because § is not found on a US keyboard
Bad, because % has a different meaning in JabRef's entry editor than in BibTeX.

Solution 3

This is domain-driven design to the max ^^.

Decision

We opt for Solution 1, because it does not introduce any changes in the documentation and usage of JabRef.

koppor on 10 Nov 2020

Some thoughts regarding solution 1.

1. Should issue #7012 also be quick fixed?
2. Are linking to anchors allowed? (e.g., `(a link)[#an-anchor]`)
3. Is it an option to attempt to identify a String constant instead? There is already code that I believe is to this end (it was part of the initial commit so I can't find any associated issue/PR),
   https://github.com/JabRef/jabref/blob/f5c52a2ef6aafa3536eb4e3f93974c8219c790f3/src/main/java/org/jabref/model/database/BibDatabase.java#L48
   `(?<!#)#\p{isL}+#` might be an option, but at that point it might be better to not use regexp.

Nitpicking:

Regex: $#+

^ = beginning of a line

k3KAW8Pnf7mkmdSMPHz27 on 10 Nov 2020

Some thoughts regarding solution 1.
1. Should issue #7012 also be quick fixed?

I have not thought this through thoroughly, but yes, on the one hand it sounds good.

On the other hand, I am wondering whether % as the character would be better. OK, maybe there are also fields with # out there.

2. Are linking to anchors allowed? (e.g., `(a link)[#an-anchor]`  )

Damn it. Sure thing. Also URLs having fragments. #.

Disable bibtex field resolving completely in abstract? Maybe, this refs your solution proposed at https://github.com/JabRef/jabref/issues/7012#issuecomment-708007437.

In the devcall, we agreed that only power users use the power of BibTeX strings (https://docs.jabref.org/advanced/strings). My take was that our UI in the entry editor currently is not good enough to support them well (the improvement of the entry editor is Solution 3).

3. Is it an option to attempt to identify a String constant instead? There is already code that I believe is to this end (it was part of the initial commit so I can't find any associated issue/PR),
   https://github.com/JabRef/jabref/blob/f5c52a2ef6aafa3536eb4e3f93974c8219c790f3/src/main/java/org/jabref/model/database/BibDatabase.java#L48

   `(?<!#)#\p{isL}+#` might be an option, but at that point it might be better to not use regexp.

I don't fully understand the RegEx. Naively, I would have searched for #[a-zA-Z1-9._-]+#, as only certain letters are allowed in BibTeX strings.

Nitpicking:

Regex: $#+

^ = beginning of a line

Oops 😇

koppor on 10 Nov 2020

I don't fully understand the RegEx. Naively, I would have searched for #[a-zA-Z1-9._-]+#, as only certain letters are allowed in BibTeX strings.

Hum, yup, my regex was not thought through enough. I was going for Unicode support and thought that a # before a String would either be consumed or be "illegal". #[a-zA-Z1-9._-]+# is still better than .*#[^#]+#.*, if there is no Unicode support. I don't know enough about BibTeX/biblatex to find good edge cases.

I suppose my argument is that it might be easier to match the strings rather than the markdown, especially if the allowed characters are more limited than [^#].
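
To compare the candidates, the following sketch runs two of the patterns mentioned in this thread against a few sample values. The class name and the sample inputs are made up for illustration. How JabRef applies `.*#[^#]+#.*` internally is not shown in this thread, so the use of `matches()` here is an assumption.

```java
import java.util.regex.Pattern;

// Hypothetical sketch, not JabRef code: compare a broad and a narrow string pattern.
final class StringPatternComparison {

    // Broad pattern quoted in this thread: any run of non-'#' text between two '#'.
    static final Pattern BROAD = Pattern.compile(".*#[^#]+#.*");

    // Narrower pattern proposed in this thread: limited character set between the '#'.
    static final Pattern NARROW = Pattern.compile("#[a-zA-Z1-9._-]+#");

    static boolean broadMatches(String value) {
        return BROAD.matcher(value).matches();
    }

    static boolean narrowFinds(String value) {
        return NARROW.matcher(value).find();
    }

    public static void main(String[] args) {
        String[] samples = {
            "#jan# - #feb#",                        // two BibTeX string references
            "#### Goal\nLorem ipsum\n#### Method",  // Markdown comment with two headings
            "[a link](#an-anchor)"                  // Markdown link to an anchor
        };
        for (String sample : samples) {
            System.out.println(broadMatches(sample) + " " + narrowFinds(sample));
        }
    }
}
```

With these inputs, the broad pattern also matches the comment with two headings (the text between the headings contains no #), while the narrow pattern only finds something in `#jan# - #feb#`. Neither matches the single-# anchor link.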

However, a user does not write LaTeX comments inside a BibTeX string

Are you sure you want to jinx it? 😉
There is also url percent encoding.

Disable bibtex field resolving completely in abstract? Maybe, this refs your solution proposed at #7012 (comment).

Well, disable string resolution for most fields by default. But, I don't know enough about what goes on in the background of JabRef to even know if the settings below would make the comments save correctly.

[Screenshot of the settings, 2020-11-10 19:48:35]

Edit:
I tested the settings and they did not work as I'd expect.

k3KAW8Pnf7mkmdSMPHz27 on 11 Nov 2020

I think that kind of setting seems to be a viable solution.
I have seen comments with LaTeX code.

Siedlerchr on 11 Nov 2020

You might need to separate the fields with semicolons, like for the wrap fields.

Siedlerchr on 11 Nov 2020

@Siedlerchr it appears to have been a user bug, it is now working as expected 😳

Add a third option? Along the lines of "Only resolve strings in..." with some defaults (author, title, date, etc.?), perhaps add categories?
I guess this could be considered "Solution 1.b", as it attempts to avoid the issue rather than resolving anything.
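
A rough sketch of how such an "Only resolve strings in..." setting could be evaluated. Everything here is hypothetical: the class and method names are invented, and the semicolon-separated format is simply borrowed from the remark above about separating fields by semicolon. It is not JabRef's preference code.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical sketch, not JabRef code: an allow-list of fields whose strings get resolved.
final class ResolveStringsSettingSketch {

    /** Parses a semicolon-separated preference value such as "author;title;date". */
    static Set<String> parseFieldList(String preference) {
        return Arrays.stream(preference.split(";"))
                     .map(String::trim)
                     .filter(name -> !name.isEmpty())
                     .map(name -> name.toLowerCase(Locale.ROOT))
                     .collect(Collectors.toCollection(LinkedHashSet::new));
    }

    /** Only fields on the allow-list would get their #...# content resolved. */
    static boolean shouldResolveStrings(String fieldName, Set<String> resolvableFields) {
        return resolvableFields.contains(fieldName.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        Set<String> fields = parseFieldList("author;title;date");
        System.out.println(shouldResolveStrings("author", fields));  // true
        System.out.println(shouldResolveStrings("comment", fields)); // false
    }
}
```

With a default list like this, a comment or abstract containing Markdown headings would be written as is, which avoids the issue rather than resolving it.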

k3KAW8Pnf7mkmdSMPHz27 on 11 Nov 2020

Hello, is this issue still open?

If it is, I would like to work on it.