Vimtex: Formatting an enumerate environment with gqip exceeds the specified textwidth by two chars

Created on 8 Sep 2016  ·  20 Comments  ·  Source: lervag/vimtex

\begin{enumerate}[(i)]
  \item xxxxxxxxxx xx xxxxxx xxxxxx xx xxxxxxxxxx xxxxxxx xxxxxxx xxx
    xxxxxxxxxx xxxxxxxxxx xxxxxxx xxxxxxxxxxxxxxx xxxx xxxxx xxxxxxxxxxx xxxxxx
    xxxxxxxxxxxx xxxx xxxxxxx xxxxxxxxx xxxxxxxxx xxxxxxxxxxx xxxxxxxxx xxxxxxxx
    xx xxxxxxxxx
  \item xxxxxx xxxxxxxxxxxxx xxxxxxxxxx xxxxxxxxx xxxxxx xxxxxxxxxxxxxx xxx
    xxxxxxxxxxxxxx xxxxxxxxx xxxxxxxxxx xx xxxxxxxxxx xxxxxxxxxx xxxxxxxxxxx
    xxxxxxxxxxxxxxxxx xxxxx xxxxxxx
\end{enumerate}

% vim: ft=tex tw=78 sw=2 cc=+1

[Screenshot: demo-format]

I have formatted the environment with gqip. I was expecting that the line lengths would be at most 78 chars.


I have encountered a second issue with the formatting command, which can probably be fixed together with this one.

The following cannot be reformatted at all:

\begin{enumerate}[(i)]
  \item xxxxxxxxxx xx xxxxxx xxxxxx xx xxxxxxxxxx xxxxxxx xxxxxxx xxx xxxxxxxxxx xxxxxxxxxx xxxxxxx xxxxxxxxxxxxxxx xxxx xxxxx xxxxxxxxxxx xxxxxx xxxxxxxxxxxx xxxx xxxxxxx xxxxxxxxx xxxxxxxxx xxxxxxxxxxx xxxxxxxxx xxxxxxxx xx xxxxxxxxx
  \item xxxxxx xxxxxxxxxxxxx xxxxxxxxxx xxxxxxxxx xxxxxx xxxxxxxxxxxxxx xxx xxxxxxxxxxxxxx xxxxxxxxx xxxxxxxxxx xx xxxxxxxxxx xxxxxxxxxx xxxxxxxxxxx xxxxxxxxxxxxxxxxx xxxxx xxxxxxx
\end{enumerate}

% vim: ft=tex tw=78 sw=2 cc=+1

Nothing happens when I press gqip.

I identified the responsible code sections and made modifications that resolve the problems. However, they might have side effects. I also could not run the vader tests on OSX (only 8 of 32 succeeded, even without any changes).

diff --git a/autoload/vimtex/format.vim b/autoload/vimtex/format.vim
index a96a983..2fb8d27 100644
--- a/autoload/vimtex/format.vim
+++ b/autoload/vimtex/format.vim
@@ -118,7 +119,7 @@ function! s:format(top, bottom) " {{{1
     endif

     if l:line =~# s:border_beginning
-      if l:current < l:mark
+      if l:current < l:mark || len(l:line) > s:textwidth
         let l:bottom += s:format_build_lines(l:current, l:mark)
       endif
       let l:mark = l:current-1
@@ -158,7 +159,7 @@ function! s:format_build_lines(start, end) " {{{1
     if len(l:word) + len(l:current) > s:textwidth
       call append(l:lnum, substitute(l:current, '\s$', '', ''))
       let l:lnum += 1
-      let l:current = repeat(' ', VimtexIndent(a:start))
+      let l:current = repeat(' ', VimtexIndent(l:lnum))
     endif
     let l:current .= l:word . ' '
   endfor

The first hunk fixes the problem from my second comment.
The second hunk does not fix the problem of this issue entirely; one line is still one character too long:

[Screenshot: demo-format-fix1and2]
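
For illustration, the word-packing loop touched by the second hunk can be modelled in Python. This is a simplified sketch, not vimtex's actual implementation; the name wrap_words and its parameters are made up for this example. It assumes that the first line and the continuation lines each have a known indent.

```python
def wrap_words(words, textwidth, first_indent, hanging_indent):
    """Greedily pack words into lines that are at most textwidth long.

    Hypothetical helper: first_indent is used for the first line,
    hanging_indent for every continuation line.
    """
    lines = []
    current = " " * first_indent
    for word in words:
        # 'current' is either pure indent or ends with a single space, so
        # this sum is the length the line would have with the word added.
        if len(current) + len(word) > textwidth and current.strip():
            lines.append(current.rstrip())
            current = " " * hanging_indent
        current += word + " "
    if current.strip():
        lines.append(current.rstrip())
    return lines


if __name__ == "__main__":
    item = ["\\item"] + ["xxxxxxxxxx"] * 20
    for line in wrap_words(item, textwidth=78, first_indent=2, hanging_indent=4):
        print(len(line), line)
```

In this model a line can only exceed textwidth if a single word is longer than the available width, or if the width used by the formatter differs from the one the user expects.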

I want to add a note about other possibilities for formatting LaTeX code:

print.line.length: This numeric resource specifies the desired width of the lines. Lines which turn out to be longer are tried to split at spaces and continued in the next line. The value defaults to 77.
print.indent: This numeric resource specifies indentation of normal items, i.e. items in entries which are not strings or comments. The value defaults to 2.
print.align: This numeric resource specifies the column at which the ’=’ in non-comment and non-string entries are aligned. This value defaults to 18.

I've pushed code that I think should fix this issue. Thanks for the suggestions, they helped to solve this quite quickly. Note, though, that I don't think your second change was needed at all.

Thanks for the tips. Had I known about latexindent.pl earlier, I might have tried to rely on it instead. However, I think both the formatting and the indentation are getting pretty good now.

Thanks for your update.

However, there is still one line longer than it should be. One can check line lengths with

:!awk '{print length}' %
22
69
79 <---
71
25
75
76
35
15
0
30 
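
The same check can be sketched in Python, reporting only the offending lines. This is a minimal example; the function name overlong_lines is hypothetical, and it counts characters, whereas awk's length may count bytes depending on the locale.

```python
def overlong_lines(text, textwidth=78):
    """Return (line_number, length) pairs for lines longer than textwidth."""
    return [
        (number, len(line))
        for number, line in enumerate(text.splitlines(), start=1)
        if len(line) > textwidth
    ]


if __name__ == "__main__":
    import sys

    # Usage: python3 check_width.py FILE
    with open(sys.argv[1], encoding="utf-8") as handle:
        for number, length in overlong_lines(handle.read()):
            print(f"line {number}: {length} chars")
```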

I am quite happy that formatting and indentation work out of the box, even though latexindent.pl might have more functionality. For instance, latexindent.pl requires additional perl modules which are not installed by default on macOS.


For quite a while I was not sure how I should format ordinary text blocks. Right now I follow the idea of http://dustycloud.org/blog/vcs-friendly-patchable-document-line-wrapping/:

If you do enough work in any sort of free software environment, you get used
  to doing lots of writing of documentation or all sorts of other things in
  some plaintext system which exports to some non-plaintext system.
One way or another you have to decide: are you going to wrap your lines with
  newlines?
And of course the answer should be "yes" because lines that trail all the way
  off the edge of your terminal is a sin against the plaintext gods, who are
  deceptively mighty, and whose wrath is to be feared (and blessings to be
  embraced).
So okay, of course one line per paragraph is off the table.
So what do you do?

If you like this idea, I would submit a separate issue as a feature request. The implementation won't be too easy; in particular, correct sentence detection is quite tricky. I guess people have different opinions about this, so this style should only be enabled if the user wishes so.
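
To make the style concrete, here is a rough Python sketch of it: each sentence starts on a new line, and continuation lines get a hanging indent. It uses a deliberately naive sentence detector (a sentence ends at '.', '?' or '!' followed by whitespace), which is exactly the tricky part mentioned above; abbreviations such as "e.g." would be split incorrectly. The function name sentence_wrap is made up for this example.

```python
import re
import textwrap


def sentence_wrap(text, width=78, indent="  "):
    """Put each sentence on its own line and wrap it with a hanging indent."""
    # Collapse all whitespace, including the existing line breaks.
    flat = " ".join(text.split())
    # Naive assumption: '.', '?' or '!' followed by whitespace ends a sentence.
    sentences = re.split(r"(?<=[.?!])\s+", flat)
    lines = []
    for sentence in sentences:
        lines.extend(textwrap.wrap(sentence, width=width, subsequent_indent=indent))
    return "\n".join(lines)


if __name__ == "__main__":
    import sys

    # Works as a filter: reads stdin, writes the reformatted text to stdout.
    print(sentence_wrap(sys.stdin.read()))
```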

It works well for me. I noticed that if I changed the textwidth during editing, then that change was not reflected within vimtex. I just fixed that. But other than that, your example now formats as expected for me.

How to format text blocks is a difficult question. There are many ideas and methods. Personally, I really like to keep things within 79 columns, but for collaboration with LaTeX I've come to realize that it is often convenient, e.g. for diffing, when one uses one sentence per line. The suggested method from your link also seems neat in this regard, but I think a problem is to have friends/collaborators use the same format.

With the current implementation, we can handle the "standard" text flow with indentation and a given textwidth, and it is easy to also use long lines and no hardwrapping, if that is preferred. I don't think it is necessary to add more methods.

Your commit e3028cb9cdba9e09dd267e2a76211807a2a9fd8e fixed my remaining issue. As you said, when textwidth was changed during editing (apparently also via a modeline), the change was not reflected within vimtex.

Now no line is too long anymore.

I agree with your comment about collaboration and the difficulty of keeping a consistent format for LaTeX source code. Even though the general rule is to stick with the existing format, some people simply do not make the effort to adapt to it.
People use soft wrapping more often than they should. AFAIK, it is actually quite common to soft wrap LaTeX code and place each sentence on a separate line.

Great, good to hear that the issue is resolved.

I guess people are just different. Some people, like us, care about details such as how a file is formatted, others only care about the end result. I guess this also holds for things like the use of microtype and similar. In any case, I hope you can agree with my current decision to not support several methods of formatting. It would lead to much more code that must be maintained, and it is already quite a lot of work for me to keep on top of the issues... :)

Yes, I can agree with your decision :smile:. IMHO it is quite demanding to get it right and keeping the domain/scope/feature set of vim plugins more narrow is always better.

I think the automatic formatting as suggested by dustycloud should be realized with a dedicated tool. Possibly, the author of latexindent.pl takes the effort to add it 😇 .

Thanks! And agreed; this seems like a feature for a dedicated tool like latexindent.pl. If this tool becomes very robust, portable and well known, then I might even consider simplifying vimtex by using that tool.

Apache OpenNLP (Java)

If someone is interested in formatting text according to http://dustycloud.org/blog/vcs-friendly-patchable-document-line-wrapping/ using Apache OpenNLP (a machine learning based toolkit written in Java for processing natural language text) and par, here is a one-liner for the command line:

$ cat dusty.txt
If you do enough work in any sort of free software environment, you get used to
doing lots of writing of documentation or all sorts of other things in some
plaintext system which exports to some non-plaintext system.  One way or
another you have to decide: are you going to wrap your lines with newlines?
And of course the answer should be "yes" because lines that trail all the way
off the edge of your terminal is a sin against the plaintext gods, who are
deceptively mighty, and whose wrath is to be feared (and blessings to be
embraced).  So okay, of course one line per paragraph is off the table.  So
what do you do?
$ tr '\n' ' ' < dusty.txt | opennlp SentenceDetector en-sent.bin 2>/dev/null | par 78p2dh
If you do enough work in any sort of free software environment, you get used
  to doing lots of writing of documentation or all sorts of other things in
  some plaintext system which exports to some non-plaintext system.
One way or another you have to decide: are you going to wrap your lines with
  newlines?
And of course the answer should be "yes" because lines that trail all the way
  off the edge of your terminal is a sin against the plaintext gods, who are
  deceptively mighty, and whose wrath is to be feared (and blessings to be
  embraced).
So okay, of course one line per paragraph is off the table.
So what do you do?

Download en-sent.bin
There are also de-sent.bin, da-sent.bin, nl-sent.bin, pt-sent.bin, and se-sent.bin.

Natural Language Toolkit (NLTK)

NLTK is a Python toolkit that can also achieve this:

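import nltk
from nltk.tokenize import sent_tokenize

# Fetch the Punkt sentence tokenizer data (newer NLTK releases may ask for 'punkt_tab').
nltk.download('punkt')

# Read the file and join its lines into one long string.
with open('dusty.txt', 'r') as myfile:
    data = myfile.read().replace('\n', ' ')

# Split the text into a list of sentences.
sent_tokenize_list = sent_tokenize(data)
print(sent_tokenize_list)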
['If you do enough work in any sort of free software environment, you
get used to doing lots of writing of documentation or all sorts of other
things in some plaintext system which exports to some non-plaintext
system.', 'One way or another you have to decide: are you going to wrap
your lines with newlines?', 'And of course the answer should be yes
because lines that trail all the way off the edge of your terminal is a
sin against the plaintext gods, who are deceptively mighty, and whose
wrath is to be feared (and blessings to be embraced).', 'So okay, of
course one line per paragraph is off the table.', 'So what do you do?']

I could not figure out which languages are supported.

There is also the Stanford CoreNLP with support for Arabic, Chinese, English, French, German, and Spanish which I haven't tried yet.

Thanks, this looks interesting.

Vim Configuration with NLTK

And finally a possible vim setup using nltk:

  1. Install par, nltk and the necessary nltk data with

    $ brew install par  # or: sudo apt install par
    $ pip3 install nltk
    $ python3 -m nltk.downloader punkt
    # $ python3 -m nltk.downloader -d CUSTOMPATH punkt

  2. Install the following script nltk_sent_tokenize into your $PATH and make it executable with chmod +x nltk_sent_tokenize

    #!/usr/bin/env python3
    import sys
    from nltk.tokenize import sent_tokenize
    # Uncomment if the nltk data lives in a custom path:
    # import nltk; nltk.data.path.append('CUSTOMPATH')
    # Read the text to format from stdin and join it into one line.
    data = sys.stdin.read()
    sent_tokenize_list = sent_tokenize(data.replace('\n', ' '))
    # Print one sentence per line; par wraps them afterwards.
    for line in sent_tokenize_list:
        print(line)

  3. Configure vim with

    set formatprg=nltk_sent_tokenize\|par\ 78p2dh
    " let &l:formatprg='tr "\\n" " "|opennlp SentenceDetector en-sent.bin 2>/dev/null |par 78p2dh'

    e.g. in ~/.vim/after/ftplugin/tex.vim:

    nnoremap <buffer> g= :%!tr '\n' ' '<Bar>opennlp SentenceDetector en-sent.bin 2>/dev/null<Bar>par w<C-r>=&tw<CR>p<C-r>=&sw<CR>dhs0e<CR>
    xnoremap <buffer> g= :!tr '\n' ' '<Bar>opennlp SentenceDetector en-sent.bin 2>/dev/null<Bar>par w<C-r>=&tw<CR>p<C-r>=&sw<CR>dhs0e<CR>

    (gw uses the builtin formatting, = uses the indentexpr set by vimtex, and gq uses the formatexpr set by vimtex)

  4. Test setup

    $ cat dusty.txt
    If you do enough work in any sort of free software environment, you get used to
    doing lots of writing of documentation or all sorts of other things in some
    plaintext system which exports to some non-plaintext system.  One way or
    another you have to decide: are you going to wrap your lines with newlines?
    And of course the answer should be "yes" because lines that trail all the way
    off the edge of your terminal is a sin against the plaintext gods, who are
    deceptively mighty, and whose wrath is to be feared (and blessings to be
    embraced).  So okay, of course one line per paragraph is off the table.  So
    what do you do?
    $ vim dusty.txt
    gqip:w
    $ cat dusty.txt
    If you do enough work in any sort of free software environment, you get used
      to doing lots of writing of documentation or all sorts of other things in
      some plaintext system which exports to some non-plaintext system.
    One way or another you have to decide: are you going to wrap your lines with
      newlines?
    And of course the answer should be "yes" because lines that trail all the way
      off the edge of your terminal is a sin against the plaintext gods, who are
      deceptively mighty, and whose wrath is to be feared (and blessings to be
      embraced).
    So okay, of course one line per paragraph is off the table.
    So what do you do?
    

Again, thanks. If I adopt this style your suggestions will be useful. :)

Semantic linefeeds

Just for the record: another formatting style is semantic linefeeds, described at http://rhodesmill.org/brandon/2012/one-sentence-per-line/

Example:

 ...
 the definition in place of it.

-The beauteous scheme is that now,
+The beauty of this scheme is that now,
 if you change your mind
 about what a paragraph should look like,
 you can change the formatted output
 merely by changing
 the definition of ‘‘.PP’’
 and re-running the formatter.

 As a rule of thumb, for all but the most
 ...
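
The diff-friendliness of this style can be checked with a small Python sketch that counts how many lines a one-word edit touches, once with semantic linefeeds and once after hard-wrapping the same text. The helper name changed_lines and the wrap width of 40 are arbitrary choices for this example.

```python
import difflib
import textwrap


def changed_lines(old_lines, new_lines):
    """Count the lines a line-based diff reports as removed or added."""
    diff = difflib.ndiff(old_lines, new_lines)
    return sum(1 for entry in diff if entry.startswith(("- ", "+ ")))


if __name__ == "__main__":
    old = [
        "The beauteous scheme is that now,",
        "if you change your mind",
        "about what a paragraph should look like,",
        "you can change the formatted output",
    ]
    new = ["The beauty of this scheme is that now,"] + old[1:]

    # Semantic linefeeds: only the edited line shows up in the diff.
    print("semantic:", changed_lines(old, new))

    # Hard wrapping: re-wrapping after the edit may shift later lines too.
    wrapped_old = textwrap.wrap(" ".join(old), width=40)
    wrapped_new = textwrap.wrap(" ".join(new), width=40)
    print("hard-wrapped:", changed_lines(wrapped_old, wrapped_new))
```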

Latexindent 3.5 (August 2018)

Also for the record: latexindent.pl (ctan) supports the dustycloud style since version 3.5, released on 2018-08-14.

The feature discussion can be found at https://github.com/cmhughes/latexindent.pl/issues/111 and the feature was released with https://github.com/cmhughes/latexindent.pl/pull/127

The new section in the documentation is 6.2.5, "text wrapping and indenting sentences", with suggested settings in Listing 269, sentence-wrap1.yaml:

modifyLineBreaks:
    oneSentencePerLine:
        manipulateSentences: 1
        removeSentenceLineBreaks: 1
        textWrapSentences: 1
        sentenceIndent: "  "
    textWrapOptions:
        columns: 50

The documentation can be opened locally with $ texdoc latexindent or online at http://mirrors.ctan.org/support/latexindent/documentation/latexindent.pdf.

Here it is applied to the dustycloud example, using the configuration given in the feature discussion (with a custom regex to detect the end of sentences).

❯ cat dusty.tex
If you do enough work in any sort of free software environment, you get used to
doing lots of writing of documentation or all sorts of other things in some
plaintext system which exports to some non-plaintext system.  One way or
another you have to decide: are you going to wrap your lines with newlines?
And of course the answer should be "yes" because lines that trail all the way
off the edge of your terminal is a sin against the plaintext gods, who are
deceptively mighty, and whose wrath is to be feared (and blessings to be
embraced).  So okay, of course one line per paragraph is off the table.  So
what do you do?

❯ cat localSettings.yaml
modifyLineBreaks:
    oneSentencePerLine:
        manipulateSentences: 1
        removeSentenceLineBreaks: 1
        textWrapSentences: 1
        sentenceIndent: "  "
        sentencesEndWith:
            betterFullStop: 0
            other: '(?:\.\)(?!\h*[a-z]))|(?:(?<!(?:(?:e\.g)|(?:i\.e)|(?:etc))))\.(?!(?:[a-z]|[A-Z]|\]|\-|\,|[0-9]))'
    textWrapOptions:
        columns: 78

❯ latexindent dusty.tex -m -l=localSettings.yaml -w
If you do enough work in any sort of free software environment, you get used
  to doing lots of writing of documentation or all sorts of other things in
  some plaintext system which exports to some non-plaintext system.
One way or another you have to decide: are you going to wrap your lines with
  newlines?
And of course the answer should be "yes" because lines that trail all the way
  off the edge of your terminal is a sin against the plaintext gods, who are
  deceptively mighty, and whose wrath is to be feared (and blessings to be
  embraced).
So okay, of course one line per paragraph is off the table.
So what do you do?

I have not yet figured out how sentence detection works by default, nor have I dissected the regex above.

There are several options for the yaml (see help section 6.2)

sentencesBeginWith
By default, latexindent.pl will only assume that sentences begin with the upper case letters A-Z; you can instruct the script to define sentences to begin with lower case letters (see Listing 240), and we can use the other field to define sentences to begin with other characters.

sentencesEndWith
There is a field called basicFullStop, which is set to 0 by default, and a field called betterFullStop, which is set to 1 by default.

betterFullStop
Full stops at the end of sentences have the following properties:
• they are ignored within e.g. and i.e.;
• they can not be immediately followed by a lower case or upper case letter;
• they can not be immediately followed by a hyphen, comma, or number.
If you find that the betterFullStop does not work for your purposes, then you can switch it off by setting it to 0, and you can experiment with the other field.
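
The listed properties can be approximated with a regular expression. The following Python sketch is only an illustration of those rules; it is not the regex latexindent.pl uses internally, and the names SENTENCE_END and split_sentences are made up for this example.

```python
import re

# A full stop ends a sentence unless it belongs to "e.g." or "i.e.", or is
# immediately followed by a letter, a digit, a comma or a hyphen.
SENTENCE_END = re.compile(r"(?<!e\.g)(?<!i\.e)\.(?![A-Za-z0-9,\-])")


def split_sentences(text):
    """Split text after every full stop that matches SENTENCE_END."""
    sentences = []
    start = 0
    for match in SENTENCE_END.finditer(text):
        sentences.append(text[start:match.end()].strip())
        start = match.end()
    # Keep any trailing text that does not end with a full stop.
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


if __name__ == "__main__":
    for sentence in split_sentences("We use tools, e.g. par. Version 3.5 works. Done."):
        print(sentence)
```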

The basicFullStop routine should probably be avoided in most situations, as it does not accommodate the specifications above.

Thanks, these are interesting tools. It would be convenient to be able to interface with this from vimtex.

I would be happy if you could open a new issue for this. Right now, it is somewhat unclear what should/could be done in this direction, and it is also unclear if you are asking for some feature improvement or not.

Hi Lervag,

I agree that this should be moved to a separate issue. I added the comments to have everything at a single place.

Maybe the new issue could be

Support external tools for formatting code

Format tex source code with
- internal method (implemented here: https://github.com/lervag/vimtex/blob/master/autoload/vimtex/format.vim)
- latexindent.pl

In the spirit of

LaTeX log parsing for quickfix entries using
- internal method
- pplatex

However, there are already dedicated vim plugins for formatting using external tools

And AFAIK, you prefer to delegate features to other plugins where possible (which I would do as well).

We could still add documentation describing practices of formatting and how to achieve it conveniently in vim.

What is your opinion?

Do you know of any advantages that vimtex could provide compared to more generic formatting plugins such as vim-autoformat and neoformat?

I think all of the text after "Support external tools for formatting code" looks like a very good text for a new issue, including the questions asked at the end, and I think it is better to answer them in the new issue. Thanks!
