Sphinx: Text writer breaks on hyphens

Created on 26 Nov 2019 · 9 Comments · Source: sphinx-doc/sphinx

The text writer uses a custom TextWrapper class that breaks hyphenated words. Its parent class in the Python standard library has a break_on_hyphens option to disable that, but Sphinx's TextWrapper defines a custom wordsep_re regular expression, so the behavior is not easy to override:
https://github.com/sphinx-doc/sphinx/blob/8c7faed6fcbc6b7d40f497698cb80fc10aee1ab3/sphinx/writers/text.py#L259-L263
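For comparison, here is a minimal sketch of how the break_on_hyphens option behaves in the standard library's textwrap module. It uses the plain stdlib wrapper, not Sphinx's TextWrapper, and the sample sentence and width are chosen only for illustration:

```python
import textwrap

text = 'the target to do the building and to touch build-stamp on completion'

# Default behavior: a hyphenated word may be split right after its hyphen.
default_lines = textwrap.wrap(text, width=50)
# -> ['the target to do the building and to touch build-',
#     'stamp on completion']

# With break_on_hyphens=False, the hyphenated word moves to the next line whole.
unbroken_lines = textwrap.wrap(text, width=50, break_on_hyphens=False)
# -> ['the target to do the building and to touch',
#     'build-stamp on completion']

for line in default_lines + ['---'] + unbroken_lines:
    print(line)
```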

This leads to two issues.

  1. Some technical terms, program names, email addresses, etc., are cut at their hyphen at the end of a line. This makes copy and paste more difficult and is confusing to read.

  2. The second issue is related to footnotes. When processing a footnote reference, the add_text method is called with its first argument set to '[N]', where N is the footnote number.

    When first is not None, the text is wrapped:
    https://github.com/sphinx-doc/sphinx/blob/8c7faed6fcbc6b7d40f497698cb80fc10aee1ab3/sphinx/writers/text.py#L451

    then joined by space and wrapped again:
    https://github.com/sphinx-doc/sphinx/blob/8c7faed6fcbc6b7d40f497698cb80fc10aee1ab3/sphinx/writers/text.py#L456-L457

    Because of this, words that were broken at a hyphen in the first pass get an extra space after the hyphen.
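The wrap, join, wrap-again sequence can be imitated with the standard library alone. This is only a sketch of the mechanism, assuming the stdlib textwrap module stands in for Sphinx's TextWrapper; the widths are arbitrary and picked so that the first pass breaks at the hyphen:

```python
import textwrap

text = 'for "build" to depend on "build-stamp" and to do nothing else'

# First pass: the narrow width forces a break right after the hyphen.
first_pass = textwrap.wrap(text, width=32)
# -> ['for "build" to depend on "build-', 'stamp" and to do nothing else']

# Joining the wrapped lines with a space turns the line break into a
# literal space after the hyphen.
rejoined = ' '.join(first_pass)

# Second pass: the stray space is now ordinary text and survives rewrapping.
second_pass = textwrap.wrap(rejoined, width=70)
# -> ['for "build" to depend on "build- stamp" and to do nothing else']

print(second_pass[0])
```

The information that "build-" and "stamp" were one word is lost as soon as the first-pass lines are joined with a space.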

To Reproduce
Steps to reproduce the behavior:

$ cat >index.rst <<EOF
Please see the footnote. [#]_

.. [#]
   Another common way to do this is for ``build`` to depend on
   ``build-stamp`` and to do nothing else, and for the ``build-stamp``
   target to do the building and to ``touch build-stamp`` on completion.
EOF
$ touch conf.py
$ sphinx-build -b text . _build/text

The result is:

$ cat _build/text/index.txt 
Please see the footnote. [1]

[1] Another common way to do this is for "build" to depend on
    "build- stamp" and to do nothing else, and for the "build-stamp"
    target to do the building and to "touch build-stamp" on
    completion.

Expected behavior
In the second line of the footnote, there should be no space between build- and stamp.

I see two possible solutions to achieve that:

  • Do not break words on hyphens, or make that configurable.
  • Refactor the end_state method to avoid calling do_format() twice.
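As a rough illustration of both ideas together, the footnote could be formatted in a single pass, with the label passed as an indent and hyphen breaks disabled. This sketch uses the stdlib textwrap module rather than Sphinx's writer; the width of 70 and the '[1] ' prefix are assumptions for the example:

```python
import textwrap

footnote = (
    'Another common way to do this is for "build" to depend on '
    '"build-stamp" and to do nothing else, and for the "build-stamp" '
    'target to do the building and to "touch build-stamp" on completion.'
)

# Wrap exactly once: the footnote label becomes the first-line indent,
# continuation lines are indented to match, and hyphens are not break points.
lines = textwrap.wrap(
    footnote,
    width=70,
    initial_indent='[1] ',
    subsequent_indent='    ',
    break_on_hyphens=False,
)

print('\n'.join(lines))
```

Because the text is wrapped only once, there is no join step that could insert a space after a hyphen.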

Environment info

  • OS: Debian GNU/Linux sid
  • Python version: 3.7.5
  • Sphinx version: latest master

This issue is based on two bugs in Debian: [#944330] and [#944331].

All 9 comments

Thank you for reporting. I hope #6869 fixes this case. Could you check it please?

Thanks for the fast fix! The second issue is fixed (so please merge it), but it would still be nice to make it possible to disable breaking words on hyphens.

Thank you for confirming. Just merged.

About the first issue: IMO, it's natural for technical terms to be folded at the end of a line, so I don't have a strong opinion about changing the current behavior.

Ah, sorry. GH automatically closed this when the fix was merged. Reopened for discussion.

Any comments? As I commented above, I have no motivation to improve this further at the moment.

It would still be nice to have an option to disable break on hyphens, but if you close it as wontfix I would accept that.

I might change my opinion if I see any ugly examples.

I'm closing this now. Thanks.

We use Sphinx to maintain Debian Policy, and would very much like to be able to turn off line breaks at hyphens. While I would argue that breaking technical terms at hyphens by itself is confusing, the more severe problem that we encounter is that file names, command names, and the like are broken across lines. Combined with indentation, this means that some commands cannot be cut and pasted, and it's generally just awkward. For example:

The required packages are called *build-essential*, and an
informational list can be found in "/usr/share/doc/build-
essential/list" (which is contained in the "build-essential" package).

or

If your package includes the scripts "config.sub" and "config.guess",
you should arrange for the versions provided by the package autotools-
dev be used instead (see autotools-dev documentation for details how
to achieve that).

or

When "dpkg-gencontrol" is run for a binary package, it adds an entry
to "debian/files" for the ".deb" file that will be created when "dpkg-
deb --build" is run for that binary package.

Could you reconsider?

Just as another data point, I'm the maintainer of the POD to text and *roff tools for Perl and have been for the past few decades, and one of the first things I did was disable hyphenation in *roff because the results varied between awkward and awful. Breaking at hyphens is somewhat dropping out of vogue in general, but particularly for technical work.
