When auto-hyphenation is in use, I believe that in most languages (German being the major exception) it would be preferable for browsers not to hyphenate capitalized words, which will often be proper nouns. In many cases authors and readers will prefer that names (of people, companies, etc.) not be split, and in addition, hyphenation rules designed for the "normal" words of a language may fail to hyphenate many names appropriately.
(https://bugzilla.mozilla.org/show_bug.cgi?id=1550532 was recently filed against Gecko about this issue.)
The CSS Text 3 spec explicitly does not specify exactly where hyphenation opportunities occur when `hyphens: auto` is used. However, I would suggest adding an informative note to the spec saying that browsers may want to suppress auto-hyphenation of capitalized words except when the hyphenation language in use is German.
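A minimal Python sketch of that suggestion, for illustration only: the function names are hypothetical (not taken from any browser), it assumes the hyphenation language arrives as a BCP 47 tag, and it treats a word as capitalized when its first character is uppercase.

```python
def primary_language(lang_tag: str) -> str:
    """Return the primary language subtag of a BCP 47 tag, e.g. "de-CH" -> "de"."""
    return lang_tag.split("-", 1)[0].lower()


def allow_auto_hyphenation(word: str, lang_tag: str) -> bool:
    """Hypothetical UA heuristic: auto-hyphenate a word unless it starts with a
    capital letter and the hyphenation language is not German."""
    if word[:1].isupper() and primary_language(lang_tag) != "de":
        return False  # probably a proper noun outside German: leave it whole
    return True
```

With this, "Mozilla" under `en-US` is left whole, while capitalized German nouns under `de-CH` remain hyphenatable.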
For CSS Text 4, perhaps a property should be introduced to allow authors to explicitly control this behavior, e.g. `hyphenation-capitalized-words: auto | yes | no`, where `yes` and `no` would have the obvious meaning, and `auto` would tell the browser to use whatever heuristics it may have, such as considering the current language.
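As a rough sketch of how a UA might resolve the proposed values: the property does not exist, the enum and function names are hypothetical, and the `auto` branch assumes the German exception suggested in this issue stands in for the UA's heuristics.

```python
from enum import Enum


class CapitalizedWords(Enum):
    """Values of the proposed (hypothetical) hyphenation-capitalized-words property."""
    AUTO = "auto"
    YES = "yes"
    NO = "no"


def may_hyphenate(word: str, lang_tag: str, value: CapitalizedWords) -> bool:
    """Decide whether hyphens: auto may break `word` under the proposed property."""
    capitalized = word[:1].isupper()
    if not capitalized or value is CapitalizedWords.YES:
        return True
    if value is CapitalizedWords.NO:
        return False
    # auto: stand-in UA heuristic, i.e. hyphenate capitalized words only in German.
    return lang_tag.split("-", 1)[0].lower() == "de"
```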
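Alternatively, given sufficient markup (or if there were such a semantic pseudo-element), authors could write:

```css
name, .name {hyphens: none;}
/* ::proper-noun is hypothetical; it gets its own rule because an unknown
   pseudo-element would invalidate the whole selector list */
::proper-noun {hyphens: none;}
```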
Is a new property really worth it? Is this something authors are asking to be able to control? Can the browser just do it right in the first place?
Well, what's "the right thing" for a browser to do regarding hyphenation of capitalized words? I don't think there's a clear answer to that, although I do think browsers should try for a sensible default behavior, and in https://bugzilla.mozilla.org/show_bug.cgi?id=1550532 we just made the suggested adjustment for Firefox.
The problem is that in some cases authors/users may prefer that proper names not be hyphenated (as requested in the Mozilla bug). We can't reliably identify proper names in general text, but capitalized words are the best available proxy for them (except in German). This has the drawback that we'll also suppress hyphenation of non-names at the beginning of sentences, and in some cases that cost may be too high, making it preferable to allow hyphenation of capitalized words after all. I don't think a single hard-coded behavior will ever satisfy all use cases.
(A further refinement to the heuristic, not yet implemented, would be to make the behavior depend on line width, so that constraints on what may be hyphenated are relaxed as the line width is reduced.)
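A minimal sketch of that refinement, assuming line width is measured in ems; the 15em threshold is an arbitrary illustrative value not taken from any implementation, and the function name is hypothetical. A real UA might relax constraints gradually rather than at a single cutoff.

```python
def allow_capitalized_hyphenation(lang_tag: str, line_width_em: float,
                                  narrow_threshold_em: float = 15.0) -> bool:
    """Hypothetical refinement: relax the capitalized-word restriction on narrow lines.

    narrow_threshold_em is an illustrative assumption, not a measured value.
    """
    if lang_tag.split("-", 1)[0].lower() == "de":
        return True  # German: capitalized words are hyphenated anyway
    # Narrow columns leave little room to absorb ragged gaps, so below the
    # threshold we trade keeping names whole for better line filling.
    return line_width_em < narrow_threshold_em
```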
Note that systems such as TeX (the `\uchyph` parameter) and InDesign (the "Hyphenate Capitalized Words" option in paragraph formatting) do expose this question to authors, recognizing that there is not a simple "correct" behavior that the application can universally use.
Obviously, authors can override the browser's heuristics by adding markup to individual names; the question here is what kind of default behavior, and how much author control, we can/should offer for (the overwhelming majority of) text that does not have that level of detailed markup.
I'm not sure that not hyphenating capitalized words (as in English) is the rule and hyphenating them (as in German) the exception, rather than the other way around. At least, AFAIK, in Russian there is no special case for capitalized words regarding hyphenation (only abbreviations are not hyphenated). Maybe more data is needed?
I don't believe there are (in general) firm rules about this in either direction; it's a judgement call, and may depend on the specific content and the context in which it's being presented, as well as the individual preferences of the author/typographer.
As such, I think the best we can do in CSS is to offer some guidance on good default behaviors for browsers (further information about typical usage in various languages may be helpful here), together with adequate controls so that authors can achieve the results they want.
WebKit just got a bug about this too (possibly filed by the same person) https://bugs.webkit.org/show_bug.cgi?id=197889
> Note that systems such as TeX … and InDesign … do expose this question to authors.
This is a very good argument for adding a new property. Does anyone have other examples?
Hello, I’m the person filing these bugs. I appreciate the discussion. For the record, here's the bug I sent to Blink, too: https://bugs.chromium.org/p/chromium/issues/detail?id=963039&can=2&q=hyphen%20proper%20nouns
I do recognize that it will be difficult to find the perfect solution that works for everyone, but I think there can be more sensible defaults. People don’t like it when their name gets broken at the end of a line. Companies don’t like it when their own materials add hyphens into the middle of their brand names.
Hyphenation should be a progressive enhancement. Over the last 10+ years, I haven’t been able to use it in a professional setting, because I’m always asked to turn it off the instant someone sees their brand name or their own name broken across a line. That’s not an enhancement. I understand that we can turn it off on a case-by-case basis with `.name` or something similar, but that puts the burden on content owners to wrap every name in a span. That’s not an enhancement either.
I wonder, too, if we could add a new value to the `hyphens` property, `all`, instead of having a whole separate property. `auto` would be updated to hyphenate capitalized words based on language (e.g. in German, but not English) and `all` would hyphenate capitalized words regardless of language. Or keep `auto` as currently defined and add `no-capitalized-words` as the new value.
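A sketch contrasting the two keyword proposals. The function names are hypothetical; the returned flag only says whether a word's casing permits automatic breaks (the hyphenation dictionary still decides where), and it assumes `no-capitalized-words` applies regardless of language, which the comment leaves open.

```python
def is_german(lang_tag: str) -> bool:
    return lang_tag.split("-", 1)[0].lower() == "de"


def auto_breaks_allowed_proposal_a(hyphens: str, word: str, lang_tag: str) -> bool:
    """Proposal A: redefine `auto` to skip capitalized words outside German; add `all`."""
    if hyphens == "all":
        return True
    if hyphens == "auto":
        return not word[:1].isupper() or is_german(lang_tag)
    return False  # `none` and `manual` never hyphenate automatically


def auto_breaks_allowed_proposal_b(hyphens: str, word: str, lang_tag: str) -> bool:
    """Proposal B: keep `auto` as currently defined; add `no-capitalized-words`."""
    if hyphens == "auto":
        return True
    if hyphens == "no-capitalized-words":
        return not word[:1].isupper()  # assumed: no German exception
    return False  # `none` and `manual` never hyphenate automatically
```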
> I wonder, too, if we could add a new value to the `hyphens` property, `all`, instead of having a whole separate property.
Note that there are already multiple properties proposed for controlling hyphenation in CSS Text 4, and other open issues suggesting more control. So adding a single new keyword likely wouldn't be enough.
I'm happy to add a note to CSS Text saying that UAs might want to use heuristics to suppress hyphenation in proper nouns, but I don't think we should define those heuristics in the spec.
("Capitalized words except in German" might want to be "Capitalized words except in German and except after periods", or in a CSS-to-PDF renderer used in publication workflows, even "Capitalized words except in German and except after periods unless we saw it capitalized not after a period." I don't think we'll come up with the ideal heuristics here.)
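One possible reading of the last heuristic, sketched in Python. It assumes a two-pass scan over the whole text, treats the start of the text as "after a period", looks only at periods (not `?` or `!`), and the helper names are hypothetical.

```python
import re
from collections.abc import Iterator

WORD = re.compile(r"[^\W\d_]+")  # runs of letters


def _tokens(text: str) -> Iterator[tuple[str, bool]]:
    """Yield (word, after_period) for each word in `text`."""
    for m in WORD.finditer(text):
        i = m.start() - 1
        while i >= 0 and text[i].isspace():
            i -= 1
        yield m.group(), (i < 0 or text[i] == ".")


def hyphenatable_words(text: str, lang_tag: str) -> set[str]:
    """Return the word forms that auto-hyphenation may break."""
    tokens = list(_tokens(text))
    if lang_tag.split("-", 1)[0].lower() == "de":
        return {w for w, _ in tokens}
    # Pass 1: a word seen capitalized somewhere other than right after a
    # period is treated as a likely proper noun.
    likely_names = {w for w, after in tokens if w[0].isupper() and not after}
    allowed = set()
    for word, after_period in tokens:
        if not word[0].isupper():
            allowed.add(word)
        elif after_period and word not in likely_names:
            allowed.add(word)  # probably capitalized only because it opens a sentence
    return allowed
```

In "Hyphenation helps. Mozilla ships it. Firefox is made by Mozilla.", the mid-sentence "Mozilla" marks that word as a likely name everywhere, while "Hyphenation" stays hyphenatable. "Firefox" also stays hyphenatable, since it is only seen sentence-initially, which illustrates the limits of any such proxy.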
The last one, “Capitalized words, except in German, and except after periods, unless we saw it capitalized not after a period,” is the best heuristic I’ve seen so far, and the fact that it’s used in publication workflows backs that up.
“Capitalized” probably means _contains a capital letter_, not _begins with a capital letter_, so as to capture “iTunes” and the like.
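The two readings of "capitalized" differ only on words like "iTunes"; a trivial sketch with hypothetical helper names:

```python
def begins_with_capital(word: str) -> bool:
    """Narrow reading: the first character is uppercase."""
    return word[:1].isupper()


def contains_capital(word: str) -> bool:
    """Broad reading: any character is uppercase (catches "iTunes")."""
    return any(ch.isupper() for ch in word)
```

Under the first predicate, "iTunes" would be hyphenated as an ordinary word (hence "iTu-nes" under German rules); under the second, it would be treated like a name.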
That's a good point, although in practice I wonder how many such names are actually long enough that hyphenation rules are likely to apply to them? Current browsers don't appear to find a hyphenation opportunity in "iTunes", for example, regardless of casing.
...when using English rules; however, I notice that with lang=de, we can hyphenate "iTu-nes". That's probably not ideal.
@revoltpuppy To be clear, that was a hypothetical example. :) Not very practical for browsers, but much more practical for publication workflows.
The CSS Working Group just discussed "hyphens:auto should not hyphenate Capitalized words", and agreed to the following:
RESOLVED: Add a note to the spec and close with no normative change
The full IRC log of that discussion:
<Rossen_> Topic: hyphens:auto should not hyphenate Capitalized words
<Rossen_> github: https://github.com/w3c/csswg-drafts/issues/3927
<una> florian: so the issue being raised is that in some langs, when words are capitalized you should hyphenate and in some they should not
<una> ... we should bake this into the spec
<una> ... I'd like to close this as wontfix or rejected bc we already say this is dict based within the logic of the lang-based resource
<dauwhe> q+
<una> fantasai: I would go a little farther and say that we should only put a note and not change normative requirements and talk about proper nouns
<una> ... it can suggest, e.g., that in English you may want to suppress hyphenation of words that are proper nouns and mixed case
<una> ... I would like to leave the heuristics up to the user agent and not bake anything into the spec
<Rossen_> ack dauwhe
<una> dauwhe: in English should capital letters be hyphenated? maybe... I wouldn't want anything baked into the spec that says what should happen
<una> AmeliaBR: the rec is more to add a suggested note to add in your hyphenation dictionaries you should consider this
<una> ... at least one browser has agreed
<una> ... not sure this is a normative requirement
<una> Rossen_: so proposed resolution for this is to add a note and no normative change
<una> RESOLVED: Add a note to the spec and close with no normative change
<una> florian: myles, a while back you raised 3566 - should we reopen?
> ...when using English rules; however, I notice that with lang=de, we can hyphenate "iTu-nes". That's probably not ideal.
That hyphenation somehow implies that Germans pronounce the word eye-too-ness instead of eye-toons or eye-tjoons. It seems "not ideal" for reasons other than capitalization, just as loan words generally don't follow the native language's regular patterns.