Pandoc: Creating a shorter markdown-ish small-caps syntax in the new bracketed span syntax?

Created on 7 Mar 2016 · 21 Comments · Source: jgm/pandoc

I am sure people have brainstormed and discussed this before. A quick search showed that almost a decade ago this was discussed and a syntax like ^^small cap^^ was suggested, but it was probably not attractive enough for people to pick up. I'm totally new here, so what I'm going to say probably isn't even worth 2 cents. But please bear with me:

I think having a syntax matters more than which syntax is picked. Although it is difficult for people to agree on a syntax, I wish we could at least agree on the need for a better one. In pandoc I think small caps is the only exception to the rule: it does not look "marked down" but uses HTML syntax, although pandoc understands it and will transform it according to the output format. Probably that makes a new, "marked down" syntax most urgent?

I also understand that, because of the CommonMark movement, it's probably not the best time to define new syntax. But I think we should at least leave this as an open issue, to be resolved sometime in the future when the dust has settled?

As for a syntax suggestion, here's one: !small cap!. It seems it shouldn't conflict with existing syntax. But it's just my naive suggestion. More importantly, I think we should agree that we need a better syntax for this.

All 21 comments

Sorry, @ickc, but I don’t think it makes sense to have an inline element for small caps.

The span element with a small class should be enough (such as in [small caps]{.sc}).

The special syntax for divisions and spans is yet to be implemented (#168).

Hi, @ousia

Could you elaborate more on why?

And if there's a strong reason such an exception (compared to all other existing syntax in pandoc markdown) should be made, I suggest putting it in the documentation, because I think every newbie reading through the documentation would have this question in their mind (I just finished reading the whole pandoc doc, so I am one of those newbies...).

And I think that when we think of small caps in terms of "span and class", it is very HTML. To me small caps are much more similar to bold, italic, etc., in that a real font family has a bold font for bold, an italic font for italic, a small caps font for small caps, and so on. It just so happens that, because of this, it can be thought of in terms of a class with a font style applied to it.

My personal interest in pandoc is primarily LaTeX output, with occasional HTML and docx use. In LaTeX small caps would never be thought of in terms of span and class, so it feels very unnatural. Since pandoc is output format agnostic (not sure if that is a good way to say it, but the philosophy of pandoc says something like: the original markdown philosophy, with one important difference, namely multiple output formats), it seems a very bad syntax if it is HTML (the current syntax) or indirectly HTML (the example you gave).

My 0.02 cents.

An alternative suggestion that would stay very true to the original markdown philosophy, but is possibly hard to implement: when things are typed in ALL CAPS, the parser would recognize that and apply the appropriate output choice (e.g. HTML class and font, LaTeX command).

I guess this is the reason there was never a markdown syntax for small caps (compared to bold and italic): it is almost natural to write. The only issue is typographical, because it could be done better when the output is not plain text: a proper small caps font.

Done this way, it could be very good for backward compatibility, because in some old files the "syntax" might already be there, so the new parser would improve the typographical output without the user doing anything (other than recompiling).

Not to mix style and content.

TeX is awesome, but its document model (and especially LaTeX's) is totally different from XML (and SGML).

pandoc development should avoid mimicking LaTeX. Markdown is based on HTML. XML as an input format is more robust than LaTeX syntax.

I’m not saying that LaTeX is worse than XML. It is essentially something different: a document generation system (a typesetting system). XML is only a way of encoding information.

If you mainly use LaTeX, I guess the best choice is to have LaTeX sources.

BTW, italics or bold aren’t supported _as such_ in Markdown. The elements are emphasis and strong emphasis.

It is way more important to have in Markdown:

  • special syntax for divisions and spans,
  • special syntax for (natural) languages.

BTW, what is wrong with [small caps]{.sc}? (Sorry, I’m asking this, because I don’t get it.)

Hi, @ousia, thanks for replying so quickly.

It is more about being simple and consistent than about functionality. In the same line of thought, bold, italic, and small caps are (possible) variants of a font, but they are used when we want to emphasize things in different ways and perhaps at different levels. I don't see why they should be treated so differently from each other. And I think in plain text the no. 1 choice for emphasizing things is not asterisks, but ALL CAPS. In the old days (and even now) people in forums ARE YELLING AT EACH OTHER LIKE THIS to show their emphasis. So to me all caps is more about what I mean than what I want to show (the style vs content distinction you mentioned).

And about the LaTeX source thing: to me it is better to use as much native pandoc as possible, so that what I write is as output agnostic as possible (and can be output in a different format in the future if needed/wanted).

And because of what you said, I brainstormed yet another syntax: ****small cap****. It is again natural because *, **, **** are different levels of emphasis, so they should be treated as similarly as possible. And because of how binary works, the old trick still works:

  1. *italic*
  2. **bold**
  3. ***bold and italic***
  4. ****small cap****
  5. *****small cap and italic*****
  6. ******small cap and bold******
  7. *******small cap and bold and italic*******
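The "binary" reading of the list above can be sketched in a few lines of Python. This is only an illustration of the proposal, not anything pandoc implements; `BITS` and `styles_for` are hypothetical names, and the assumption is that the asterisk count is read as a 3-bit number (italic = 1, bold = 2, small cap = 4).

```python
# Hypothetical mapping: each style owns one bit of the asterisk count.
# Listed from the highest bit down so the output matches the wording above.
BITS = (("small cap", 4), ("bold", 2), ("italic", 1))


def styles_for(asterisks):
    """Return the styles a run of 1 to 7 asterisks would combine."""
    if not 1 <= asterisks <= 7:
        raise ValueError("expected between 1 and 7 asterisks")
    return [name for name, bit in BITS if asterisks & bit]


for count in range(1, 8):
    print("*" * count, "->", " and ".join(styles_for(count)))
```

Under this reading, each new style would double the number of asterisks needed for its first use, which is worth keeping in mind when judging how readable the longer runs are.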

It's worth keeping in mind that technically *text* is not italics, but emphasis. And ** is not bold, but strong. And ~~ is del, not strikethrough. All of these are typically rendered as noted, but don't have to be. IOW, they're _stricto sensu_ tagging content, not style. (Underline is an exception.)

Notes:

  1. The pandoc document model (Text.Pandoc.Definition in pandoc-types) already has a SmallCaps Inline constructor. This is supported by writers and used by pandoc-citeproc. What we lack is just a Markdown syntax to produce this.
  2. We could establish the convention that putting the "smallcaps" class on a span produces a SmallCaps element:

    <span class="smallcaps">Hi there</span>

    This would then work in every output format that supports smallcaps, not just in HTML.
  3. This would be less clunky if we implemented the native spans syntax:

    [My Text in Small Caps]{.smallcaps}

  4. I'm sympathetic to the idea of having a native Markdownish syntax, but sort of on the fence.
  5. Converting ALL CAPS to small caps isn't a good idea, because sometimes authors intend ALL CAPS.
  6. You could use a filter that abuses link syntax, to convert

    My Text

    or something like that to a smallcaps "My Text". Such a filter would be really easy to write.
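The span convention from the notes above could be prototyped over pandoc's JSON representation of a document. The following is a minimal sketch in plain Python, not code from the thread: `promote_smallcaps` is a hypothetical name, and it assumes the JSON shape of a Span is `[[id, classes, key-values], inlines]` and that of SmallCaps is a list of inlines.

```python
import json


def promote_smallcaps(node):
    """Rewrite every Span carrying the "smallcaps" class into a SmallCaps node."""
    if isinstance(node, list):
        return [promote_smallcaps(child) for child in node]
    if isinstance(node, dict):
        if node.get("t") == "Span":
            (_ident, classes, _keyvals), inlines = node["c"]
            if "smallcaps" in classes:
                # Identifier and other attributes are dropped in this sketch.
                return {"t": "SmallCaps", "c": promote_smallcaps(inlines)}
        return {key: promote_smallcaps(value) for key, value in node.items()}
    return node  # strings, numbers, booleans


# A hand-written fragment standing in for real pandoc JSON output.
sample = {
    "t": "Para",
    "c": [
        {"t": "Span", "c": [["", ["smallcaps"], []], [{"t": "Str", "c": "Hi"}]]},
        {"t": "Space"},
        {"t": "Span", "c": [["", ["other"], []], [{"t": "Str", "c": "there"}]]},
    ],
}
print(json.dumps(promote_smallcaps(sample)))
```

To run something like this as a real filter, one would read the document JSON from standard input, apply the function, and write the result to standard output, then pass the script to pandoc with `--filter`.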

Should Small Caps Be Treated as Emphasis or Styling?

I understand the *s are different levels of emphasis. But from the above exchange, I think we should discuss further whether small caps should be thought of as emphasis or as styling, because the philosophy behind using small caps definitely influences which syntax should be chosen.

If it is thought of as styling, then

  • <span style="font-variant:small-caps;">Small caps</span>
  • [small caps]{.sc}

are better.

But if it is thought of as some kind of emphasis, then

  • ^^small caps^^
  • !small caps!
  • ****small caps****

are better.

Establishing Small caps Primarily as Emphasis, not Styling

I argue that small caps are also meant to be emphasis, i.e. they are more about the emphasis they make than the styling involved.

HTML Should Not Be a Reference on Markdown (and pandoc) Syntax

Before I proceed, I want to argue that how HTML currently does this is irrelevant. From the philosophy of pandoc:

Whereas Markdown was originally designed with HTML generation in mind, pandoc is designed for multiple output formats.

HTML may have many influences on Markdown and pandoc, but since pandoc is for multiple output formats, its syntax should be output format agnostic.

If this is true, then whether or not HTML treats small caps as emphasis (or one of its variants) is totally irrelevant.

I will even go as far as saying that the current limitations of technology (e.g. the availability of real small caps fonts) should not dictate how pandoc/markdown syntax is designed, i.e. a markdown syntax designed "correctly" today might have an improved output in the future as technology advances.

Comparing Small Caps to Italic and Bold

Just as font variants like italic and bold are used when we want to emphasize things, we need to ask why we use a font variant like small caps. I am sure some people use italic, bold and small caps for styling (I just worked with a prof. who insisted on using all italic on the front page of a midterm exam, i.e. the normal, non-italic text is the emphasis), but also for emphasis. So the question is: why do we think that, when emphasis is wanted, italic and bold can be the styling that conveys what we mean, but not small caps?

From everyday experience it seems that small caps mean emphasis more than styling. As argued above, all caps is used when people are GETTING ANGRY and want to emphasize it. (This relates to the differences between all caps, small caps, and even petite caps. I personally think that all caps is just lazy small caps, since it is the only way to do it in ASCII. But there could be legitimate uses of all caps.)

A quick look at Small caps - Wikipedia, the free encyclopedia seems to suggest that small caps are used for emphasis. Sometimes the first word of a paragraph is in small caps (primarily styling). And sometimes titles are in small caps (really styling that can be done in CSS and does not, and should not, require additional effort in markdown).

Italics, bold and small caps can all mean both styling and emphasis. The next question to ask is, for each of them, how stylish and how emphasizing are they? _Imagine the whole page is italic (like the prof. mentioned above did): it could convey a style of being pretty_. Imagine the whole page is bold: the author might (mistakenly) want to use that to show that it is more formal, or perhaps to make it clearer to read. BUT IMAGINE IF THE WHOLE PAGE IS ALL/SMALL CAPS, WHAT DO YOU THINK THE AUTHOR IS DOING? IS IT HIS STYLE TO MAKE EVERYTHING SO DIFFICULT TO READ? OR MAKING IT SUPER FORMAL? NO, IT IS A SUPER STRONG SENSE OF EMPHASIS, MAKING SURE YOU KNOW THAT IT IS (PROBABLY) A VERY IMPORTANT LEGAL MATTER AND YOU HAVE TO UNDERSTAND EVERYTHING BEFORE YOU SIGN. The same exercise can be repeated for a single word: does it make it more _emphasized_? Or make it stand out from its surroundings? Which one is STRONGER/STRONGEST?

Personally I think it is more certain that small caps really mean emphasis than that italic/bold mean emphasis. Italic and bold have a higher chance of being used for styling, and yet we do not have a markdown syntax to indicate a stylish italic/bold (rather than an emphasizing italic/bold). So for small caps, which more probably mean emphasis than styling, we should have a markdown syntax (rather than HTML/HTML-ish). I understand that the small caps syntax of pandoc originated as styling (because of citations), and a stylish small caps (HTML) is already in pandoc's spec. So I am really asking for an emphasizing small caps.

After thinking about it this way, I like ****small cap**** more than before: it is just a higher level of emphasis. Again, the old trick still works because of how binary works (italic as "1st digit" emphasis, bold as "2nd digit" emphasis, small caps as "3rd digit" emphasis):

  1. *italic*
  2. **bold**
  3. ***bold and italic***
  4. ****small cap****
  5. *****small cap and italic*****
  6. ******small cap and bold******
  7. *******small cap and bold and italic*******

Summary and Questions

I am not an expert in typography/programming/etc., however, so I could be wrong. So instead, I am asking everyone to discuss these questions:

  1. Should small caps be treated primarily as emphasis or as styling?
  2. Should how HTML currently treats small caps (as styling) affect the design of pandoc/markdown syntax?

Summarizing my opinion on the above questions:

  1. Small caps are primarily emphasis
  2. HTML/technological limitations/spec/usage should not dictate the design of pandoc/markdown in general

Using Nested Emphasis as a Test of How Current Implementations of Markdown Treat *s

As we have discussed, *, **, *** should be viewed as different levels of emphasis, not styling (although styling is used to represent the meaning). One natural way of testing the idea is then emphasis within emphasis, or nested emphasis. For example, in LaTeX, when emphasis occurs within emphasis, the opposite font style is used for the inner emphasis: _Example is_ this emphasis _here_. Since the whole sentence is italic, the upright "this emphasis" becomes emphasized. Similarly, in Markdown one would expect *Example is *this emphasis* here* to be rendered as "_Example is_ this emphasis _here_".

I ran a test in Babelmark 2 and found that pandoc does not behave this way. And to my surprise CommonMark does not "get it right" either. Because the issue is about markdown in general, I posted it on the CommonMark Discourse. But I think it is worth mentioning here, since this finding seems to show that we do not take *s in Markdown as emphasis as much as we think.

Original text, posted in Testing nested emphasis - CommonMark Discussion:

Testing Nested Emphasis

It is similar to Nested emph and strong in Babelmark 2, but more exhaustive:

- 1 in 1: *some text *emphasized* again*

- 1 in 2: **some text *emphasized* again**

- 1 in 3: ***some text *emphasized* again***

- 2 in 1: *some text **emphasized** again*

- 2 in 2: **some text **emphasized** again**

- 2 in 3: ***some text **emphasized** again***

- 3 in 1: *some text ***emphasized*** again*

- 3 in 2: **some text ***emphasized*** again**

- 3 in 3: ***some text ***emphasized*** again***
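For anyone who wants to rerun or extend the matrix above, the nine inputs can be generated rather than typed by hand. This is a small sketch in Python; `nested_case` and `nested_cases` are hypothetical helper names, and the label "N in M" is read as N asterisks for the inner word inside M asterisks for the whole phrase, as in the list above.

```python
def nested_case(inner, outer):
    """Build one test string with `inner` asterisks nested inside `outer`."""
    inner_marks = "*" * inner
    outer_marks = "*" * outer
    return f"{outer_marks}some text {inner_marks}emphasized{inner_marks} again{outer_marks}"


def nested_cases(max_level=3):
    """List every inner/outer combination in the same order as the list above."""
    return [
        f"- {inner} in {outer}: {nested_case(inner, outer)}"
        for inner in range(1, max_level + 1)
        for outer in range(1, max_level + 1)
    ]


# Blank lines between items mirror the layout of the original test input.
print("\n\n".join(nested_cases()))
```

The printed text can be pasted into a comparison tool such as Babelmark 2 as a single input.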

Results

See Babelmark 2 - Compare markdown implementations

About the results:

  • IMO as long as there's some asterisks in the rendering, it is wrong.
  • Those that do not include any asterisk in the results can be categorized in:

    • commonmark 0.24.0 and others

    • It is interesting to see that the latest CommonMark treats emphasis almost as styling: in "1 in 1", "2 in 2" and "3 in 3", the inner emphasis is not distinguished from the surrounding text.

    • RDiscount 1.6.8 and others

    • The 2 nesting levels are added first, and if the sum exceeds 3, the result becomes the difference between the 2.

    • cebe/markdown GFM 1.1.0 and others

    • The inner words of "2 in 3" are not emphasized.

So the only ones that "get it right" (treating emphasis within emphasis as another emphasis relative to the surrounding text) are RDiscount 1.6.8 and s9eTextFormatter (Fatdown/PHP).
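The add-then-subtract rule described for RDiscount can be written down as a tiny function, which makes the nine combinations easy to tabulate. This sketch encodes only the one-sentence description in the results above and has not been checked against RDiscount's actual output; `combined_level` is a hypothetical name, and level 0 is assumed to mean plain, unemphasized text.

```python
def combined_level(outer, inner, max_level=3):
    """Emphasis level of the inner words under the rule as described above.

    The two levels are added; if the sum exceeds max_level, the result is
    the difference between them instead (0 meaning plain text).
    """
    total = outer + inner
    return total if total <= max_level else abs(outer - inner)


for inner in range(1, 4):
    for outer in range(1, 4):
        print(f"{inner} in {outer}: inner words at level {combined_level(outer, inner)}")
```

Under this rule the inner words differ in level from the outer text in all nine cases, which appears to be the sense in which the comment says these implementations get it right.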

It seems no one is discussing this here. But to me CommonMark's interpretation seems wrong, and RDiscount's (and the other one's) is the correct one. Should this be added to the issues to resolve before the 1.0 release?

I’d like to support @ickc’s proposal of introducing the ****small caps**** syntax. I, too, tend to see small caps as a form of emphasis, and I'll just note that https://en.wikipedia.org/wiki/Small_caps (“They are used in running text to prevent capitalized words from appearing too large on the page, and as a method of emphasis or distinctiveness for text alongside or instead of italics, or when boldface is inappropriate.”) and https://fr.wikipedia.org/wiki/Petite_capitale (“Elles peuvent être utilisées pour marquer l’emphase de mots …”) confirm this view.

Note: If one simply wants to make strong emphasis appear
as small caps instead of boldface, it is easy to achieve
this with a pandoc filter. (Just change Strong ->
SmallCaps.)
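For illustration, the Strong -> SmallCaps swap mentioned in the note can be sketched as a function over pandoc's JSON representation. This is a minimal sketch in plain Python, not a filter taken from the thread: `retag` is a hypothetical helper name, and it relies on Strong and SmallCaps both carrying a plain list of inlines as their content, so only the tag needs to change.

```python
import json


def retag(node, source="Strong", target="SmallCaps"):
    """Return a copy of the JSON tree with every `source` node renamed to `target`."""
    if isinstance(node, list):
        return [retag(child, source, target) for child in node]
    if isinstance(node, dict):
        copy = {key: retag(value, source, target) for key, value in node.items()}
        if copy.get("t") == source:
            copy["t"] = target
        return copy
    return node  # strings, numbers, booleans


# A hand-written fragment standing in for the JSON of "**Hi** there".
document = {
    "blocks": [
        {
            "t": "Para",
            "c": [
                {"t": "Strong", "c": [{"t": "Str", "c": "Hi"}]},
                {"t": "Space"},
                {"t": "Str", "c": "there"},
            ],
        }
    ]
}
print(json.dumps(retag(document)))
```

The same function would cover other inline elements whose content is just a list of inlines, for example `retag(document, "Strikeout", "SmallCaps")`. To use it as an actual filter, one would read the JSON from standard input, write the result to standard output, and pass the script to pandoc with `--filter`.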

@ickc and @nickbart1980,

sorry, but I’m afraid I strongly disagree.

Small caps (as well as italics or bold) are fonts. These font faces are styles or formats. If this is so, redefine emphasis (or strong emphasis) to use small caps instead of italics (or bold), as @jgm already suggested.

You may object that small caps constitute an element. Well, which one? Emphasis and strong emphasis are already taken. “Small caps emphasis” (third–level emphasis or similar names) would be elements only created because of the style.

Your proposal would be much more consistent with:

``````

_small caps_{.sc}

``````

Simply, a small caps class in the emphasis element. In fact, your proposal mainly redefines the font face used for emphasis.

HTML Should Not Be a Reference on Markdown (and pandoc) Syntax

@ickc, I think this may be a bit more complex than that. Having one or many outputs in pandoc is not relevant.

This is not about HTML markup, but about the XML document model. I’m not an expert, but its strongest feature is the following: elements with attributes. At least three of them: identifier, classes and language.

Again, LaTeX wasn’t designed with text encoding in mind. It performs an excellent task in text typesetting, but this is totally different from text encoding.

The main problem in pandoc development is that LaTeX cannot handle XML natively. pandoc should translate between formats. But it would be a mistake to adapt pandoc in order to reflect the syntax of LaTeX. (Sorry, but LaTeX is the problem in text encoding, not the solution.)

BTW, having attributes in the emphasis element would be way more useful. This would allow attributing a language to foreign expressions, such as in:

``````

The concept of _Lebensweisheit_{:de}

``````

And it would also allow to distinguish between titles and real emphasis (using classes).

@jgm, please consider adding attributes to the emphasis element. These are essential for multilanguage documents.

Hi, @ousia,

Thanks for the thoughtful discussion.

First, I think that whenever we think in the following way,

Emphasis and strong emphasis are already taken.

we are in the box of HTML and technology. That box has narrowed our mind.

The reason people like Markdown is that it can already visually convey its meaning without being rendered. Another reason people like it is that when we write in Markdown, we do not need to think in terms of any styling at all, only about the writing itself (that's why even in a word processor it is good practice to write first and format later, not simultaneously).

So my main focus is Markdown syntax, not HTML output, and not LaTeX (which is _very_ very VERY ugly in syntax). I regret mentioning it in the beginning, because although it is my primary target output format, it really has nothing to do with the Markdown (variant) syntax I am suggesting (and in fact LaTeX does not handle it as I suggested).

Since the main focus of the debate is font styles and emphasis, I believe the answer to the question of what syntax should be used lies in typography (not technology, for example).

I am not an expert in typography, and while LaTeX might make its users better at typography, it doesn't make us experts. Since I opened this issue, I have been researching the related subjects in typography, including italics, bold, small caps, fake small caps, all caps, etc.

Typographically, there are italic, bold, small caps (and the petite caps variant), and all caps. They _are_ certainly styling, but why were these styles created? There are many historical, technological (in terms of both limitation and liberation) and other reasons behind many typographical practices. I believe the primary reason they were created is emphasis, or standing out from the text. _But I have to emphasize that_ this is an open question here, and I believe it is the most important question to answer before deciding on a syntax for small caps.

Now _if_, a big IF, the font styles were primarily created for emphasis, then we need to ask what relative magnitude of emphasis each font style conveys typographically. I believe the order of increasing emphasis is this: italics, bold, small caps, all caps. Petite caps are ignored because they are a variant of small caps based on a different philosophy. All caps will not be discussed because it can already be done in the TRIVIAL way, and because it is bad practice except for initialisms like USA, or possibly titles (typographically a bit controversial).

What's left is italics, bold and small caps. I guess Markdown and HTML have ignored the last one as emphasis because of historical technological limitations. Even nowadays it is very easy to encounter fake small caps rather than real small caps. So we have avoided using small caps for emphasis, because typographically fake small caps are bad practice.

In other words, you might take the suggestion as a new level of emphasis. And since typographically the "only" correct emphasis left is small caps, we suggest small caps as the style to represent this new level of emphasis (I am not talking about mixing different levels, as discussed above using the "binary" illustration). Put this way, there are then 3 levels of emphasis (which can also be mixed and matched). The more natural, typographically correct choice is italic, bold and then small caps, but end users may feel free to change them (boxed, colored, underlined: all typographically bad practice, but if they want to, they can; and as @jgm has suggested, it can already be done).

Lastly, let me mention that I am not suggesting we have to pick one. In fact, for backward compatibility's sake, I think the old syntax should be kept (which emphasizes the styling aspect). The [small caps]{.sc} you mentioned, which involves a broader definition, could also be used, so that things are more uniform in that way of thinking. In the same vein, a similar syntax for italic and bold could also be defined, just as in LaTeX one might declare some text as italic or as emphasis: in some situations they have identical results, but they really mean different things (most probably in HTML it can be done in at least 2 ways, one marking it as emphasis, one as plain styling). What I am suggesting is a syntax for small caps that, whatever alternative syntaxes may exist, emphasizes their emphasizing aspect. A side benefit of having 2 different families of syntax for italic, bold and small caps, as emphasis and as styling respectively, is that the behavior of "emphasis within another emphasis" and "font style within another font style" should become unambiguous (see the fragmentation in how it is rendered above).

In summary (the same 2 questions asked above)

  1. I still think the fundamental question to ask and answer is: when small caps are used, are they primarily emphasis (standing out from the background at different intensities), or styling (like people using small caps for the first word of a paragraph, with a much bigger font size)? I changed the title to reflect the importance of this question.

    • Related to this question: what makes small caps so different from italic and bold, which, as styling options, are used as (different levels of) emphasis? Is it because of technological limitations? Is it because of the use of lazy small caps (all caps)? Or something else?

  2. Should technological limitations, the HTML specification, etc. be a guideline for the Markdown syntax involved? Or should typographical practice be the guideline? And if so, what does typography teach us?

I want to emphasize that I am not an expert on any of these. Again, these are open questions. But I think that until we find answers to them, we can never settle on a syntax for small caps (or we settle on one prematurely?).

And I think I might keep silent on this for a while, not to be rude, but I think my points have been clarified in a lot of different ways and I don't want to dominate the discussion (especially when I am not an expert. "You're the doc, doc."). So feel free to discuss, and pardon me if I don't respond. Thanks everyone.

I have a filter which changes Strikeout to SmallCaps. It works quite well as I never use strikeout as strikeout.

Unfortunately, seems like people don't really see it is important to have a better syntax...

Some references on Typography

This is from a book I read before:

The discussion there emphasized that the font styles, italic and bold and small caps, etc. are typographically for emphasis.

HTML as a reference?

Nobody seems to discuss this. Should the limitations of HTML be our guide? I am not saying they shouldn't. But has this decision been made already, and why? If so, it had better be reflected in the user guide as well. I can see there are many areas where pandoc's syntax is heavily influenced by HTML. But the philosophy says

Whereas Markdown was originally designed with HTML generation in mind, pandoc is designed for multiple output formats.

Other thoughts

  • I see some have suggested using a filter to repurpose some existing pandoc syntax to represent small caps, and someone has actually done that. To me, this is the best example showing that a better syntax is desperately needed. While I applaud those who "think outside the box" to do this so creatively/liberally, I think it shows the box has to be enlarged, especially since pandoc already has a native syntax for small caps that goes unused just because it is too ugly.
  • Another trap pandoc has fallen into because of the "HTML box" is that the current syntax is explicitly fake small caps. The [small caps]{.sc} kind of syntax is still explicitly fake small caps. Probably the LaTeX output is going to be real small caps (when the default lmodern is used). But the syntax itself is explicitly fake, both the current syntax and the future one in progress. I am not saying we should ignore the limits of technology. In many instances "fake small caps" is the least troublesome or even the only way it can be done (given a certain font, or the limits on interacting with font features within the output format's capability). But I think it is very important to have a syntax that is not associated with "fake small caps": while fake small caps may eventually be used depending on the delivery format, the syntax should always mean real small caps.

I reflected this last point in the change of title, as I think it is the deepest fundamental problem of the current or in-progress syntax.

@ickc, sorry for not having replied before.

> Unfortunately, seems like people don't really see it is important to have a better syntax...

I don’t think that having three elements related to emphasis is better (in terms of syntax) than having only one that has classes.

> Should the limitation of HTML be our guides?

Correct me if I’m wrong, but I’m afraid you are mixing different issues.

Markdown is lightweight XML markup. Markup is related to the way you encode texts. With XML you have elements, such as <em>, <p> or <whatever>. These elements can carry attributes, as in <em class="sc">.

Sorry, but why do you think that this is related to fake small caps? This depends on the properties and the values you set for the class in your cascading style sheet.

It would be fake small caps when you write:

em.sc {
    font-variant: small-caps;
}

But you may have real small caps with (I haven’t tested myself):

em.sc {
    -moz-font-feature-settings: "smcp=1";
    -ms-font-feature-settings: "smcp";
    -webkit-font-feature-settings: "smcp";
    -o-font-feature-settings: "smcp";
    font-feature-settings: "smcp";
}

The only thing you need is an OpenType font that supports the feature and has the required glyphs.

XML is about encoding. It tells you and the computer _what is_ each part of the text, not how to handle it. For that, you need a cascading style sheet.

And this would be the same if you added a special element for small caps emphasis. The only gain that I would see in that is the mixture of content (what each element is) with format (how it should be displayed).

But, please, let me know whether I’m missing something from your explanation.

Thanks @ousia for your patience.

I was referring to <span style="font-variant:small-caps;">Small caps</span>, the current native pandoc syntax for fake small caps. But when I commented on [small caps]{.sc}, I was guessing it is just a synonym for <span style="font-variant:small-caps;">Small caps</span> and would be replaced with that styling in HTML.

But, even if this is true, the syntax [small caps]{.sc} itself is not explicitly fake small caps. This is my mistake.

Regarding the "HTML as our guide" point, I'm saying that in HTML there are only em and strong, which are normally rendered as i and b, but there is no, say, stronger that would normally be rendered as small caps (this also relates to the technological limitations around small caps, since real small caps fonts are so rare, etc.). So within the box of HTML, we cannot have a small caps syntax corresponding to, say, **** (in the sense that * -> em, ** -> strong, **** -> ????). But in XML we do not have such a limitation.

As mentioned in the original post, I googled and found discussions about the same issue from almost a decade ago. Back then someone proposed ^^small cap^^ for small caps, and for some reasons (including that people didn't like the syntax) it was rejected. I was hoping that people would at least agree on the need for a better syntax for small caps and could come to some consensus. But the few who cared about it had probably created their own filters (which to me is so wrong, since it does not solve the problem but abuses other syntax. A "curse" of the power of pandoc is that people can do whatever they want...). And then there is the proposed syntax for div, so we have a hope of settling on a generic syntax that is a bit more elegant than the current pandoc small caps syntax.

I think small caps is really tragic: it is always a last-class citizen in the digital age. While by "birth right" it should have the same status as italic or bold (and in a certain sense be superior to them: which one means more emphasis?), most of the time it hasn't even been born (many fonts do not have small caps) or is imitated (fake small caps). And because of its near non-existence, most people don't use it at all.

(Real) small caps are important for another reason. Typographically, italic is discouraged in sans serif because it doesn't really _emphasize_ (in sans serif it is usually just slanted a bit and doesn't stand out). Bold might be discouraged by some typographers too, because it (by definition) has the wrong weight compared to the surrounding text and, even if used, should be used sparingly because it hurts the eye. (Real) small caps are the only one without these 2 problems. In this sense small caps are again superior to italic and bold in terms of emphasis. (They don't hurt readability, but it is almost always wrong for an emphasis to emphasize too many things.)

With the new bracketed span syntax, small caps can be written as a bracketed span, as in pull request #3191: [Small caps]{style="font-variant:small-caps"}.

I don't know if this would be a good time to re-discuss a possible shorter, markdown-ish syntax for small caps using the bracketed span syntax, as @ousia initially suggested.

@ickc, as commented in the pull request itself, using a class seems to be much smarter _for handling HTML output itself_.

Right, let's actually not reopen this issue but focus the discussion in #1592.

It is _very_ easy to write a filter which picks up a class on a span element and wraps the span in the right LaTeX raw markup, for example. As I mentioned on pandoc-discuss a while ago, I have written a general filter which lets you specify arbitrary raw markup to insert before or after elements, on a per element-type+class+output-format basis, in the document metadata or in a YAML file passed along with the input documents. That way a non-programmer (or a programmer who doesn't like to write the same code over and over :-) and/or who likes to keep their modifications along with the document) can do quite varied and powerful modifications. I only need to finish writing the documentation as and when I have time. As you may know, the ratio of time it takes to write code vs. documentation obeys Zipf's law! :-)
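To make the class-based approach concrete, here is a rough sketch of the simple case only, not the general metadata-driven filter described above. It assumes pandoc's JSON AST, where a span is {"t": "Span", "c": [[id, classes, attributes], inlines]} and raw LaTeX is {"t": "RawInline", "c": ["latex", text]}. The class name sc is the one floated in this thread, and the function names are hypothetical.

```python
import json
import sys


def raw_latex(text):
    """Build a RawInline node that pandoc passes through to LaTeX output."""
    return {"t": "RawInline", "c": ["latex", text]}


def wrap_class_spans(node, target_class="sc", before="\\textsc{", after="}"):
    """Return a copy of the AST where spans carrying target_class are wrapped in raw LaTeX."""
    if isinstance(node, list):
        return [wrap_class_spans(child, target_class, before, after) for child in node]
    if isinstance(node, dict):
        node = {key: wrap_class_spans(value, target_class, before, after)
                for key, value in node.items()}
        if node.get("t") == "Span":
            attr, inlines = node["c"]
            classes = attr[1]  # attr is [identifier, classes, key-value pairs]
            if target_class in classes:
                # Keep the span and its attributes; only surround its content.
                node["c"] = [attr, [raw_latex(before), *inlines, raw_latex(after)]]
        return node
    return node


def run_filter(stream_in, stream_out):
    """Read a JSON document from stream_in, rewrite it, write it to stream_out."""
    json.dump(wrap_class_spans(json.load(stream_in)), stream_out)


# When used as a pandoc filter, the script would end with:
# run_filter(sys.stdin, sys.stdout)
```

Since the inserted nodes are raw LaTeX, they should only matter for LaTeX-based output; other formats would keep the span and its class, which leaves HTML free to style the class in a style sheet as suggested earlier in the thread.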

