Pandoc: Smart quotes don't work for multi-paragraph quotations

Created on 29 Aug 2020  ·  4 Comments  ·  Source: jgm/pandoc

For quotes that span multiple paragraphs in English, it's conventional to put an opening quotation mark at the start of each paragraph, and only put a closing quotation mark at the end of the final paragraph.

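Pandoc's smart extension only converts the matched pair of quotation marks in the final paragraph; the unmatched opening quotation mark of each earlier paragraph is left as a straight quote.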

A minimal reproducing example:

 echo "\"This is a quote. \n\n \"It spans paragraphs.\"" | pandoc --from=markdown+smart --ascii

--ascii isn't necessary, but it makes the difference in output more obvious.

Expected result

<p>&#x201C;This is a quote.</p>
<p>&#x201C;It spans paragraphs.&#x201D;</p>

Actual result

<p>"This is a quote.</p>
<p>&#x201C;It spans paragraphs.&#x201D;</p>

Version

pandoc 2.10.1
Compiled with pandoc-types 1.21, texmath 0.12.0.2, skylighting 0.8.5

All 4 comments

The multi-paragraph thing is kind of a red herring: you can replicate this in a single line with an unbalanced pairing. Single quotes get smart treatment without being matched, but double quote marks go all screwy if not matched.

In addition to the matching problem, there is also a weird edge case with spaces. Note how this (arguably wrong but technically valid) input gets changed to drop a space:

$ echo "It \"spans \"paragraphs." | pandoc --from=markdown+smart --ascii
<p>It &#x201C;spans&#x201D;paragraphs.</p>

For what it's worth, here is a small Lua filter to get the expected result.

function Para (p)
  local first = p.content[1]
  if first and first.t == 'Str' and first.text:sub(1, 1) == '"' then
    p.content[1] = pandoc.Str('“' .. first.text:sub(2))
    return p
  end
end

@tarleb Thanks! That script seems to work perfectly for my purposes.

It would still be nice if pandoc were able to support this out of the box. Maybe an optional flag that causes all " and ' to be converted, regardless of whether they have a corresponding opening/closing quotation mark in the same paragraph? Whether they turn into an opening or a closing mark could depend on whether whitespace comes before or after them.
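
The whitespace heuristic suggested above can be sketched outside pandoc as a small shell function. This is only an illustration under stated assumptions: `curl_quotes` is a hypothetical helper, not a pandoc option, and it runs `sed` over plain text rather than over pandoc's AST, so it would also rewrite quotes inside code spans or raw HTML.

```shell
#!/bin/sh
# Hypothetical helper (NOT part of pandoc): curl every straight quote based
# only on what precedes it, without looking for a matching partner.
#   - a quote at the start of a line or after whitespace becomes an opening mark
#   - any other quote becomes a closing mark (or apostrophe)
curl_quotes() {
  sed -E \
    -e 's/(^|[[:space:]])"/\1“/g' \
    -e 's/"/”/g' \
    -e "s/(^|[[:space:]])'/\1‘/g" \
    -e "s/'/’/g"
}

# The multi-paragraph quotation from the report, as plain text.
printf '%s\n' '"This is a quote.' '' '"It spans paragraphs."' | curl_quotes
# prints:
# “This is a quote.
#
# “It spans paragraphs.”
```

Because the rule looks only at the preceding character, it gets some cases wrong: a quote that opens right after a parenthesis is treated as a closing mark, and a leading apostrophe as in 'tis becomes an opening single quote. A real implementation inside a reader would need more context than this sketch uses.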

If it’s not feasible to add, perhaps the documentation should at least explain this limitation and its workarounds. I suspect that my situation isn’t a unique edge case.

The commonmark reader is smarter about this:

% echo "\"This is a quote. \n\n \"It spans paragraphs.\""  | pandoc -f commonmark+smart --ascii
<p>&#x201C;This is a quote.</p>
<p>&#x201C;It spans paragraphs.&#x201D;</p>

Since we'll eventually be transitioning to that for the default markdown reader, I'm going to close this.
