I am currently using pandoc and its smart punctuation to generate EPUB output, and this works pretty well.
In my source I have:
Du "texte" en français!
The output is:
Du “texte” en français!
But since the text is in French, I would like to use the French typography rules to get something like:
Du « texte » en français !
(please note the nonbreaking spaces).
So is there an (easy) way to define typography rules for an output, or should this be an enhancement?
Thanks a lot.
You could try using the --html-q-tags option. Then use
CSS to style the q tags appropriately.
If that doesn't work, then your options are:
I am experiencing a similar issue. At first, I expected --smart to handle the typography of punctuation as well, but it does not seem to do so.
The first problem with --smart when writing French (and maybe some other languages) is that French does not use curly quotes but French quotes « ». On some keyboard layouts (I'm thinking of fr oss) they are easy to reach, but that is not the case on every layout (especially on Windows), so being able to automatically replace " " with « » could be very helpful. This could obviously be done with a post-processing script (or a Pandoc filter), but what about including a --french-quotes option in Pandoc to do it?
The second problem is that typography, and especially the position (and nature) of whitespace, differs a lot from one language to another. In particular, in French (unlike English) there should be a non-breaking space before any double punctuation mark (!, ?, :, ;). Similar rules exist for the spaces enclosing quotes (it should be SPACE « NON_BREAKING_SPACE TEXT NON_BREAKING_SPACE » SPACE, if I remember correctly), and so on.
Non-breaking spaces in particular are almost impossible to type easily (without special tweaks to the keyboard layout). I think it would be awesome if Pandoc could handle this.
What do you think?
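For illustration, the post-processing script mentioned above could look roughly like the following. This is only a sketch in Python, not something pandoc provides: the function name french_typography is made up, and it naively treats every pair of straight double quotes and every !, ?, : or ; as prose, so it would also mangle URLs, times, and code.

```python
import re

NBSP = "\u00a0"  # non-breaking space


def french_typography(text: str) -> str:
    """Apply a few French quoting and spacing rules to plain text (naive)."""
    # Turn each pair of straight double quotes into « ... » with inner NBSPs.
    text = re.sub(r'"([^"\n]*)"', "«" + NBSP + r"\1" + NBSP + "»", text)
    # Put one NBSP before a run of double punctuation marks,
    # replacing any ordinary or non-breaking spaces already there.
    text = re.sub(r"[ \u00a0]*([!?;:]+)", NBSP + r"\1", text)
    return text


print(french_typography('Du "texte" en français!'))
```

Running it on the plain-text source before pandoc (or on plain-text output) is the simplest use; doing the same thing safely for markup would need a filter that only touches text nodes.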
See #84. I'd actually never thought that a French writer
would want to type " for quotes, and have them render with
French quotes. But if that is the case, it wouldn't be all
that hard to provide some kind of configurable option.
Another option would be localization, so that the quote
style is affected by the lang metadata field. Though I
gather many languages don't have one standard quoting style.
Third option would be localization + an override.
Concerning the quotes, I may have an unusual approach, but " does seem to me to be the most widely available and most easily typed quote character. So having it automatically replaced with «/» would be awesome, in my opinion. Still, there should be a way to prevent the automatic conversion (such as escaping) so that a literal " can be typed in a French text as well (but the same problem exists for English typography).
:+1: for the localization-based approach, using the lang metadata field, or for an override option. The advantage of the localization-based method is that it also makes it possible to tweak non-breaking spaces depending on the language.
Having a --french-quote option is not a good idea, since it serves a very specific task. Having a --lang option is a better idea, provided the language map can be extended. LaTeX uses babel for that task.
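To make the "extensible language map" idea concrete, here is a rough Python sketch. Everything in it is hypothetical (QUOTE_STYLES, quote_marks, and quote are illustrative names, not pandoc or babel APIs); the quote pairs are the ones mentioned in this thread, and, as noted above, many languages have more than one accepted style.

```python
# Hypothetical language map: language code -> (opening, closing) marks.
QUOTE_STYLES = {
    "en": ("\u201c", "\u201d"),              # “English”
    "fr": ("\u00ab\u00a0", "\u00a0\u00bb"),  # « French », with inner NBSPs
    "de": ("\u201e", "\u201c"),              # „German“
    "sv": ("\u201d", "\u201d"),              # ”Swedish”
    "da": ("\u00bb", "\u00ab"),              # »Danish«
}


def quote_marks(lang: str, default: str = "en"):
    """Look up quote marks for a tag such as 'fr' or 'fr-CA'."""
    primary = lang.split("-")[0].lower()
    return QUOTE_STYLES.get(primary, QUOTE_STYLES[default])


def quote(text: str, lang: str) -> str:
    """Wrap text in the quote marks registered for lang."""
    opening, closing = quote_marks(lang)
    return opening + text + closing


print(quote("Test", "sv"))
```

Users could extend the map by adding entries to QUOTE_STYLES, which is roughly what an override on top of localization would amount to.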
Related issue #661
I'm stoked AF at the idea of being able to set it up so I can get »Danish style quotes« from --smart with a babel-like solution.
Hi!
I've been reading this issue and #84 as well as the documentation but I haven't really understood how this should work, and if it's implemented for my use case.
I write text in markdown that I convert to ICML to use in InDesign documents. When I write Swedish text I want the opening and closing quotes to be identical: ””.
Here is my input and outputs:
sh-4.4$ locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=en_US.UTF-8
sh-4.4$ pandoc -v
pandoc 2.5
Compiled with pandoc-types 1.17.5.4, texmath 0.11.1.2, skylighting 0.7.5
Default user data directory: /home/tetov/.pandoc
Copyright (C) 2006-2018 John MacFarlane
Web: http://pandoc.org
This is free software; see the source for copying conditions.
There is no warranty, not even for merchantability or fitness
for a particular purpose.
sh-4.4$ cat test.md
---
lang: sv
---
"Test" ... --
sh-4.4$ pandoc -s -w icml -o test.icml test.md
sh-4.4$ cat test.icml
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<?aid style="50" type="snippet" readerVersion="6.0" featureSet="513" product="8.0(370)" ?>
<?aid SnippetType="InCopyInterchange"?>
<Document DOMVersion="8.0" Self="pandoc_doc">
<RootCharacterStyleGroup Self="pandoc_character_styles">
<CharacterStyle Self="$ID/NormalCharacterStyle" Name="Default" />
</RootCharacterStyleGroup>
<RootParagraphStyleGroup Self="pandoc_paragraph_styles">
<ParagraphStyle Self="$ID/NormalParagraphStyle" Name="$ID/NormalParagraphStyle"
SpaceBefore="6" SpaceAfter="6"> <!-- paragraph spacing -->
<Properties>
<TabList type="list">
<ListItem type="record">
<Alignment type="enumeration">LeftAlign</Alignment>
<AlignmentCharacter type="string">.</AlignmentCharacter>
<Leader type="string"></Leader>
<Position type="unit">10</Position> <!-- first tab stop -->
</ListItem>
</TabList>
</Properties>
</ParagraphStyle>
<ParagraphStyle Self="ParagraphStyle/Paragraph" Name="Paragraph" LeftIndent="0">
<Properties>
<BasedOn type="object">$ID/NormalParagraphStyle</BasedOn>
</Properties>
</ParagraphStyle>
</RootParagraphStyleGroup>
<RootTableStyleGroup Self="pandoc_table_styles">
<TableStyle Self="TableStyle/Table" Name="Table" />
</RootTableStyleGroup>
<RootCellStyleGroup Self="pandoc_cell_styles">
<CellStyle Self="CellStyle/Cell" AppliedParagraphStyle="ParagraphStyle/$ID/[No paragraph style]" Name="Cell" />
</RootCellStyleGroup>
<Story Self="pandoc_story"
TrackChanges="false"
StoryTitle=""
AppliedTOCStyle="n"
AppliedNamedGrid="n" >
<StoryPreference OpticalMarginAlignment="true" OpticalMarginSize="12" />
<!-- body needs to be non-indented, otherwise code blocks are indented too far -->
<ParagraphStyleRange AppliedParagraphStyle="ParagraphStyle/Paragraph">
<CharacterStyleRange AppliedCharacterStyle="$ID/NormalCharacterStyle">
<Content>“Test”</Content>
</CharacterStyleRange>
<CharacterStyleRange AppliedCharacterStyle="$ID/NormalCharacterStyle">
<Content> … –</Content>
</CharacterStyleRange>
</ParagraphStyleRange>
</Story>
</Document>
What I want/expect is:
[...]
<Content>”Test”</Content>
[...]
I can achieve this by adding -f markdown-smart as an argument, but I'd rather keep the other fixes that smart makes.
Is this a planned feature (to have specific quotes for different languages in ICML output) or is the solution to use -smart?
@tetov - at this point, the solution is to use -smart.
Maybe some day we'll implement configurable smart quotes, but it's not a priority now.
In that case, I have a workaround for you, Anton:
pipe the text through sed 's/"/”/g' before putting it into pandoc. You're lucky that your desired quotes aren't symmetrical so you don't have to use anything "smart" in order to get them.
Be aware that Swedish has some other typesetting quirks like using spaced en dashes – like this – rather than English-style non-spaced em dashes—like this—and there are some other weird things.
So perhaps it's best to either make sure your source document already has the typography you want (I sometimes use emacs smart-quotes-mode for this) or you run it through a quick little sed, perl, or tr filter before pandoc. Does that work?
@jgm Thanks, I understand!
@snan I thought about processing the text but didn't really know where to put that processing, and the examples I found (which were for symmetrical quotes) looked daunting. Thanks! I'll add it before pandoc in my makefile.
I wasn't aware that those differences existed! Thanks a lot for pointing them out. I have some reading to do :).
@snan wrote:
pipe the text through sed 's/"/”/g' before putting it into pandoc. You're lucky that your desired quotes aren't symmetrical so you don't have to use anything "smart" in order to get them.
This will work fine unless you have straight quotes in
non-textual contexts: code, HTML attributes, titles in
markdown links.
In that case, you could achieve the same thing by
using a simple lua filter, in conjunction with -smart.
I'll need to spend some more time learning Lua and Lua filters in order to get that to work. I've forked the lua-filters repo and started to cobble something together from the existing samples.
In the meantime I made a hacky solution in my Makefile.
Thanks for your help, @jgm and @snan!
Edit: While working on adding single quotation marks as well as dashes I realized that I could run the sed commands on the output-file, like this:
sed -i -e 's/“/”/g' -e 's/‘/’/g' output.icml
This gives me all of the benefits of smart while still keeping symmetrical quotation marks. Pandoc respects spaces around en dashes, so that is not a problem either.
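For anyone who would rather not rely on sed -i (its behaviour differs between GNU and BSD sed), the same substitution can be sketched in Python. This assumes the two replacements are “ to ” and ‘ to ’; the function name swedish_quotes is made up for the example.

```python
# Map left curly quotes to their right-hand counterparts.
SWEDISH_QUOTES = str.maketrans({
    "\u201c": "\u201d",  # “ becomes ”
    "\u2018": "\u2019",  # ‘ becomes ’
})


def swedish_quotes(text: str) -> str:
    """Replace opening curly quotes so both sides use the closing form."""
    return text.translate(SWEDISH_QUOTES)


print(swedish_quotes("<Content>\u201cTest\u201d</Content>"))
```

Like the sed version, this works on the output text as a whole, so it would also change any left curly quotes that were typed deliberately in the source.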
@jgm:
This will work fine unless you have straight quotes in non-textual contexts: code, HTML attributes, titles in markdown links. In that case, you could achieve the same thing by using a simple lua filter, in conjunction with -smart.
Which runs first: the smart function or the Lua filter? I was thinking about putting the regexp from my edit above into a Lua filter to make it work with any output format.
Smartification takes place at the parsing stage, so in
the filter you'll have Quoted objects you can replace.
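As a rough illustration of replacing Quoted objects, here is a sketch that works on pandoc's JSON representation of the document using only the Python standard library (a Lua filter would do the same thing more directly). It assumes the JSON shape used by recent pandoc-types, where a quoted span looks like {"t": "Quoted", "c": [{"t": "DoubleQuote"}, [inlines]]}; the names OPEN, CLOSE, and replace_quoted are made up for the example.

```python
import json

OPEN, CLOSE = "\u201d", "\u201d"  # Swedish-style marks; adjust as needed


def replace_quoted(node):
    """Recursively replace double-quoted Quoted inlines with literal marks."""
    if isinstance(node, list):
        out = []
        for item in node:
            if isinstance(item, dict) and item.get("t") == "Quoted":
                quote_type, inlines = item["c"]
                if quote_type.get("t") == "DoubleQuote":
                    # Splice: opening mark, the quoted inlines, closing mark.
                    out.append({"t": "Str", "c": OPEN})
                    out.extend(replace_quoted(inlines))
                    out.append({"t": "Str", "c": CLOSE})
                    continue
            out.append(replace_quoted(item))
        return out
    if isinstance(node, dict):
        return {key: replace_quoted(value) for key, value in node.items()}
    return node


# A tiny hand-written inline list standing in for real pandoc output.
sample = [
    {"t": "Str", "c": "Hej"},
    {"t": "Space"},
    {"t": "Quoted", "c": [{"t": "DoubleQuote"}, [{"t": "Str", "c": "Test"}]]},
]
print(json.dumps(replace_quoted(sample), ensure_ascii=False))
```

To use it on a real document, one would read the JSON with json.load(sys.stdin), pass it through replace_quoted, write it back with json.dump, and place the script between pandoc -t json and pandoc -f json in a pipe. Because Code and link-title nodes are never Quoted, they are left alone, which avoids the problem with straight quotes in non-textual contexts.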
I do similar preprocessing (with sed and similar tools) to change • bullets into hyphen bullets. Man, I wish that were added to markdown; it's the one thing I really miss from how I write plain text files.
Just wanted to chime in to say that localized smart quotes would be a fantastic feature to have.
As already said elsewhere, @Phyks suggestion to have a --french-quotes flag doesn't make much sense. Why pick just French when so many languages have their own quoting rules?
In my case, German uses „ as opening and “ as closing quotes. Being able to automate the conversion from straight to curly would be a tremendous boon and would help me enormously in the editorial work I do (mainly converting Markdown to HTML).
converting Markdown to HTML
then see https://github.com/jgm/pandoc/issues/2620#issuecomment-169099590
@mb21 Using the --html-q-tags flag would result in a <q> tag being used for everything between quotation marks. That would be wrong in most cases, since that tag is meant to mark up inline quotations, which are only a small subset of my actual use cases. Besides being semantically incorrect, I just need clean HTML without any CSS.
Using proper German quotes in the input is what I already do, before converting the markdown with the --ascii flag to replace them with the corresponding HTML entities. I manually substitute every single straight quote in the drafts I receive from all over the place. It takes time, and that’s the process I’d like to automate.
As for using sed or perl to post-process the output, I didn’t explore the possibility, but that would probably be the way to go until this functionality hopefully gets baked into Pandoc.
@odkr wrote a great Lua filter to handle this problem:
https://github.com/odkr/pandoc-quotes.lua. It is now also available as part of the pandoc lua-filters collection: https://github.com/pandoc/lua-filters/tree/master/pandoc-quotes.lua
@odkr @tarleb That looks great. Thanks for bringing it to my attention.