Pandoc: Code in filters.md needs an update?

Created on 9 Jul 2018  Â·  12Comments  Â·  Source: jgm/pandoc

Am I correct in thinking that the code in the file filters.md needs an update? I am quite new to Haskell and Pandoc, but it seems to me that the newer versions of Pandoc are more monadic and use Data.Text.Text instead of String, so that the code in filters.md doesn't work.

For example, the code for extracturl.hs doesn't work for me. After I changed it to the following, it worked:

{-# LANGUAGE OverloadedStrings #-}
-- extracturls.hs
import Text.Pandoc
import Text.Pandoc.Walk (query)
import Data.Text as T (Text, pack, unlines)
import Data.Text.IO as TI (interact, getContents, putStr)
import Prelude

extractURL :: Inline -> [Text]
extractURL (Link _ _ (u,_)) = [pack u]
extractURL (Image _ _ (u,_)) = [pack u]
extractURL _ = []

extractURLs :: Pandoc -> [Text]
extractURLs = query extractURL

readDoc :: Text -> PandocIO Pandoc
readDoc = readMarkdown def
-- or, for pandoc 1.14, use:
-- readDoc s = case readMarkdown def s of
--                Right doc -> doc
--                Left err  -> error (show err)

main :: IO ()
main = TI.putStr =<< runIOorExplode . ((fmap T.unlines) . (fmap extractURLs) . readDoc) =<< TI.getContents

Diff:

0a1
> {-# LANGUAGE OverloadedStrings #-}
2a4,7
> import Text.Pandoc.Walk (query)
> import Data.Text as T (Text, pack, unlines)
> import Data.Text.IO as TI (interact, getContents, putStr)
> import Prelude
4,6c9,11
< extractURL :: Inline -> [String]
< extractURL (Link _ _ (u,_)) = [u]
< extractURL (Image _ _ (u,_)) = [u]
---
> extractURL :: Inline -> [Text]
> extractURL (Link _ _ (u,_)) = [pack u]
> extractURL (Image _ _ (u,_)) = [pack u]
9c14
< extractURLs :: Pandoc -> [String]
---
> extractURLs :: Pandoc -> [Text]
12c17
< readDoc :: String -> Pandoc
---
> readDoc :: Text -> PandocIO Pandoc
20c25
< main = interact (unlines . extractURLs . readDoc)
---
> main = TI.putStr =<< runIOorExplode . ((fmap T.unlines) . (fmap extractURLs) . readDoc) =<< TI.getContents

(The code for behead.hs had similar issues.)

Using pandoc 2.2.1, compiled with pandoc-types 1.17.5.1.

Most helpful comment

{-# LANGUAGE OverloadedStrings #-}
-- handleruby.hs
import Text.Pandoc.JSON
import System.Environment (getArgs)
import qualified Data.Text as T

handleRuby :: Maybe Format -> Inline -> Inline
handleRuby (Just format) x@(Link attr [Str ruby] (src,_)) =
  case T.uncons src of
    Just ('-',kanji)
      | format == Format "html" -> RawInline format $
        "<ruby>" <> kanji <> "<rp>(</rp><rt>" <> ruby <>
        "</rt><rp>)</rp></ruby>"
      | format == Format "latex" -> RawInline format $
        "\\ruby{" <> kanji <> "}{" <> ruby <> "}"
      | otherwise -> Str ruby
    _ -> x
handleRuby _ x = x

main :: IO ()
main = toJSONFilter handleRuby

All 12 comments

Correct. This should be updated (though perhaps we
need at least a note for people using the older API).

Nielius notifications@github.com writes:

Am I correct in thinking that the code in the file filters.md needs an update? I am quite new to Haskell and Pandoc, but it seems to me that the newer versions of Pandoc are more monadic and use Data.Text.Text instead of String, so that the code in filters.md doesn't work.

For example, the code for extracturl.hs doesn't work for me. After I changed it to the following, it worked:

{-# LANGUAGE OverloadedStrings #-}
-- extracturls.hs
import Text.Pandoc
import Text.Pandoc.Walk (query)
import Data.Text as T (Text, pack, unlines)
import Data.Text.IO as TI (interact, getContents, putStr)
import Prelude

extractURL :: Inline -> [Text]
extractURL (Link _ _ (u,_)) = [pack u]
extractURL (Image _ _ (u,_)) = [pack u]
extractURL _ = []

extractURLs :: Pandoc -> [Text]
extractURLs = query extractURL

readDoc :: Text -> PandocIO Pandoc
readDoc = readMarkdown def
-- or, for pandoc 1.14, use:
-- readDoc s = case readMarkdown def s of
--                Right doc -> doc
--                Left err  -> error (show err)

main :: IO ()
main = TI.putStr =<< runIOorExplode . ((fmap T.unlines) . (fmap extractURLs) . readDoc) =<< TI.getContents

Diff:

0a1
> {-# LANGUAGE OverloadedStrings #-}
2a4,7
> import Text.Pandoc.Walk (query)
> import Data.Text as T (Text, pack, unlines)
> import Data.Text.IO as TI (interact, getContents, putStr)
> import Prelude
4,6c9,11
< extractURL :: Inline -> [String]
< extractURL (Link _ _ (u,_)) = [u]
< extractURL (Image _ _ (u,_)) = [u]
---
> extractURL :: Inline -> [Text]
> extractURL (Link _ _ (u,_)) = [pack u]
> extractURL (Image _ _ (u,_)) = [pack u]
9c14
< extractURLs :: Pandoc -> [String]
---
> extractURLs :: Pandoc -> [Text]
12c17
< readDoc :: String -> Pandoc
---
> readDoc :: Text -> PandocIO Pandoc
20c25
< main = interact (unlines . extractURLs . readDoc)
---
> main = TI.putStr =<< runIOorExplode . ((fmap T.unlines) . (fmap extractURLs) . readDoc) =<< TI.getContents

(The code for behead.hs had similar issues.)

Using pandoc 2.2.1, compiled with pandoc-types 1.17.5.1.

--
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/jgm/pandoc-website/issues/24

I also tried to play around with the A filter for ruby text and have no idea how to fix it since I've got no haskell experience.

The current example is:

-- handleruby.hs
import Text.Pandoc.JSON
import System.Environment (getArgs)

handleRuby :: Maybe Format -> Inline -> Inline
handleRuby (Just format) (Link [Str ruby] ('-':kanji,_))
  | format == Format "html"  = RawInline format
    $ "<ruby>" ++ kanji ++ "<rp>(</rp><rt>" ++ ruby ++ "</rt><rp>)</rp></ruby>"
  | format == Format "latex" = RawInline format
    $ "\\ruby{" ++ kanji ++ "}{" ++ ruby ++ "}"
  | otherwise = Str ruby
handleRuby _ x = x

main :: IO ()
main = toJSONFilter handleRuby

Would really appreciate it if someone posted an updated version of the filter.

Change the ++ to <> and that will probably be enough to make it work.

Here's the output after changing it:


output

Stack has not been tested with GHC versions above 8.6, and using 8.8.2, this may fail
Stack has not been tested with Cabal versions above 2.4, but version 3.0.1.0 was found, this may fail

handleruby.hs:6:46: error:
    • Couldn't match expected type ‘Data.Text.Internal.Text’
                  with actual type ‘[Char]’
    • In the pattern: '-' : kanji
      In the pattern: ('-' : kanji, _)
      In the pattern: Link _ [Str ruby] ('-' : kanji, _)
  |
6 | handleRuby (Just format) (Link _ [Str ruby] ('-':kanji,_))
  |                                              ^^^^^^^^^

handleruby.hs:7:22: error:
    • Couldn't match expected type ‘Data.Text.Internal.Text’
                  with actual type ‘[Char]’
    • In the first argument of ‘Format’, namely ‘"html"’
      In the second argument of ‘(==)’, namely ‘Format "html"’
      In the expression: format == Format "html"
  |
7 |   | format == Format "html"  = RawInline format
  |                      ^^^^^^

handleruby.hs:8:7: error:
    • Couldn't match expected type ‘Data.Text.Internal.Text’
                  with actual type ‘[Char]’
    • In the second argument of ‘($)’, namely
        ‘"<ruby>"
           <> kanji <> "<rp>(</rp><rt>" <> ruby <> "</rt><rp>)</rp></ruby>"’
      In the expression:
        RawInline format
          $ "<ruby>"
              <> kanji <> "<rp>(</rp><rt>" <> ruby <> "</rt><rp>)</rp></ruby>"
      In an equation for ‘handleRuby’:
          handleRuby (Just format) (Link _ [Str ruby] ('-' : kanji, _))
            | format == Format "html"
            = RawInline format
                $ "<ruby>"
                    <> kanji <> "<rp>(</rp><rt>" <> ruby <> "</rt><rp>)</rp></ruby>"
            | format == Format "latex"
            = RawInline format $ "\\ruby{" <> kanji <> "}{" <> ruby <> "}"
            | otherwise = Str ruby
  |
8 |     $ "<ruby>" <> kanji <> "<rp>(</rp><rt>" <> ruby <> "</rt><rp>)</rp></ruby>"
  |       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

handleruby.hs:8:48: error:
    • Couldn't match expected type ‘[Char]’
                  with actual type ‘Data.Text.Internal.Text’
    • In the second argument of ‘(<>)’, namely
        ‘ruby <> "</rt><rp>)</rp></ruby>"’
      In the second argument of ‘(<>)’, namely
        ‘"<rp>(</rp><rt>" <> ruby <> "</rt><rp>)</rp></ruby>"’
      In the second argument of ‘(<>)’, namely
        ‘kanji <> "<rp>(</rp><rt>" <> ruby <> "</rt><rp>)</rp></ruby>"’
  |
8 |     $ "<ruby>" <> kanji <> "<rp>(</rp><rt>" <> ruby <> "</rt><rp>)</rp></ruby>"
  |                                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

handleruby.hs:8:56: error:
    • Couldn't match expected type ‘Data.Text.Internal.Text’
                  with actual type ‘[Char]’
    • In the second argument of ‘(<>)’, namely
        ‘"</rt><rp>)</rp></ruby>"’
      In the second argument of ‘(<>)’, namely
        ‘ruby <> "</rt><rp>)</rp></ruby>"’
      In the second argument of ‘(<>)’, namely
        ‘"<rp>(</rp><rt>" <> ruby <> "</rt><rp>)</rp></ruby>"’
  |
8 |     $ "<ruby>" <> kanji <> "<rp>(</rp><rt>" <> ruby <> "</rt><rp>)</rp></ruby>"
  |                                                        ^^^^^^^^^^^^^^^^^^^^^^^^

handleruby.hs:9:22: error:
    • Couldn't match expected type ‘Data.Text.Internal.Text’
                  with actual type ‘[Char]’
    • In the first argument of ‘Format’, namely ‘"latex"’
      In the second argument of ‘(==)’, namely ‘Format "latex"’
      In the expression: format == Format "latex"
  |
9 |   | format == Format "latex" = RawInline format
  |                      ^^^^^^^

handleruby.hs:10:7: error:
    • Couldn't match expected type ‘Data.Text.Internal.Text’
                  with actual type ‘[Char]’
    • In the second argument of ‘($)’, namely
        ‘"\\ruby{" <> kanji <> "}{" <> ruby <> "}"’
      In the expression:
        RawInline format $ "\\ruby{" <> kanji <> "}{" <> ruby <> "}"
      In an equation for ‘handleRuby’:
          handleRuby (Just format) (Link _ [Str ruby] ('-' : kanji, _))
            | format == Format "html"
            = RawInline format
                $ "<ruby>"
                    <> kanji <> "<rp>(</rp><rt>" <> ruby <> "</rt><rp>)</rp></ruby>"
            | format == Format "latex"
            = RawInline format $ "\\ruby{" <> kanji <> "}{" <> ruby <> "}"
            | otherwise = Str ruby
   |
10 |     $ "\\ruby{" <> kanji <> "}{" <> ruby <> "}"
   |       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

handleruby.hs:10:37: error:
    • Couldn't match expected type ‘[Char]’
                  with actual type ‘Data.Text.Internal.Text’
    • In the second argument of ‘(<>)’, namely ‘ruby <> "}"’
      In the second argument of ‘(<>)’, namely ‘"}{" <> ruby <> "}"’
      In the second argument of ‘(<>)’, namely
        ‘kanji <> "}{" <> ruby <> "}"’
   |
10 |     $ "\\ruby{" <> kanji <> "}{" <> ruby <> "}"
   |                                     ^^^^^^^^^^^

handleruby.hs:10:45: error:
    • Couldn't match expected type ‘Data.Text.Internal.Text’
                  with actual type ‘[Char]’
    • In the second argument of ‘(<>)’, namely ‘"}"’
      In the second argument of ‘(<>)’, namely ‘ruby <> "}"’
      In the second argument of ‘(<>)’, namely ‘"}{" <> ruby <> "}"’
   |
10 |     $ "\\ruby{" <> kanji <> "}{" <> ruby <> "}"
   |                                             ^^^

I really appreciate the help.

Add {-# LANGUAGE OverloadedStrings #-} to the top of the source file.

Change '-':kanji to "-" <> kanji.

handleruby.hs:8:46: error: Parse error in pattern: "-" <> kanji
  |
8 | handleRuby (Just format) (Link _ [Str ruby] ("-" <> kanji,_))
  |             

Ah, right, it's a pattern-matching context.
Add

import qualified Data.Text as T

in the imports, and change:

handleRuby (Just format) (Link [Str ruby] (src,_))
  | T.take 1 src == "-", format == Format "html"  = RawInline format
    $ "<ruby>" <> T.drop 1 src <> "<rp>(</rp><rt>" <> ruby <> "</rt><rp>)</rp></ruby>"
  | T.take 1 src == "-", format == Format "latex" = RawInline format
    $ "\\ruby{" <> T.drop 1 src <> "}{" <> ruby <> "}"
  | otherwise = Str ruby


output

Stack has not been tested with GHC versions above 8.6, and using 8.8.2, this may fail
Stack has not been tested with Cabal versions above 2.4, but version 3.0.1.0 was found, this may fail

handleruby.hs:9:27: error:
    • The constructor ‘Link’ should have 3 arguments, but has been given 2
    • In the pattern: Link [Str ruby] (src, _)
      In an equation for ‘handleRuby’:
          handleRuby (Just format) (Link [Str ruby] (src, _))
            | T.take 1 src == "-", format == Format "html"
            = RawInline format
                $ "<ruby>"
                    <>
                      T.drop 1 src
                        <> "<rp>(</rp><rt>" <> ruby <> "</rt><rp>)</rp></ruby>"
            | T.take 1 src == "-", format == Format "latex"
            = RawInline format
                $ "\\ruby{" <> T.drop 1 src <> "}{" <> ruby <> "}"
            | otherwise = Str ruby
  |
9 | handleRuby (Just format) (Link [Str ruby] (src,_))
  |                           ^^^^^^^^^^^^^^^^^^^^^^^

{-# LANGUAGE OverloadedStrings #-}
-- handleruby.hs
import Text.Pandoc.JSON
import System.Environment (getArgs)
import qualified Data.Text as T

handleRuby :: Maybe Format -> Inline -> Inline
handleRuby (Just format) x@(Link attr [Str ruby] (src,_)) =
  case T.uncons src of
    Just ('-',kanji)
      | format == Format "html" -> RawInline format $
        "<ruby>" <> kanji <> "<rp>(</rp><rt>" <> ruby <>
        "</rt><rp>)</rp></ruby>"
      | format == Format "latex" -> RawInline format $
        "\\ruby{" <> kanji <> "}{" <> ruby <> "}"
      | otherwise -> Str ruby
    _ -> x
handleRuby _ x = x

main :: IO ()
main = toJSONFilter handleRuby

Thanks! Now I have even greater respect for haskell programmers ^^.
If someone makes a pr to update the website example, they should probably add that following needs to be done to make it work:

  • compile with --pdf-engine=xelatex (for kanji support)
  • add this header to every file using the script:
mainfont: Noto Sans CJK TC
header-includes: |
   \usepackage{ruby}
---

or use a template that includes the ruby package ofc.

I really appreciate the community help, it's truly wonderful IMO, thanks a lot!

edit:
I also needed to add \renewcommand{\rubysep}{0ex} at the start to fix the ruby text alignment.

We should also add a lua filter version of this to the lua-filters documentation or the pandoc/lua-filters repository; it would be easier to get going than the Haskell one.

Transfer to pandoc repository, because filters.md lives there and just gets copied over to the website.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

dashed picture dashed  Â·  107Comments

matthijskooijman picture matthijskooijman  Â·  54Comments

jgm picture jgm  Â·  62Comments

elliottslaughter picture elliottslaughter  Â·  44Comments

GeraldLoeffler picture GeraldLoeffler  Â·  143Comments