Pandoc: Allow dynamic loading of syntax definitions and styles for highlighting

Created on 31 Dec 2016  Â·  22Comments  Â·  Source: jgm/pandoc

This is now possible, since we've switched from jgm/highlighting-kate to jgm/skylighting.

What would be needed:

  • [ ] Add a parameter to WriterOptions for a SyntaxMap, defaulting to defaultSyntaxMap
  • [ ] Add a --syntax-definition option to pandoc that reads an xml file and adds it to the syntax map. (Make sure to run the sanity check function from skylighting so we issue an error if a syntax is added without syntaxes it includes.)
  • [ ] Add a parameter for WriterOptions, or perhaps for a SyntaxMap, in the highlight function in Text.Pandoc.Highlighting.
  • [ ] Additional steps to add a --highlighting-theme option that affects writerHighlightStyle in WriterOptions. Or perhaps --style should be allowed to take either a style name or a filename.

This would require a bump of the second version number because of the API change.

We might also consider allowing dynamic loading of highlighting styles, using KDE theme JSON format. skylighting already makes this quite easy, since the FromJSON instance for Style is designed for this format.

API enhancement in-progress

Most helpful comment

+++ Wasif Hasan Baig [Feb 02 17 11:02 ]:

Not really. I am not really familiar with the codebase and it's huge.
Plus I can only work on this on weekends. I want to contribute to
Pandoc so I wish to complete it sooner or later. Do you have a deadline
in mind?

Not a specific deadline, but I'd like to get this
implemented by 2.0 release. Always glad to have new
contributors though!

Some hints:

For the second step, all the work will be in pandoc.hs, so
you can focus on that. For adding the new option, you can
look at the code for other options. (The MANUAL.txt will
also need updating with the new option.)

To read the xml file, you'll use parseSyntaxDefinition
from Skylighting.Parser. For the sanity check, you'll use
missingIncludes from the same module. To add the new
syntax definition to the syntax map, you'll use
addSyntaxDefinition.

For the third step, the work will be in
Text.Pandoc.Highlighting. Basically it's just a matter of
making the new syntax map available to the function that
does the highlighting conversion.

The fourth step could be a separate PR. I think I prefer
allowing --style to take a filename as argument (which would
then be parsed as a highlighting theme), as that avoids
having to worry about clashes between two separate options.
This would only require changes in pandoc.hs. You'd use
parseTheme from Skylighting.Styles.

All 22 comments

Is anyone already working on this?

Not that I know of!

@baig did you ask because you're planning to implement this feature?

Yes, you can assign it to me.

On 14 Jan 2017 1:22 a.m., "John MacFarlane" notifications@github.com
wrote:

@baig https://github.com/baig did you ask because you're planning to
implement this feature?

—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
https://github.com/jgm/pandoc/issues/3334#issuecomment-272612682, or mute
the thread
https://github.com/notifications/unsubscribe-auth/AAOI103up8iHxo_4bjM6tPzmTfANUG-Gks5rSJPmgaJpZM4LYerG
.

@baig it's all yours.

  • [x] Add a parameter to WriterOptions for a SyntaxMap, defaulting to defaultSyntaxMap
  • [ ] Add a --syntax-definition option to pandoc that reads an xml file and adds it to the syntax map. (Make sure to run the sanity check function from skylighting so we issue an error if a syntax is added without syntaxes it includes.)
  • [ ] Add a parameter for WriterOptions, or perhaps for a SyntaxMap, in the highlight function in Text.Pandoc.Highlighting.
  • [ ] Additional steps to add a --highlighting-theme option that affects writerHighlightStyle in WriterOptions. Or perhaps --style should be allowed to take either a style name or a filename.

@baig are you stuck on this? Feel free to ask if you have questions, or let us know if you're not interested in implementing this any more.

Not really. I am not really familiar with the codebase and it's huge. Plus I can only work on this on weekends. I want to contribute to Pandoc so I wish to complete it sooner or later. Do you have a deadline in mind?

+++ Wasif Hasan Baig [Feb 02 17 11:02 ]:

Not really. I am not really familiar with the codebase and it's huge.
Plus I can only work on this on weekends. I want to contribute to
Pandoc so I wish to complete it sooner or later. Do you have a deadline
in mind?

Not a specific deadline, but I'd like to get this
implemented by 2.0 release. Always glad to have new
contributors though!

Some hints:

For the second step, all the work will be in pandoc.hs, so
you can focus on that. For adding the new option, you can
look at the code for other options. (The MANUAL.txt will
also need updating with the new option.)

To read the xml file, you'll use parseSyntaxDefinition
from Skylighting.Parser. For the sanity check, you'll use
missingIncludes from the same module. To add the new
syntax definition to the syntax map, you'll use
addSyntaxDefinition.

For the third step, the work will be in
Text.Pandoc.Highlighting. Basically it's just a matter of
making the new syntax map available to the function that
does the highlighting conversion.

The fourth step could be a separate PR. I think I prefer
allowing --style to take a filename as argument (which would
then be parsed as a highlighting theme), as that avoids
having to worry about clashes between two separate options.
This would only require changes in pandoc.hs. You'd use
parseTheme from Skylighting.Styles.

Adding writerSyntaxMaps to WriterOptions causes the following error on stack build because WriterOptions is deriving Show, Data, Typeable, and Generic instances whereas types in Skylighting.Types only derive the Show instance:

../pandoc/src/Text/Pandoc/Options.hs:407:21: error:
    • No instance for (Data Syntax)
        arising from the 45th field of ‘WriterOptions’ (type ‘SyntaxMap’)
      Possible fix:
        use a standalone 'deriving instance' declaration,
          so you can specify the instance context yourself
    • When deriving the instance for (Data WriterOptions)
Completed 2 action(s).

--  While building package pandoc-1.19.1 using:
      ../pandoc/.stack-work/dist/x86_64-linux/Cabal-1.24.0.0/setup/setup --builddir=.stack-work/dist/x86_64-linux/Cabal-1.24.0.0 build lib:pandoc exe:pandoc --ghc-options " -ddump-hi -ddump-to-file"
    Process exited with code: ExitFailure 1

I kept adding Data, Typeable, and Generic in the derive clause for the relevant data types in Skylighting.Types and Skylighting.Regex but got stuck at RE. There is no Show, Data, Typeable, and Generic for Regex used in defining RE and it's being imported from an external package regex-pcre.

Any advice how to deal with this?

I can't think of anywhere where we're relying on Data,
Typeable, and Generic instances for Writer/ReaderOptions.
(This could be tested by removing these istances and
seeing if pandoc and pandoc-citeproc compile.)

So one option would be just to remove them. Of course,
it's always possible that someone else is relying on them.

Ramifications of the change:

  • we'd lose the ability to serialize WriterOptions to JSON
  • we'd lose the ability to walk WriterOptions with generic
    transformations
  • others?

I'm not sure about this, but I don't know what options we
have. I suppose another option would be to pass the XML
syntax definitions in WriterOptions, but this seems
inefficient; if you were doing a number of conversions,
pandoc would be parsing the XML files on each conversion
instead of only once.

So pandoc and pandoc-citeproc both compiles after removing Data from the deriving clause of Writer/ReaderOptions. I can't really assess the project-wide ramifications of such a decision, so I'll have to go by what you think is best to do in this situation.

I've posted a message on pandoc-discuss asking if anyone relies on these instances.

Nobody has said yet that they're relying on these instances, so go ahead and remove them.
(Make sure this is documented in the commit message, as this will definitely need to go in the changelog.)

Sure, I'll do that and document it.

Hold on -- I'm working on a revision to skylighting that should allow us to keep these instances in Pandoc.

OK, that would be neat.

Release skylighting 0.2.

@baig are you still planning to implement this?
If not, I can do it.

Yes. There's not much work left. Just couldn't find time to work on this for the last two weekends. When are you planning to release 2.0?

+++ Wasif Hasan Baig [Mar 06 17 20:14 ]:

Yes. There's not much work left. Just couldn't find time to work on
this for the last two weekends. When are you planning to release 2.0?

No specific plans still. If you think you'll get to it in
the next couple weeks, should be fine.

I needed this feature and can't wait any longer, so I implemented it.

Still TODO:

  • [x] Figure out whether we want to have the xml parsing
    depend on language.dtd (it currently does, and fails unless
    the language.dtd is found in the same directory).
  • [x] Add an option to read a KDE syntax highlighting theme
    as a custom style. (Use parseTheme :: ByteString -> Either String Style from Skylighting.Styles)
  • [ ] Add tests.
Was this page helpful?
0 / 5 - 0 ratings

Related issues

anton-k picture anton-k  Â·  53Comments

kevinushey picture kevinushey  Â·  79Comments

jgm picture jgm  Â·  266Comments

ERnsTL picture ERnsTL  Â·  58Comments

brainchild0 picture brainchild0  Â·  66Comments