Cabal: Rewrite the cabal file parser using parsec

Created on 12 Oct 2015 · 4 Comments · Source: haskell/cabal

The existing parser has no formal grammar, produces terrible error messages, and is sometimes very slow and memory hungry. In large part this is because ReadP is a terrible parser. Now that the ghc library no longer depends on the Cabal library, Cabal is allowed to depend on parsec.

The Cabal parser is two-stage: first the outline, then the individual fields. This approach will remain. There is a prototype new parser (with a grammar!) using parsec for the outline phase, which has had some significant testing (the new grammar rejects only a very small number of old, quirky .cabal files). This new parser needs to be integrated and tested further. Additionally, the infrastructure for parsing the individual fields needs to be rewritten to use parsec.
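To make the two-stage idea concrete, here is a minimal sketch using parsec. It is not the prototype parser or Cabal's actual grammar: the names `Field`, `Value`, `outline`, `versionValue`, `nameList`, `parseField` and `parseCabalLike` are all hypothetical, and the format is heavily simplified (flat `name: value` lines only, no sections, no indentation-based continuation lines, case-sensitive field names, and dependency lists without version constraints).

```haskell
import Data.Char (isSpace)
import Text.Parsec
import Text.Parsec.String (Parser)

-- | Stage 1 result: a field name paired with its still-unparsed value.
-- (Hypothetical type, for illustration only.)
data Field = Field String String
  deriving (Eq, Show)

-- | Stage 2 result: a value interpreted according to its field name.
data Value = Version [Int] | Names [String] | Raw String
  deriving (Eq, Show)

-- | Stage 1: the outline parser. It only recognises @name: value@ lines
-- and never looks inside the value.
outline :: Parser [Field]
outline = skipMany endOfLine *> many (fieldLine <* skipMany endOfLine) <* eof

fieldLine :: Parser Field
fieldLine = do
  name <- many1 (alphaNum <|> char '-')
  _ <- char ':'
  skipMany (oneOf " \t")
  value <- many (noneOf "\r\n")
  return (Field name (trimEnd value))
  where
    trimEnd = reverse . dropWhile isSpace . reverse

-- | Stage 2: a dotted version number such as @1.2.3@.
versionValue :: Parser [Int]
versionValue = (read <$> many1 digit) `sepBy1` char '.' <* eof

-- | Stage 2: a comma-separated list of package names (no constraints).
nameList :: Parser [String]
nameList = spaces *> (name <* spaces) `sepBy` (char ',' <* spaces) <* eof
  where
    name = many1 (alphaNum <|> char '-')

-- | Pick a stage-2 parser by field name; unknown fields stay raw.
parseField :: Field -> Either ParseError (String, Value)
parseField (Field name raw) = fmap ((,) name) $ case name of
  "version"       -> Version <$> parse versionValue name raw
  "build-depends" -> Names <$> parse nameList name raw
  _               -> Right (Raw raw)

-- | Run stage 1 over the whole input, then stage 2 on each field.
parseCabalLike :: String -> Either ParseError [(String, Value)]
parseCabalLike src = parse outline "<input>" src >>= traverse parseField
```

Loaded in GHCi, `parseCabalLike "name: demo\nversion: 1.2.3\n"` yields a `Right` containing a `Raw "demo"` entry and a `Version [1,2,3]` entry. The point of the split is that the outline stage can be given a small grammar of its own, while each field's value parser can be written, tested and replaced independently.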

parser

phadej

All 4 comments

This ticket should be either closed or updated to reflect the current state of affairs.

23Skidoo on 8 Jun 2017

@23Skidoo, briefly, the next steps are:

Set up parsec (mtl and text) in the GHC tree
Rip out the non-parsec stuff from Cabal
...
Profit

But I don't want to bother @bgamari on this before GHC-8.2.1 is out.

phadej on 10 Jun 2017

👍3

... and then upgrade to Megaparsec? :)

(Sorry, couldn't resist.)

BardurArantsson on 10 Jun 2017

@BardurArantsson that's a fair point :-) The main reason to stick with parsec right now is, IMO, that its dependency footprint is lighter. And given that it's going to become a GHC-bundled library, it's a good thing that parsec is so mature by now that its development has slowed down (while megaparsec is actively evolving). ;-)