Cabal: Support automatic population of `exposed-modules:`

Created on 22 Aug 2020 · 21 comments · Source: haskell/cabal

I am new to Haskell, and the very first thing I wondered about when I started using cabal-install was the need to manually add modules to the exposed-modules: fields in .cabal files.

I think it would be a good idea if this were done by the tool.

I recently looked into hpack and asked on Reddit whether it was worth it, and one of the reasons people gave is that with hpack the exposed-modules: field is populated for you.

I think it would be nice if this were a native feature instead of something an external tool like hpack helps with.

All 21 comments

I believe this to be a duplicate, but I can't seem to find the previous tickets on this topic.

The short gist is that the list of exposed modules is a critical piece of information about a package and ought to be fully intentional. Autodetection via the filesystem runs the risk of including unintended "garbage" that happened to lie around in a filesystem, so autodetecting files rather than intentionally and statically enumerating them isn't without issues either. GHC was taught to warn about unlisted modules (-Wmissing-home-modules) to help with that, and autodetecting modules from the filesystem would defeat the purpose of -Wmissing-home-modules again. This is partly a philosophical question: whether you consider statically declared APIs more robust, or are fine with APIs inferred dynamically from the filesystem index.

There's also a minor technical benefit to an explicit list: tracking changes to the filesystem directory index often requires a full recursive traversal each time to be on the safe side. More importantly, we have tooling that only has access to the .cabal files in the 01-index.tar and needs to know the set of exposed modules (including how they're affected by cabal package flags), for which it would be too expensive to download and inspect the actual source tarball; in fact, it would largely defeat the purpose of a package index if it lacked such essential package-level information.

So, at the very least for packages that end up in a package index, automatic population is not a sensible thing to do. However, there's no reason we can't have tooling that syncs your filesystem to the module list in your existing .cabal file. This way you'd still track the module manifest explicitly in your .cabal file, changes to the manifest would be deliberate rather than implied by whatever filenames happen to be in a folder, and you'd avoid the busy-work of syncing that list by hand if that causes you overhead. This would give us the best of both worlds, IMO. A proof of concept for such a tool could easily be hacked together in a single weekend; it can be prototyped outside of cabal proper and, if deemed convenient enough, integrated into cabal proper.

ought to be fully intentional

It's just annoying to do. I use hpack to avoid having to do this. If cabal were to provide an (opt-in) feature for this, I'd drop hpack like a rock.

autodetection-via-filesystem runs the risk of including unintended "garbage"

I'm aware of this, but nearly every language does it like this. Opt-in functionality for this would be great.

However, there's no reason we can't have tooling which is able to sync your filesystem to the module list in your existing .cabal file.

How about instead we add a hidden-modules field that enables autodetection and can't be combined with exposed-modules? That way I don't have to run any more commands and cabal just works.
That would be the opt-in part. I can still hide unintended modules by adding them to hidden-modules. The risk is that I accidentally upload a package to Hackage with too many modules exposed, but that's a risk well worth taking compared to the continued annoyance of having to update that module list, or having to run hpack.
It only takes a version upgrade to fix it, no big deal.

On a side note, I'm frustrated enough by this to implement it myself, if I can be reasonably sure the change would be accepted.
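To make the proposal concrete, the intended semantics of the suggested field can be written down in a few lines. This is only a sketch of the idea in this comment: `hidden-modules` is not an existing Cabal field, and `exposedFromHidden` is a hypothetical name.

```haskell
import Data.List (nub, sort)

-- | Proposed semantics of a hypothetical `hidden-modules:` field: every
-- module discovered under the source directories is exposed, except the
-- ones the author lists as hidden. Per the proposal, the field would not
-- be combinable with `exposed-modules:`.
exposedFromHidden
  :: [String] -- ^ module names discovered on disk
  -> [String] -- ^ module names listed in hidden-modules
  -> [String] -- ^ resulting exposed modules, sorted
exposedFromHidden discovered hidden =
  sort (nub (filter (`notElem` hidden) discovered))
```

The trade-off debated in this thread is visible in the code: any stray module on disk ends up exposed unless someone remembers to add it to the hidden list.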

If I read the docs correctly, cabal init already does something similar. Could there be another command to update the information?

Just mentioning http://oleg.fi/gists/posts/2019-08-11-cabal-fmt.html#extra-expand-exposed-modules-and-other-modules which is a nice tool that can be used for it, too.

If you want this behavior but don't want to use hpack, you could try using autopack.

If you want this behavior but don't want to use hpack, you could try using autopack.

Hmm. It would be better for this to be a native feature of the build tool, instead of relying on another third-party tool...

The short gist is, that the exposed list of modules is a critical piece of information of a package and ought to be fully intentional, so autodetection-via-filesystem runs the risk of including unintended "garbage" that happened to lay around in a filesystem

This won't include files from any random directory, though. The directory containing the exposed modules would have to be specified explicitly before modules get picked up from it. I think having to do that ticks the _intentionality_ box.

Just mentioning http://oleg.fi/gists/posts/2019-08-11-cabal-fmt.html#extra-expand-exposed-modules-and-other-modules which is a nice tool that can be used for it, too.

Thanks! I totally forgot about @phadej's tool already having implemented it!

I'll just quote it here again in the hope that GitHub's indexing makes this ticket more discoverable (it turns out cabal-fmt was already mentioned in the related issue https://github.com/haskell/cabal/issues/5343#issuecomment-520140470):


expand exposed-modules and other-modules

The recent addition is an ability to (re)write field contents while formatting. There's an old, ongoing discussion about allowing wildcard specification of exposed-modules in the .cabal format. I'm against that change. Instead, cabal-fmt (or an imaginary IDE) would regenerate parts of the .cabal file given some commands.

cabal-fmt: expand <directory> is one such command (the only one at the moment).

cabal-fmt will look into the directory for files, turn filenames into module names and append them to the contents of exposed-modules. As the field is then nubbed and sorted, expanding is idempotent. For example, cabal-fmt itself has:

-- cabal-fmt: expand src
--
exposed-modules:
  CabalFmt
  ...

The functionality is simple. There is no removal of other-modules or main-is. I think that using a different directory for these is a good enough workaround, and it may make things clearer: a directory for public modules and a directory for private ones.
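The expand behaviour described above (turn file names into module names, append them to the field, then nub and sort) can be sketched as follows. This is an illustration, not cabal-fmt's actual implementation; it assumes paths are relative to the expanded directory, use `/` as the separator, and that only `.hs` files count. `moduleFromPath` and `expandModules` are hypothetical names.

```haskell
import Data.Char (isUpper)
import Data.List (intercalate, isSuffixOf, nub, sort)
import Data.Maybe (mapMaybe)

-- | Split a string on a separator character.
splitOn :: Char -> String -> [String]
splitOn sep s = case break (== sep) s of
  (chunk, [])       -> [chunk]
  (chunk, _ : rest) -> chunk : splitOn sep rest

-- | Turn a relative path such as "CabalFmt/Parser.hs" into a module name
-- such as "CabalFmt.Parser". Non-.hs files, and paths with a component
-- that does not start with an upper-case letter, yield Nothing.
moduleFromPath :: FilePath -> Maybe String
moduleFromPath path
  | ".hs" `isSuffixOf` path, all startsUpper parts = Just (intercalate "." parts)
  | otherwise = Nothing
  where
    stem  = take (length path - 3) path -- drop the ".hs" extension
    parts = splitOn '/' stem
    startsUpper (c : _) = isUpper c
    startsUpper []      = False

-- | Append the discovered modules to the existing field contents, then
-- nub and sort, so running it again on its own output changes nothing.
expandModules :: [FilePath] -> [String] -> [String]
expandModules files existing =
  sort (nub (existing ++ mapMaybe moduleFromPath files))
```

Applying `expandModules` to its own output returns the same list, which is the idempotence the post describes.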


I can sympathise with the desire to throw in everything and the kitchen sink into cabal proper; but unfortunately every single feature added to a big project is one that increases our maintenance surface. It's not a perfect comparison, but GHC is plagued by a similar issue; see https://osa1.net/posts/2020-01-22-no-small-syntax-extensions.html, which tells the cautionary tale of the cost associated with trivial syntax extensions such as "BlockArguments", where only time will tell whether people actually use it enough to justify its inclusion (and, as the blog post points out, it's unlikely to be used in professional environments due to the cognitive overhead involved)... but I digress :-)

That being said, I'm not ruling out that the cabal-fmt feature could end up in cabal proper, especially if it turns out to be low-risk and there's an obvious canonical logic to its behaviour without many knobs and buttons to consider (which would risk feature creep). So by all means, please try out cabal-fmt and tell us if there are things you'd like to tweak/modify/improve.

The previous issue for this is #5343.

I think that the crux of this issue is that automatically discovering exposed-modules is nice for humans but not nice for machines. Any solution is going to have to grapple with that.

hpack works great for local development because it finds exposed modules for you. But it also works great for publishing to Hackage because it produces a *.cabal file with all the exposed modules explicitly listed out.

autopack only really works for local development. If you uploaded a package that used autopack to Hackage, it would appear to expose no modules at all.

cabal-fmt is great for publishing to Hackage since it produces a typical *.cabal file. It's slightly suboptimal for local development because it's not integrated with any build tools. But integrating it yourself isn't terribly hard.

is nice for humans but not nice for machines.

I'd argue that you can change machines; humans will keep complaining about this until someone provides a good first-class solution.

But it also works great for publishing to Hackage because it produces a

So the machines that rely on this explicit listing are currently the cabal project and Hackage; are there any other projects that depend on it? I'm trying to figure out how hard it would be to make something like this a first-class feature, rather than providing the Nth workaround.

I can sympathise with the desire to throw in everything and the kitchen sink into cabal proper; but unfortunately every single feature added to a big project is one that increases our maintenance surface.

I don't think this is a very strange feature for a build tool to have. Practically every other language's build tool does this. Why is cabal so special?

I think that the crux of this issue is that automatically discovering exposed-modules is nice for humans but not nice for machines. Any solution is going to have to grapple with that.

What if the module discovery wasn't automatic but had to explicitly be invoked? E.g. if I ran cabal generate-modules and the tool updated the exposed-modules: list in the cabal file? I could then even run git diff project.cabal to verify that the module list generation worked exactly as intended.

In general, I don't think flipping the way that exposed-modules: works to instead be hidden-modules: is necessary to allay this pain. I think it's fine if exposed-modules: still has to list everything, it would just be nice if cabal-the-utility helped manage that list for you.

Edit: I think this is essentially what @hvr is describing in the last paragraph of his first response.
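For the `git diff project.cabal` check suggested above to be useful, a hypothetical `cabal generate-modules` (no such command exists today) would need to render the field deterministically. A sketch of that rendering step, with `renderExposedModules` as an illustrative name and two-space indentation as an arbitrary choice:

```haskell
import Data.List (nub, sort)

-- | Render an exposed-modules field deterministically: sorted,
-- de-duplicated, one module per line with two-space indentation.
renderExposedModules :: [String] -> String
renderExposedModules mods =
  unlines ("exposed-modules:" : map ("  " ++) (sort (nub mods)))
```

Because the output is sorted and de-duplicated, regenerating the field on an unchanged source tree yields an empty diff.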

What if the module discovery wasn't automatic but had to explicitly be invoked?

It's an improvement, but I just want to make it so that I almost never have to see the -Wmissing-home-modules warning. I'd also like to make the cabal file smaller. Listing all modules is tedious and doesn't give a good overview.

In general, I don't think flipping the way that exposed-modules: works to instead be hidden-modules: is necessary to allay this pain. I think it's fine if exposed-modules: still has to list everything, it would just be nice if cabal-the-utility helped manage that list for you.

Maybe forget about this. My idea was that in most cases you just want to expose everything, so the cabal file becomes a lot shorter and more manageable. But that other issue achieves this as well, just through a better approach (globbing). From what I understand, it would be appreciated as a PR, so I'll work on that in the near future. But please let me know if there are objections; I don't want to waste my time.

But that other issue does this as well, just through a better approach (globbing). I think it would be appreciated as a PR (from what I understand) so I'll work on that in the near future. But please let me know if there are objections, I don't want to waste my time.

I think the globbing approach suggested in #5343 would be a good one. Would probably be even better than running a separate cabal command to populate the exposed-modules list.
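A sketch of how module globs could be expanded into an explicit list. The pattern syntax here (a trailing `.*` matching every module strictly below a prefix, anything else matching exactly) is an assumption made for illustration; it is not necessarily the syntax discussed in #5343, and `matchesGlob` / `expandGlobs` are hypothetical names.

```haskell
import Data.List (isPrefixOf, nub, sort)

-- | Hypothetical glob syntax: "Data.Foo.*" matches every module strictly
-- below Data.Foo (not Data.Foo itself); any other pattern matches exactly.
matchesGlob :: String -> String -> Bool
matchesGlob pat modName = case stripWildcard pat of
  Just prefix -> (prefix ++ ".") `isPrefixOf` modName
  Nothing     -> pat == modName

-- | Return the prefix of a pattern ending in ".*", if it has one.
stripWildcard :: String -> Maybe String
stripWildcard pat = case reverse pat of
  '*' : '.' : rest -> Just (reverse rest)
  _                -> Nothing

-- | Keep the discovered modules matched by at least one glob, as a
-- sorted, de-duplicated explicit list.
expandGlobs :: [String] -> [String] -> [String]
expandGlobs globs discovered =
  sort (nub [m | m <- discovered, any (`matchesGlob` m) globs])
```

The output is an explicit module list, the same kind of information that tooling reading only the package index relies on.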

Just to echo, this is the key issue preventing me from switching to cabal files. My current project has ~400 modules. Haskell already has a lot of friction compared to other languages, such as manually managing imports, and I don't want to add more.

Why not use cabal-fmt? The issue is I need to build tooling wrapping cabal/stack to call this with every operation to make it seamless. This adds enough pain to make it not worth it IMO.

I think there are valid arguments as to why this feature may not want to be integrated into cabal the tool. I won't claim to understand the cabal side of things. However, adding it could unlock a lot of new cabal users which I think is quite valuable for cabal (more users = more issues + PRs coming in hopefully). As such I would weight adding it quite highly.

It was brought to my attention on reddit that even though this was closed, at least @phadej thinks the globbing solution is worth putting into Cabal if someone implements it.

My takeaway from this being closed was that there wasn't interest in fixing this issue, so I'm posting this so others don't make the same mistake and can see there is a way forward 🙂

@codygman, you understood me wrong. I expect the patch to be quite big and invasive. Yet if someone actually makes it and still thinks it's a good idea, I'd be curious to know.

I do not recommend pursuing that path to the end; the patch will probably still be rejected.

Case in point: just now I used @hvr's tool to find out which package exports Data.RFC5051. It saved me a lot of time while fixing Hackage metadata (going through pandoc's dependencies by hand is not a job for a human).

There are nice things which would be nice to have, but we cannot.

Wouldn't it be nice if cabal figured out dependencies without explicit build-depends? In theory possible, in practice not. This issue is the same. Reddit polls won't change my mind.

I've said this elsewhere, but I want to preface this comment with:

You guys do a huge amount of work as volunteers and I appreciate it. I appreciate it so much I want others to be able to come back to Cabal from stack so even more can appreciate that work. I'd prefer not to have to say "don't use the default build tool cabal, you have an easier time with stack" when introducing new people to Haskell.

Technically I suspect cabal is better in many ways, but UX-wise, in a lot of ways, it's just not, and I can provide many examples if there is interest.

Reddit polls won't change my mind.

They should carry some weight if you care about users opinions. If the problem is that the poll is on Reddit, what sort of poll would convince you? If you simply don't care about users opinions with regards to UX issues, that's pretty alarming. Please confirm that isn't the case.

There are nice things which would be nice to have, but we cannot.

I mean, we can have it. I and others would just like to not resort to "switch to stack" or "use some external tool".

In theory possible, in practice not.

From a user perspective, that's hard to buy since stack build ends up doing it just fine. There are differences there and technical arguments below the surface, but at the end of the day people will judge cabal against stack.

stack build is not just fine. It preprocesses the package.yaml file (into pkgname.cabal), it doesn't interpret it directly. That is very important to understand. Package tarballs uploaded to Hackage with stack still have "old-style, explicit module listing pkgname.cabal" in them.

I'm not against preprocessing; cabal-fmt does this. But how would we make the UX unconfusing for users if that functionality were part of cabal and done implicitly on builds? I don't know. (I don't think it can be non-confusing if it's implicit.)

Also, Herbert is rightfully concerned about performance. Note that it's not only that cabal-install would need to do file-system watching (which it already does, so that's not an issue); Cabal the library would also need to traverse the filesystem every time ./Setup is invoked, if ./Setup.hs does the preprocessing as well (e.g. in Nix, or in GHC's build systems). I think it shouldn't; Setup is already a complicated interface.


If you simply don't care about users opinions with regards to UX issues, that's pretty alarming

Not true. I do listen, and we do implement features to improve users' experience. Judging by the response to a single request just isn't fair.

Just as a data point in this discussion: even Michael Snoyman recently stated that supporting automatic generation of cabal files from package.yaml in stack projects (using hpack) led to all sorts of build reproducibility issues. See https://www.fpcomplete.com/blog/storing-generated-cabal-files/
