Cabal: Support automatic population of `exposed-modules:`

Created on 22 Aug 2020 · 21 Comments · Source: haskell/cabal

I am new to Haskell, and the very first thing I wondered about when I started using cabal-install was the need to manually add modules to the exposed-modules: fields in .cabal files.

I think it would be a good idea if this is done by the tool.

I recently looked into hpack and asked on reddit whether it was worth it. One of the reasons people gave is that with hpack the exposed-modules: field is populated for you.

I think it would be nice if this were a native feature instead of something an external tool like hpack helps with.


finlaydotb

❤1

Most helpful comment

stack build is not just fine. It preprocesses the package.yaml file (into pkgname.cabal), it doesn't interpret it directly. That is very important to understand. Package tarballs uploaded to Hackage with stack still have "old-style, explicit module listing pkgname.cabal" in them.

I'm not against preprocessing, cabal-fmt does this. But how to make the UX so users are not confused if that functionality was part of cabal and implicitly done on builds? I don't know. (I don't think it will be non-confusing if it's implicit).

Also, Herbert is rightfully concerned about performance. Note that it's not only that cabal-install would need to do file system watching (which it already does, so that's not an issue). Cabal the library would also need to traverse the filesystem every time ./Setup is invoked, if ./Setup.hs is preprocessing as well (e.g. in Nix, or in GHC's build system). I think it shouldn't; Setup is already a complicated interface.

If you simply don't care about users opinions with regards to UX issues, that's pretty alarming

Not true. I do listen. And we do implement features to improve users experiences. Judging by response to a single request is just not fair.

phadej on 31 Aug 2020

👍5

All 21 comments

I believe this to be a duplicate but I can't seem to find the previous
tickets on this topic.

The short gist is, that the exposed list of modules is a critical piece
of information of a package and ought to be fully intentional, so
autodetection-via-filesystem runs the risk of including unintended
"garbage" that happened to lay around in a filesystem; so autodetecting
files rather than intentionally statically enumerating them isn't
without issue as well. GHC was taught to warn about unlisted modules
(-Wmissing-home-modules) to help with that (and autodetecting modules
from the fs would defeat the purpose of -Wmissing-home-modules
again). This is a bit of a philosophical issue, and whether you consider
statically determined APIs more robust or are fine with
dynamically-via-filesystem-index-inferred APIs.

There's also a minor technical benefit to explicit listing: tracking
changes to the filesystem directory index often requires a full recursive
traversal each time to be on the safe side. But more importantly, we have tooling
that has only access to the .cabal files in the 01-index.tar and needs
to know the set of exposed modules (including how they're affected by
cabal package flags) and for which it would be too expensive to have
to download and inspect the actual source-tarball; in fact it would
kinda defeat the purpose of a package index if it lacks such
essential package-level information.
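The point about index-only tooling can be made concrete with a minimal sketch. It assumes a hypothetical `IndexEntry` type that keeps only a package name and its explicitly listed modules (real index entries are full .cabal files), and the package and module names are made up. With explicit lists, a "which package exposes this module?" query needs nothing but the index:

```haskell
import Data.List (sort)

-- | A package-index entry reduced to what this sketch needs.
-- Hypothetical type: real index entries are full .cabal files.
data IndexEntry = IndexEntry
  { pkgName        :: String
  , exposedModules :: [String]
  }

-- | Names of all packages whose explicit module list contains the module.
packagesExposing :: String -> [IndexEntry] -> [String]
packagesExposing m index =
  sort [pkgName e | e <- index, m `elem` exposedModules e]

main :: IO ()
main = do
  -- Made-up index contents, for illustration only.
  let index =
        [ IndexEntry "pkg-a" ["Data.Alpha", "Data.Alpha.Internal"]
        , IndexEntry "pkg-b" ["Data.Beta"]
        ]
  print (packagesExposing "Data.Beta" index)
```

If the module list were inferred from the filesystem instead, the same query would require downloading and unpacking each source tarball.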

So at the very least for packages that end up in a package index,
automatic population is not something that's sensible to do. However,
there's no reason we can't have tooling which is able to sync your
filesystem to the module list in your existing .cabal file. This way
you'd explicitly track the module manifest in your .cabal file and you'd
make changes to the manifest more explicit than merely by what
filenames happen to be in a folder, and you reduce the busy-work of
manually having to sync that list by hand if this is something that
causes you overhead. This would provide us best of both worlds IMO. A
proof-of-concept for such a tool would easily be hackable in a
single weekend; it can be easily prototyped outside of cabal proper and
if deemed convenient enough could be integrated into cabal proper.

hvr on 22 Aug 2020

ought to be fully intentional

It's just annoying to do. I use hpack to avoid having to do this. If cabal were to provide an (opt in) feature to do this, I'd drop hpack like a rock.

autodetection-via-filesystem runs the risk of including unintended "garbage"

I'm aware of this, but nearly every language does it like this. Opt-in functionality for this would be great.

However, there's no reason we can't have tooling which is able to sync your filesystem to the module list in your existing .cabal file.

How about instead we add a hidden-modules field that enables auto detection and doesn't work with exposed-modules. That way I don't have to run any more commands and cabal just works.
That would be the opt-in part. I can still hide unintended modules by adding them to hidden-modules. The risk is that I accidentally upload a package to Hackage with too many modules exposed, but that risk is well worth it compared to the continued annoyance of having to update that module list, or having to run hpack.
It only takes a version upgrade to fix it, no big deal.
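As a rough sketch of what that proposal would compute: the hidden-modules field is only proposed in this comment and does not exist in cabal, and both function names and all module names below are hypothetical. The exposed set is simply the discovered modules minus the hidden ones:

```haskell
import Data.List (nub, sort)

-- | Exposed modules under the proposed opt-in scheme: everything that
-- was discovered, minus whatever the hypothetical hidden-modules field lists.
exposedFrom :: [String] -> [String] -> [String]
exposedFrom discovered hidden =
  sort (nub (filter (`notElem` hidden) discovered))

-- | hidden-modules entries that match no discovered module.
-- These are probably typos and would be worth a warning.
unmatchedHidden :: [String] -> [String] -> [String]
unmatchedHidden discovered hidden =
  sort (nub (filter (`notElem` discovered) hidden))

main :: IO ()
main = do
  -- Made-up module names, for illustration only.
  let discovered = ["MyLib", "MyLib.Internal", "MyLib.Types"]
      hidden     = ["MyLib.Internal", "MyLib.Intrnal"]
  print (exposedFrom discovered hidden)
  print (unmatchedHidden discovered hidden)
```

The second function hints at one cost of the inverted scheme: a misspelled hidden entry silently exposes the module it was meant to hide, unless the tool checks for it.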

On a side note, I'm frustrated enough by this to implement it myself, provided I'm reasonably sure it would be accepted as a change.

jappeace on 22 Aug 2020

If I read the docs correctly, cabal init already does something similar. Could there be another command to update the information?

garethrowlands on 22 Aug 2020

👍1

Just mentioning http://oleg.fi/gists/posts/2019-08-11-cabal-fmt.html#extra-expand-exposed-modules-and-other-modules which is a nice tool that can be used for it, too.

fendor on 22 Aug 2020

👍1

If you want this behavior but don't want to use hpack, you could try using autopack.

tfausak on 22 Aug 2020

If you want this behavior but don't want to use hpack, you could try using autopack.

Hmm, it would be better for this to be a feature native to the build tool, instead of relying on another third-party tool...

finlaydotb on 22 Aug 2020

The short gist is, that the exposed list of modules is a critical piece of information of a package and ought to be fully intentional, so autodetection-via-filesystem runs the risk of including unintended "garbage" that happened to lay around in a filesystem

This won't be including files from any random directory though. The directory that contains the exposed modules would have to be explicitly specified before modules get picked from there. I think having to do that ticks the _intentionality_ box.

finlaydotb on 22 Aug 2020

Just mentioning http://oleg.fi/gists/posts/2019-08-11-cabal-fmt.html#extra-expand-exposed-modules-and-other-modules which is a nice tool that can be used for it, too.

Thanks! I totally forgot about @phadej's tool already having implemented it!

I'll just quote it here again in the hopes Github's indexing makes this ticket more discoverable (turns out cabal-fmt was already mentioned in the related issue https://github.com/haskell/cabal/issues/5343#issuecomment-520140470):

expand `exposed-modules` and `other-modules`

The recent addition is an ability to (re)write field contents, while formatting. There's an old, ongoing discussion of allowing wildcard specification of exposed-modules in .cabal format. I'm against that change. Instead, rather cabal-fmt (or an imaginary IDE), would regenerate parts of .cabal file given some commands.

cabal-fmt: expand <directory> is one such command (the only one at the moment).

cabal-fmt will look into directory for files, turn filenames into module names and append to the contents of exposed-modules. As the field is then nubbed and sorted, expanding is idempotent. For example cabal-fmt itself has:

-- cabal-fmt: expand src
--
exposed-modules:
  CabalFmt
  ...

The functionality is simple. There is no removal of other-modules or main-is. I think that using different directory for these is good enough workaround, and may make things clearer: directory for public modules and a directory for private ones.
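The expansion described in the quoted post (file names become module names, they are appended to the field, then the result is nubbed and sorted) can be sketched roughly as below. This is not cabal-fmt's actual code: the function names are made up, the file list is passed in rather than read from disk, only `.hs` files are considered, and `CabalFmt.Parser` is an illustrative module name.

```haskell
import Data.Char (isAlphaNum, isUpper)
import Data.List (intercalate, nub, sort, stripPrefix)
import Data.Maybe (mapMaybe)

-- | Split a string on a separator character.
splitOn :: Char -> String -> [String]
splitOn c s = case break (== c) s of
  (chunk, [])       -> [chunk]
  (chunk, _ : rest) -> chunk : splitOn c rest

-- | Drop a suffix if present (Data.List only offers stripPrefix).
stripSuffix :: String -> String -> Maybe String
stripSuffix suf s = reverse <$> stripPrefix (reverse suf) (reverse s)

-- | A module name component starts with an upper-case letter.
validComponent :: String -> Bool
validComponent []       = False
validComponent (c : cs) =
  isUpper c && all (\x -> isAlphaNum x || x == '_' || x == '\'') cs

-- | Turn "src/Data/Foo.hs" into "Data.Foo", relative to the root "src".
-- Files that do not look like Haskell modules yield Nothing.
toModuleName :: FilePath -> FilePath -> Maybe String
toModuleName root path = do
  rel  <- stripPrefix (root ++ "/") path
  base <- stripSuffix ".hs" rel
  let parts = splitOn '/' base
  if all validComponent parts
    then Just (intercalate "." parts)
    else Nothing

-- | Append discovered modules to the existing field, then nub and sort.
expand :: FilePath -> [FilePath] -> [String] -> [String]
expand root files existing =
  sort (nub (existing ++ mapMaybe (toModuleName root) files))

main :: IO ()
main = do
  let files    = ["src/CabalFmt.hs", "src/CabalFmt/Parser.hs", "src/notes.txt"]
      expanded = expand "src" files ["CabalFmt"]
  mapM_ putStrLn expanded
  -- Expanding a second time changes nothing, i.e. the step is idempotent.
  print (expand "src" files expanded == expanded)
```

Because the result is always nubbed and sorted, running the step on its own output gives the same list, which is the idempotence the post mentions.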

I can sympathise with the desire to throw in everything and the kitchen sink into cabal proper; but unfortunately every single feature added to a big project is one that increases our maintenance surface. It's not necessarily a perfect comparison but GHC is plagued by a similar issue, see https://osa1.net/posts/2020-01-22-no-small-syntax-extensions.html which tells the cautionary tale of the cost associated with trivial syntax extensions such as "BlockArguments" which only time will tell if people will actually use it to justify its inclusion (and as the blogpost points out, it's unlikely to be used in professional environments due to the cognitive overhead involved)... but I digress :-)

That being said, I'm not saying that cabal-fmt feature might not end up in cabal proper, especially if it turns out to be low-risk and there's an obvious canonical logic to its behaviour without many knobs and buttons to consider (which would risk feature creeping). So by all means, please try out cabal-fmt and tell us if there's things you'd like to tweak/modify/improve.

hvr on 22 Aug 2020

The previous issue for this is #5343.

I think that the crux of this issue is that automatically discovering exposed-modules is nice for humans but not nice for machines. Any solution is going to have to grapple with that.

hpack works great for local development because it finds exposed modules for you. But it also works great for publishing to Hackage because it produces a *.cabal file with all the exposed modules explicitly listed out.

autopack only really works for local development. If you uploaded a package that used autopack to Hackage, it would appear to expose no modules at all.

cabal-fmt is great for publishing to Hackage since it produces a typical *.cabal file. It's slightly suboptimal for local development because it's not integrated with any build tools. But integrating it yourself isn't terribly hard.

tfausak on 22 Aug 2020

is nice for humans but not nice for machines.

I'd argue that you can change machines; humans will keep on complaining about this until someone provides a good first-class solution.

But it also works great for publishing to Hackage because it produces a

So the machines that expect this functionality are currently the cabal project and Hackage. Are there any other projects that expect this? I'm trying to discover how hard it would be to make something like this a first-class feature, rather than providing the Nth workaround.

jappeace on 22 Aug 2020

❤1

I can sympathise with the desire to throw in everything and the kitchen sink into cabal proper; but unfortunately every single feature added to a big project is one that increases our maintenance surface.

I don't think this ticket asks for a very strange feature for a build tool. Practically all other language build tools do this. Why is cabal so special?

jappeace on 22 Aug 2020

❤1

I think that the crux of this issue is that automatically discovering exposed-modules is nice for humans but not nice for machines. Any solution is going to have to grapple with that.

What if the module discovery wasn't automatic but had to explicitly be invoked? E.g. if I ran cabal generate-modules and the tool updated the exposed-modules: list in the cabal file? I could then even run git diff project.cabal to verify that the module list generation worked exactly as intended.

In general, I don't think flipping the way that exposed-modules: works to instead be hidden-modules: is necessary to allay this pain. I think it's fine if exposed-modules: still has to list everything, it would just be nice if cabal-the-utility helped manage that list for you.

Edit: I think this is essentially what @hvr is describing in the last paragraph of his first response.
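A rough sketch of what such an explicitly invoked command could report before rewriting the file is given below. Note that `cabal generate-modules` is the hypothetical command from this comment, not an existing cabal subcommand, and the type, function names and module names are invented for illustration.

```haskell
import Data.List (nub, sort, (\\))

-- | Difference between the modules listed in the .cabal file and the
-- modules found on disk. Hypothetical type for this sketch.
data ModuleDiff = ModuleDiff
  { added   :: [String]  -- ^ on disk, but not yet listed
  , removed :: [String]  -- ^ listed, but no longer on disk
  } deriving (Eq, Show)

-- | Compare the listed modules with the discovered ones.
diffModules :: [String] -> [String] -> ModuleDiff
diffModules listed discovered = ModuleDiff
  { added   = sort (nub discovered \\ nub listed)
  , removed = sort (nub listed \\ nub discovered)
  }

-- | Render the difference in a diff-like form for review.
renderDiff :: ModuleDiff -> [String]
renderDiff d = map ("+ " ++) (added d) ++ map ("- " ++) (removed d)

main :: IO ()
main = do
  -- Made-up module names, for illustration only.
  let listed     = ["MyLib", "MyLib.Old"]
      discovered = ["MyLib", "MyLib.New"]
  mapM_ putStrLn (renderDiff (diffModules listed discovered))
```

The module list stays explicit in the .cabal file, and the change shows up in version control, which is the review step the comment describes with git diff.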

charukiewicz on 22 Aug 2020

What if the module discovery wasn't automatic but had to explicitly be invoked?

It's an improvement, but I just want to make it so I almost never have to see the -Wmissing-home-modules warning. I also hope to make the cabal file smaller. Listing all modules is tedious and doesn't give a good overview.

In general, I don't think flipping the way that exposed-modules: works to instead be hidden-modules: is necessary to allay this pain. I think it's fine if exposed-modules: still has to list everything, it would just be nice if cabal-the-utility helped manage that list for you.

Maybe forget about this: My idea with that was that in most cases you just want to expose everything, so the cabal file becomes a lot shorter and more manageable. But that other issue does this as well, just through a better approach (globbing). I think it would be appreciated as a PR (from what I understand) so I'll work on that in the near future. But please let me know if there are objections, I don't want to waste my time.

jappeace on 22 Aug 2020

But that other issue does this as well, just through a better approach (globbing). I think it would be appreciated as a PR (from what I understand) so I'll work on that in the near future. But please let me know if there are objections, I don't want to waste my time.

I think the globbing approach suggested in #5343 would be a good one. Would probably be even better than running a separate cabal command to populate the exposed-modules list.
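To illustrate what globbing over module names might mean, here is a small sketch. The pattern syntax is an assumption made up for this example (a trailing `.*` matches every module strictly below a prefix, anything else must match exactly). It is not the syntax discussed in #5343, and the function and module names are hypothetical.

```haskell
import Data.List (isPrefixOf, nub, sort)

-- | Does a pattern match a module name?
-- Assumed syntax: "Foo.Bar.*" matches any module strictly below Foo.Bar;
-- any other pattern must equal the module name.
matchesPattern :: String -> String -> Bool
matchesPattern pat modName =
  case reverse pat of
    ('*' : '.' : revPrefix) ->
      let prefix = reverse revPrefix ++ "."
      in prefix `isPrefixOf` modName && length modName > length prefix
    _ -> pat == modName

-- | Select the discovered modules matched by at least one pattern.
expandPatterns :: [String] -> [String] -> [String]
expandPatterns pats discovered =
  sort [m | m <- nub discovered, any (`matchesPattern` m) pats]

main :: IO ()
main = do
  -- Made-up patterns and module names, for illustration only.
  let pats       = ["MyLib", "MyLib.Api.*"]
      discovered = ["MyLib", "MyLib.Api.Types", "MyLib.Internal.Util", "MyLib.Api.Client"]
  mapM_ putStrLn (expandPatterns pats discovered)
```

Compared with exposing a whole directory, patterns like these keep some of the intent visible in the .cabal file, although tooling that only sees the index would still not know the concrete module list.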

charukiewicz on 22 Aug 2020

Just to echo, this is the key issue preventing me from switching to cabal files. My current project has ~400 modules. Haskell already has a lot of friction compared to other languages, such as manually managing imports, and I don't want to add more.

Why not use cabal-fmt? The issue is I need to build tooling wrapping cabal/stack to call this with every operation to make it seamless. This adds enough pain to make it not worth it IMO.

I think there are valid arguments as to why this feature may not belong in cabal the tool. I won't claim to understand the cabal side of things. However, adding it could unlock a lot of new cabal users, which I think is quite valuable for cabal (more users = more issues + PRs coming in, hopefully). As such I would weigh adding it quite highly.

AlistairB on 23 Aug 2020

Duplicate of https://github.com/haskell/cabal/issues/5343

phadej on 24 Aug 2020

👍1

It was brought to my attention on reddit that even though this was closed, at least @phadej thinks the globbing solution is worth putting into Cabal if someone implements it.

My takeaway from this being closed was that there wasn't interest in fixing this issue, so I'm posting this so that others don't make the same mistake and can see there is a way forward 🙂

codygman on 30 Aug 2020

@codygman you understood me wrong. I expect the patch being quite big and invasive. Yet if someone actually makes it and still thinks it's a good idea, I'd be curious to know.

I do not recommend pursuing that path to the end; the patch will probably still be rejected.

Case-in-point, just now I used @hvr's tool to find out which package exports Data.RFC5051. It saved me a lot of time while I fix Hackage metadata (going through pandoc dependencies by hand is not a human job).

There are nice things which would be nice to have, but we cannot.

Wouldn't it be nice if cabal figured out dependencies without explicit build-depends? In theory possible, in practice not. This issue is the same. Reddit polls won't change my mind.

phadej on 30 Aug 2020

I have said this elsewhere, but I want to preface this comment with:

You guys do a huge amount of work as volunteers and I appreciate it. I appreciate it so much I want others to be able to come back to Cabal from stack so even more can appreciate that work. I'd prefer not to have to say "don't use the default build tool cabal, you have an easier time with stack" when introducing new people to Haskell.

Technically I suspect cabal is better in many ways, but UX wise in a lot of ways it's just not and I can provide many examples if there is interest.

Reddit polls won't change my mind.

They should carry some weight if you care about users opinions. If the problem is that the poll is on reddit, what sort of poll would convince you? If you simply don't care about users opinions with regards to UX issues, that's pretty alarming. Please confirm that isn't the case?

There are nice things which would be nice to have, but we cannot.

I mean, we can have it. I and others would just like to not resort to "switch to stack" or "use some external tool".

In theory possible, in practice not.

From a user perspective, that's hard to buy since stack build ends up doing it just fine. There are differences there and technical arguments below the surface, but at the end of the day people will judge cabal against stack.

codygman on 31 Aug 2020

👍1

If you simply don't care about users opinions with regards to UX issues, that's pretty alarming

Not true. I do listen. And we do implement features to improve users experiences. Judging by response to a single request is just not fair.

phadej on 31 Aug 2020

👍5

Just as a datapoint in a discussion: even Michael Snoyman stated recently, that supporting automatic generation of cabal files from package.yaml in stack projects (using hpack) led to all sorts of build reproducibility issues. See https://www.fpcomplete.com/blog/storing-generated-cabal-files/

jhrcek on 31 Aug 2020

😕1
