Meson: Division of labor between `coredata` and `environment` is no longer clear

Created on 8 Aug 2020 · 16Comments · Source: mesonbuild/meson

Originally coredata was supposed to be for data that was persisted, and environment was supposed to be for data that wasn't. But at this point I think much of the intent has rotted away and we basically have to big, entangled balls of arbitrary functionality.

Let's take some time to re-envision how we want things to work, and how we might divide (or not divide!) the division of labor.

design discussion

Source

Ericson2314

🚀1

Most helpful comment

I am not working on meson long enough to know how things were originally meant to be divided (and without having touched most of the code in both of them), so I can only talk about what I would expect Environment and CoreData to be.

First of all, I have found it always strange that the compiler detection logic is in Environment. For me, it would make much more sense to move this to a separate class in compilers. Also, why are functions like get_clang_compiler_defines in environment and not in the relevant compilers.mixins?

Then, why is there option handling in both Environment and CoreData? I would suggest splitting this into a new AllMesonOptions (yes, I am not good with names) class (that is also serialized like CoreData).

So, in pseudocode, this would result in:

class Environment:
    def __init__(self):
        self.coredata  = CoreData()         # pretty much the same, but without the meson options
        self.options   = AllMesonOptions()  # handles all meson options (also from native and cros files)
        self.compilers = Compilers()        # compiler detection (and storage) from Environment and CoreData

        # handle cross/native files + some general stuff already in Environment

mensinda on 8 Aug 2020

👍2

All 16 comments

As far as options are concerned, my vision is that Environment deals with "raw" options, unparsed strings from the user, env, and default_options. A big dict with keys in the form [<subproject>:][build.][lang_]name just like user would type them in command line. Coredata then parse those raw options into its persistent storage using UserOption classes.

xclaesse on 8 Aug 2020

So, in pseudocode, this would result in:

class Environment:
    def __init__(self):
        self.coredata  = CoreData()         # pretty much the same, but without the meson options
        self.options   = AllMesonOptions()  # handles all meson options (also from native and cros files)
        self.compilers = Compilers()        # compiler detection (and storage) from Environment and CoreData

        # handle cross/native files + some general stuff already in Environment

mensinda on 8 Aug 2020

👍2

I agree all compiler detection should be moved to a new class in compilers/ folder. Some stuff are utilities that could be moved to mesonlib, and the remaining option handling (as I described above) could be its own class too. Then there is nothing left in Environment :D

Note that's further refactoring that could be done, but that's orthogonal to the changes I'm doing in https://github.com/mesonbuild/meson/pull/7521. Can't do everything at once.

xclaesse on 8 Aug 2020

Then there is nothing left in Environment :D

Which would certainly be better than having one class that does (half) of everything :)

Also, a refactor should also come with type checking.

mensinda on 8 Aug 2020

OK this is a good start. So what exactly is supposed to be persisted vs computed every time? This relates to how/whether we store the "raw" configuration: can we not worry about whether they came from config files, env vars, or command line flags, or must we?

Ericson2314 on 8 Aug 2020

Here's a list of things that are supposed to be computed only once (without a wipe):

machine files
environment variables
compilers/linkers
host/build machine information

And the things that need to be recomputed on reconfigure

dependencies (if --reconfigure is given)
cli options (if cli options changed)

The thing is, all of that stuff needs to be serialized. I'm not really sure what Environment should be providing. I thinkj @mensinda's proposal is a good one, basically turning coredata into a composed class instead of two mega classes. Along with this, I think it might be good to have some options that the compilers generate at the class level instead of the instance level, like c_args and c_link_args since we need those before the class is actually instantiated.

dcbaker on 9 Aug 2020

👍1

So --wipe, and certain ways of the changing build.meson are how this stuff is invalidated? If the compilers need to be recomputed because of e.g. a new add_language or change to project(...), are the cached machine files and environment variables still used?

Ericson2314 on 9 Aug 2020

Also, cached dependencies, etc. can be cleared with meson configure --clearcache

mensinda on 9 Aug 2020

I've been doing a lot more thinking about this, especially about options. It's really not ideal to need to keep a "raw" version of options as well as the meson Option type version. Having to keep them around is ugly, and produces a shadow copy and separate sources of truth. It also means that the values wont be validated until the actual Option class is set, so the sooner we do that the better. So I started thinking about whether we really need to store any of the "raw" options, and if we do which ones and for how long. Along those lines I've come up with the following ideas:

builtin options except compiler and backend options can be initialized immediately, and we don't need the shadow copy in Environment at all. backend is pretty nice too, since it's really OS dependent I don't think it's a problem to block setting it in default_options either. That means it can only be set via cli or machine file. So with a two pass initialization of builtins (one for non-backend options, and then a second just for backend options), we don't need to store these at all.

user options should not be able to be set via the project() default options, which means they can be initailized as soon as the meson_options.txt file is read. Because these come from a separate part of the machine files and can only come from a machine file, cli or the options file, they can be initialized and avoid a shadow copy at all too. Just parse the appropriate part of the command line into them.

For compilers I don't see how to get rid of a shadow copy, since the compiler classes are able to create their own options with their own constraints (and two C compilers might support c_std but with different accepted values), but I think we can make things much cleaner by making two changes:
1) each compiler stores a reference to it's own options, so it doesn't need them passed to it everywhere
2) the compiler initializes all of the compiler options when it initializes (compiler initialization is compiler option initialization)

This plays nicely with the nested compiler_options model, because we do something like:

CCompiler(options=coredata.compiler_options[for_machine]['c'], raw_options=env.compiler_options[for_machine]['c'])

the Coredata (or whatever new class) still holds the values, but the compiler populates it and has direct access to the fields itself. This probably will allow us to stop passing Environment into the compiler in so many places, and probably let us get rid of the MachineChoice attribute in the Compiler as well.

To make this work effectively we probably need a few changes. We probably need to separate the linker and compiler representations more, I think it may be enough to replace:

L = Linker(...)
C = Compiler(... linker=L)

with

C = Compiler(...)
C.linker = Linker(...)

I think it also makes sense to push linker specific options out of the compiler class into the linker class, but that's likely a follow up change.

dcbaker on 18 Aug 2020

builtin options except compiler and backend options can be initialized _immediately_, and we don't need the shadow copy in Environment at all. backend is pretty nice too, since it's really OS dependent I don't think it's a problem to block setting it in default_options either.

On Windows you can choose between ninja and vs backends, and that can be set in default_options. Also in the future we could have other backends on Linux too.

user options should not be able to be set via the project() default options

We currently can, and that need to continue to work regardless whether or not we deprecate that. default_options can even contain values from sub-project options, so we need to keep them until we eventually configure that subproject.

1. each compiler stores a reference to it's own options, so it doesn't need them passed to it everywhere

You still have to pass per target overrides. Also compilers objects are global but options are soon to be per subproject.

This plays nicely with the nested compiler_options model, because we do something like:
CCompiler(options=coredata.compiler_options[for_machine]['c'], raw_options=env.compiler_options[for_machine]['c'])

Nested or not that does not really change anything, you can pass all options and CCompiler will only lookup for c_* in the dict. But that's true for the for_machine layer I think. I think we need to keep that layer.

Overall, I think this is a lot of deep changes, which will require weeks of work to implement and review, with high risk of regressions (just see how many you've got with machine file PR). I'm not convinced it's worth it, tbh. It feels like changing everything and end up not that far from were we already stand. Especially since that whole refactoring is driven by no actual needs, we are not stuck on implementing any feature request. There is nothing we can say "ok, implementing this is complicated with current structure, we need to rethink it". Actually it's the opposite, implementing machine file option was almost trivial without refactoring, implementing per-subproject compiler option was done a year ago and had to revert recent refactoring to make it work again.

xclaesse on 18 Aug 2020

Given the number of PRs I queued for 0.56 and nobody has time to give any review, I don't think it's reasonable to spend that much time on this currently. But I would love to get reviews on https://github.com/mesonbuild/meson/pulls?q=is%3Aopen+is%3Apr+milestone%3A0.56.0 :D

xclaesse on 18 Aug 2020

On Windows you can choose between ninja and vs backends, and that can be set in default_options. Also in the future we could have other backends on Linux too.

Setting this to anything except ninja would limit the project to one OS, I'm very unconvinced allowing this is a good idea, but more importantly I doubt anyone is actually doing it.

I'm trying to find a solution we can agree on here @xclaesse, I really dislike your solution of one string building for keys. for reason's I've tried to make clear. @Ericson2314 seems to share my concerns. I'm not really sure where to go because I feel like we're at an empasse, I don't like your solution and you don't like mine, I was hoping to come up with a compromise.

dcbaker on 19 Aug 2020

On Windows you can choose between ninja and vs backends, and that can be set in default_options. Also in the future we could have other backends on Linux too.

Setting this to anything except ninja would limit the project to one OS, I'm very unconvinced allowing this is a good idea, but more importantly I doubt anyone is actually doing it.

Some projects are windows only, I don't see a problem with that, pretty sure that exists out there. I also had in mind Samurai, pretty sure sooner or later they will add features ninja does not have, Ninja is not that well maintained.

If there is one thing I learned while working on Meson is people does all kind of weird stuff with their build system and they don't like when they get regressions. See for example having default_options: [subproject:foo=bar] or c_std=gnu99 for MSVC, none of us knew that feature even existed, but we still had a bug report within a week when it regressed in master, not even released... That being said, I'm not against deprecating doing so.

The thing is, it's hard to sell a refactoring that will take a lot of energy from the whole team already massively understaffed when you don't bring any new possibilities and start by breaking existing features. It's even harder to sell when the first step (https://github.com/mesonbuild/meson/pull/5489) caused a regression I had to fix myself (https://github.com/mesonbuild/meson/pull/7348) and broke some compilers that are still not fixed (Xc16CCompiler, C2000CCompiler, C2000CPPCompiler, CudaCompiler and EmscriptenMixin). The 2nd step (https://github.com/mesonbuild/meson/pull/6597) caused 2 regressions already reported, and a couple suspect warnings that I reported. The 3rd step (https://github.com/mesonbuild/meson/pull/7529) is completely nonsense.

I'm trying to find a solution we can agree on here @xclaesse, I really dislike your solution of one string building for keys. for reason's I've tried to make clear. @Ericson2314 seems to share my concerns. I'm not really sure where to go because I feel like we're at an empasse, I don't like your solution and you don't like mine, I was hoping to come up with a compromise.

I really don't understand what you've got against my raw_options proposal. It's simple and safe, fix all the regressions we got saw far in master. There is nothing fundamentally better in having nested dict VS structured string keys, both are valid approach, it's just a matter of using what fits best your specific use cases.

xclaesse on 19 Aug 2020

Selling refactors is always hard. The thing about refactors is the longer you put them off to get feature work done or whatever the more you need them and the harder they become. I'm arguing that the end of the refactor is less bugs and easier to maintain, that's a feature. That's a huge feature.

As for structured key strings... Sometimes you can't avoid them, like on our command line where anything we do is going to be a structured string or going to be horrific. But internally... parse those structured strings into real structures ASAP. I've never seen a case where structured strings were better than a nested data structure. In fact, I tried to keep the structured string approach when I started the machine file work, but it was just endless bugs. Converting to the nested dict approach made that all go away almost immediately.

I think it's pretty unfair that you're hanging regressions for things that we didn't a) document or b) have tests for. Of course people are using things we don't claim to expose, but if we don't document it or have tests, they're going to regress. That's not to say we should fix it, but this is why we have a release cycle and blocking issues and all of that.

dcbaker on 19 Aug 2020

In fact, I tried to keep the structured string approach when I started the machine file work, but it was just endless bugs.

That's surprising, I did it in like 30min of work 2 months ago: https://github.com/mesonbuild/meson/pull/7293.

xclaesse on 19 Aug 2020

btw, my point about regressions is not that you did bad work, not at all. Just that refactoring is inherently error prone and must have good reasons worth the risk. Especially since no matter what we do, option handling is always going to be a complex topic.

xclaesse on 19 Aug 2020

Was this page helpful?

0 / 5 - 0 ratings