Bazel: Get legacy license logic out of Bazel and replace with a more general framework

Created on 15 Feb 2019 · 26 Comments · Source: bazelbuild/bazel

Bazel has legacy support for license-checking third party dependencies that has a) never properly worked and b) messed up tasks that have nothing to do with licensing.

Examples:

Plans are underway for a replacement (https://github.com/bazelbuild/bazel/issues/7194#issuecomment-460384466, https://github.com/bazelbuild/bazel/issues/188#issuecomment-444271909) which will not need to be directly built into Bazel.

Regardless of when that replacement lands, the legacy logic represents a broken API and should be removed for Bazel 1.0.

P2 team-Configurability feature request

All 26 comments

An update on current plans

I am in the process of rebuilding license checking for Google. The implementation design is not ready to review, but the requirements are getting close to firm. You can find a copy of those here:
PRD: License compliance checking in Bazel

The highlights are:

  • This is a _framework_ for providing license compatibility checks rather than a specific set of rules. No one organization's size fits all.
  • Completely Starlark rule based and not built into the core Bazel code. Ideally, this will be under bazelbuild/rules_license.
  • Support for gathering licenses used in an application so they may be easily published as part of that application.

The first point is the key one. Google has its own view of what we can put in particular kinds of applications. Other organizations will rightly have other views. The implementation will allow someone to easily create their own check_licenses rule implementation to reflect their needs.

At this time, our plan is:

  • accept comments from the Bazel community about features of the framework.
  • turn existing license rules within Bazel into no-ops. The syntax will be allowed, so as not to break existing uses, but they will essentially be comments.
  • begin implementation in Q2 2019
  • deliver enough for Bazel users to begin adopting during calendar 2019

Look to this issue for updates on progress.
I welcome comments in this issue or by mail to [email protected]; please include [email protected] in the thread.

Gathering users who have commented on licensing in various issues. My apologies if I have left someone out.
@schroederc @michaelsafyan @rahul-malik @vmax @ittaiz @petemounce @werkt

@shenson @DoomGerbil @ashmere

Would it be possible to make the document world-commentable? I think it would be easier to provide feedback in-context on the document than out-of-band in one of those other forums.

I would rather keep comments in this thread. That keeps it public. It also follows the model we are trying with design documents checked into GitHub. Since the markdown article does not have marginal comments, we've been trying to do reviews in the PR review thread.

OK. Here is my feedback on the document, then:

FR: Specify the license "type". The license "type" is a string which has a meaning to an organization's compliance department. It could be as simple as none|notice|restricted or as complex as a labeling of dozens of different types.

It seems like allowing the meaning of "type" to vary from one org to another could be potentially problematic, especially if code from one org depends on / includes content from another org (imagine, for example, that org A acquires org B, both org A and org B have used such a feature, and the types used by org A and the types used by org B do not align).

If there are derived/computed properties about licenses, there should probably be a way to scope this to a particular data owner for that derived property so as to prevent collisions.

For something as general as an inferred "type" (or other unscoped attribute), I would recommend a single, universally agreed upon definition as to the meaning and interpretation.

FR: All code in //third_party must be under a license. Any implementation must provide the same automatic enforcement (or better) that is done today. To put this in a more abstract way the implementation should allow us to create policies for any arbitrary source tree path (e.g. All files under //asop/… must have X)

In opensource Bazel, it seems that most third party code that is likely to have this requirement is pulled in via WORKSPACE rules (e.g. by http_archive or git_repository rules). In addition to enforcement by directories, I think the automated enforcement should enable enforcement by rule type (e.g. automatically apply such enforcement to all http_archive, git_repository, pip_import, etc. rules) or to automatically apply such enforcement to any WORKSPACE-level dependencies.

Related to this, for rules such as git_repository, pip_import, etc., there should be a way to automatically infer the relevant license. For example, consider allowing git_repository to automatically infer its license from a file named LICENSE that exists at the root of the repository.

On Mon, Mar 4, 2019 at 4:29 PM Michael Safyan notifications@github.com
wrote:

OK. Here is my feedback on the document, then:

FR: Specify the license "type". The license "type" is a string which has a
meaning to an organization's compliance department. It could be as simple
as none|notice|restricted or as complex as a labeling of dozens of
different types.

It seems like allowing the meaning of "type" to vary from one org to
another could be potentially problematic, especially if code from one org
depends on / includes content from another org (imagine, for example, that
org A acquires org B, both org A and org B have used such a feature, and
the types used by org A and the types used by org B do not align).

Yes. When I get to design, this probably will end up as not a single type
but a set of meaningful tags. E.g. 'requires_notice',
'requires_relinkability' (think LGPL), 'requires_source_mods_published'
(again LGPL), 'requires_app_source_published' (GPL), ...
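
To make the tag idea concrete, here is a minimal sketch in Python (not the planned Starlark implementation). The tag names come from the reply above; the `License` class, the `disallowed` helper, and the tag assignments per license are hypothetical and for illustration only, not a legal reading of any license.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class License:
    name: str
    tags: frozenset  # obligations such as "requires_notice"


# Hypothetical tag assignments, for illustration only.
APACHE = License("apache", frozenset({"requires_notice"}))
LGPL = License(
    "lgpl",
    frozenset({
        "requires_notice",
        "requires_relinkability",
        "requires_source_mods_published",
    }),
)
GPL = License(
    "gpl",
    frozenset({"requires_notice", "requires_app_source_published"}),
)


def disallowed(licenses, forbidden_tags):
    """Return the sorted names of licenses carrying any forbidden tag."""
    forbidden = frozenset(forbidden_tags)
    return sorted(lic.name for lic in licenses if lic.tags & forbidden)
```

With tags instead of a single type, two organizations can share the same license metadata and differ only in which tags they forbid for a given kind of artifact.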

If there are derived/computed properties about licenses, there should
probably be a way to scope this to a particular data owner for that derived
property so as to prevent collisions.

For something as general as an inferred "type" (or other unscoped
attribute), I would recommend a single, universally agreed upon definition
as to the meaning and interpretation.

I'm not sure where you are going with 'computed properties'. Our plan is
simply to pass information from license() rules unfiltered up to top level
binary rules, where they can be analyzed as a whole. If someone wants to
make a checker that treats code in different paths differently, that is up
to them.
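
The "pass everything up unfiltered" plan can be pictured as a plain graph walk. The sketch below is in Python rather than Starlark and assumes a simplified model: `deps` maps a target label to its direct dependencies and `licenses` maps a label to the license names declared on it. The function name, both dictionaries, and the labels are hypothetical.

```python
def gather_licenses(target, deps, licenses):
    """Collect every license reachable from `target`, without filtering."""
    seen = set()
    found = set()
    stack = [target]
    while stack:
        node = stack.pop()
        if node in seen:
            continue  # shared dependencies are visited only once
        seen.add(node)
        found.update(licenses.get(node, ()))
        stack.extend(deps.get(node, ()))
    return found


# Hypothetical dependency graph and license declarations.
DEPS = {
    "//app:bin": ["//lib:a", "//lib:b"],
    "//lib:a": ["//third_party/z"],
    "//lib:b": ["//third_party/z"],
}
LICENSES = {
    "//third_party/z": ["zlib"],
    "//lib:b": ["apache"],
}
```

The gathering step makes no judgment about the result; any path-based or policy-based treatment would live in a separate checker that consumes the returned set.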

FR: All code in //third_party must be under a license. Any implementation
must provide the same automatic enforcement (or better) that is done today.
To put this in a more abstract way the implementation should allow us to
create policies for any arbitrary source tree path (e.g. All files under
//asop/… must have X)

In opensource Bazel, it seems that most third party code that is likely to
have this requirement is pulled in via WORKSPACE rules (e.g. by
http_archive or git_repository rules). In addition to enforcement by
directories, I think the automated enforcement should enable enforcement by
rule type (e.g. automatically apply such enforcement to all http_archive,
git_repository, pip_import, etc. rules) or to automatically apply such
enforcement to any WORKSPACE-level dependencies.

Whoops. That is a Google internal requirement. I should have removed it
from the copy. There is no reason for our legal team to enforce their
policy on Bazel users. That said, I can probably build the tools needed to
enforce our requirements in a way that I can share them. However, we do it
at the source code control layer. You can't check it in without the license.

If you wanted to figure out a capability to check enforcement by rule type
and path at workspace import time, that would be a welcome addition - but I
won't be devoting cycles to it.

Related to this, for rules such as git_repository, pip_import, etc.,
there should be a way to automatically infer the relevant license. For
example, consider allowing git_repository to automatically infer its
license from a file named LICENSE that exists at the root of the
repository.

License detection and classification is explicitly out of scope for what we
will be building. I believe the Android Open Source Project has been
working on that for a long time and still hasn't nailed it. We don't have
the resources to try to outdo them on that problem.

To clarify my feedback, here are the use cases that I envision:

  • Audience: legal team
    Action: Define metadata about common sets of licenses

  • Audience: legal team
    Action: Define policies about which kinds of licenses may be included in a given
    kind of artifact, based on metadata about the license

    • Audience: legal team
      Action: Configure commit hooks that prevent third-party code from being submitted that does not contain a license (with third party being defined either according to path and/or the fact that the code is pulled in via a WORKSPACE)
  • Audience: engineering team
    Action: Declare the kind of artifact being produced that implicitly binds that artifact
    to a particular policy set up by the legal team

    • Audience: engineering team
      Action: Import dependencies from git repositories and other opensource projects
      with standard opensource licenses without needing to figure out how to define
      the metadata associated with those licenses or perform other complex license-related tasks

Here's the thing: without a way to automatically map license files to stable names for those licenses and, from those stable names, to clear properties about those licenses, you end up putting a significant burden on the engineering teams who just want to add a new entry to the WORKSPACE or pull in some additional dependency. It's in that spirit that I mentioned "computed properties".

Like, for example, it should be on the legal team to define the following (syntax TBD):

  • Here are the set of known licenses
  • Here are the properties that we know about these licenses
  • Here is how you would determine if a given LICENSE file is this particular known license
  • Here are the properties that define which licenses are allowed for this category of binary (server-side, client-side, etc.)
  • Here is the default license that should be implicitly assumed for most directories
  • Here are the set of directories where there must be a LICENSE file that matches one of the known licenses
  • Here are the set of WORKSPACE rules that must contain a LICENSE that matches one of the known licenses, and here are the set of WORKSPACE rules that are exempt

And for the engineering team, it should be as simple as declaring a normal git_repository dependency, possibly with an additional license_path attribute if the LICENSE file is stored in an unusual path/name relative to the root of the repository, as well as declaring license policy tests/checks on a given library or binary to validate that dependencies conform to a given policy.

If there is no way to deduplicate licenses and infer/derive metadata, then that work ends up getting shifted to engineers, and that really hinders the usage of opensource.

I don't think you need to nail the classification/deduping; something like "exact match after extraneous whitespace before/after is stripped" would be sufficient so long as it is pluggable. That should be sufficient to match common, well-known opensource licenses that have not been modified. (It's reasonable for modified ones to not automatically match).
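
A rough sketch of what such a pluggable matcher could look like, in Python. Everything here is hypothetical: the function names, the `matcher` parameter, and the placeholder strings standing in for real license texts. It only illustrates "exact match after stripping surrounding whitespace" with a swappable comparison.

```python
def exact_match_after_strip(candidate, known):
    """Match only if the texts are identical once outer whitespace is stripped."""
    return candidate.strip() == known.strip()


def identify_license(text, known_licenses, matcher=exact_match_after_strip):
    """Return the stable name of the first known license accepted by `matcher`.

    Returns None when nothing matches, e.g. for a modified license text.
    """
    for name, reference_text in known_licenses.items():
        if matcher(text, reference_text):
            return name
    return None


# Placeholder strings, not real license texts.
KNOWN = {
    "example-permissive": "EXAMPLE PERMISSIVE LICENSE TEXT",
    "example-copyleft": "EXAMPLE COPYLEFT LICENSE TEXT",
}
```

Because the comparison is passed in as a function, an organization could substitute a looser or stricter matcher without changing the lookup itself.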

On Wed, Mar 6, 2019 at 5:12 PM Michael Safyan notifications@github.com
wrote:

To clarify my feedback, here are the use cases that I envision:

  • Audience: legal team
    Action: Define metadata about common sets of licenses

Yes. Sort of. I would like to have some metadata bits standard for Bazel.
Things like names (apache, gpl, lgpl, ...) and some very broad classifiers
like 'unrestricted', 'only_requires_notice'. Those broad classifiers should
derive from a legal reading of the license, but it will make life harder
than need be if every Bazel user has to reinvent their own terms for well
known concepts.

  • Audience: legal team
    Action: Define policies about which kinds of licenses may be included
    in a given kind of artifact, based on metadata about the license

Absolutely. Each org using Bazel must be able to redefine their policies.
We'll ship an example tool which will demonstrate possible policies, but
organizations that care are expected to extend it or write their own.

  • Audience: legal team
    Action: Configure commit hooks that prevent third-party code from
    being submitted that does not contain a license (with third party being
    defined either according to path and/or the fact that the code is pulled in
    via a WORKSPACE)

Right. Everyone cares about this, but it should be done by the org in
conjunction with their source code control system or through other audit
mechanism. We welcome contributions for tools which could be used to
enforce this for popular version control systems.

  • Audience: engineering team
    Action: Declare the kind of artifact being produced that implicitly
    binds that artifact to a particular policy set up by the legal team

I am not exactly sure what you mean here, but I can talk about what I am
thinking. I think there are two parts, because maybe we mean different
things by policy.

  1. artifacts are always bound to a license instance (with some defaulting
    so it becomes manageable). The license holds the metadata. For the most
    part, this should be trivial.
  2. top level artifacts like executables and .zip files combine multiple
    lower level artifacts. That is where we apply the policy defined by your
    compliance team.

  • Audience: engineering team
    Action: Import dependencies from git repositories and other opensource
    projects with standard opensource licenses without needing to figure out
    how to define the metadata associated with those licenses or perform
    other complex license-related tasks

Here's the thing, without a way to automatically map license files to
stable names for those licenses and, from those stable names, to clear
properties about those licenses, you end up putting a significant burden on
the engineering teams who just want to add a new entry to the WORKSPACE
or pull in some additional dependency. It's in that spirit that I mentioned
"computed properties".

Like, for example, it should be on the legal team to define the following
(syntax TBD):

  • Here are the set of known licenses
  • Here are the properties that we know about these licenses

Yup. That is what I would like to see in rules_license

  • Here is how you would determine if a given LICENSE file is this
    particular known license

I am explicitly not working on that problem. Nor is any Bazel team member.
This is a large and open ended problem, fraught with legal implications if
you do it wrong.

  • Here are the properties that define which licenses are allowed for
    this category of binary (server-side, client-side, etc.)

That is a per-company policy which should be enforced by the checking
logic based on license metadata. You might have different rules to apply
for server-side code depending on the legal jurisdiction of the country
where the data center running the code is located.

  • Here is the default license that should be implicitly assumed for
    most directories

Yes. That should be easy to do for the license compliance tool, because it
will have the LicenseInfo providers of all the individual artifacts
included. When there is a dep without a license the tool can default it.
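
As a small illustration of that defaulting step, here is a Python sketch. It models the per-artifact license information as a plain dictionary in which `None` means "no license declared"; that model, the function name, and the labels are assumptions and do not reflect the actual LicenseInfo provider.

```python
def apply_default_license(declared, default_license):
    """Resolve a license for every artifact, falling back to a default.

    Returns (resolved, defaulted): a dict mapping each artifact to a license
    name, and the sorted list of artifacts that had no declared license.
    """
    resolved = {}
    defaulted = []
    for artifact, license_name in declared.items():
        if license_name is None:
            resolved[artifact] = default_license
            defaulted.append(artifact)
        else:
            resolved[artifact] = license_name
    return resolved, sorted(defaulted)
```

Returning the list of defaulted artifacts alongside the resolved mapping lets a compliance tool report where the default was assumed rather than declared.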

  • Here are the set of directories where there must be a LICENSE file
    that matches one of the known licenses

Part of source code control

  • Here are the set of WORKSPACE rules that must contain a LICENSE that
    matches one of the known licenses, and here are the set of WORKSPACE rules
    that are exempt

And for the engineering team, it should be as simple as declaring a normal
git_repository dependency, possibly with an additional license_path
attribute if the LICENSE file is stored in an unusual path/name relative
to the root of the repository, as well as declaring license policy
tests/checks on a given library or binary to validate that dependencies
conform to a given policy.

Adding a license pointer to git_repository is reasonable. But it is an act
of trust. You'll be saying that the code I am about to import is under a
specific known license. Like I said, we are not working on imputing the
license metadata by looking at the text of the license file itself.

If there is no way to deduplicate licenses and infer/derive metadata, then
that work ends up getting shifted to engineers, and that really hinders the
usage of opensource.

I don't think you need to nail the classification/deduping; something like
"exact match after extraneous whitespace before/after is stripped" would be
sufficient so long as it is pluggable. That should be sufficient to match
common, well-known opensource licenses that have not been modified. (It's
reasonable for modified ones to not automatically match).

You are welcome to try to build that and add to the framework we come up
with, but I am not going to be working on that.

@aiuto - I probably made a mistake combining this issue into both removal of the old stuff and addition of the new stuff. The former's basically done and I've taken myself off as an assignee. Should we also drop the bazel 1.0 tag now?

I don't think it is a mistake. This is a tracking bug for the entire issue.
The bazel 1.0 tag? Good question. The removal happened, so we could leave it as an indication that this is a 1.0 change. OTOH, the replacement is not a 1.0 blocker, so if we think of the tag as a milestone, then it doesn't make sense.

I honestly can't decide if it makes sense to remove or leave it. Our labels are an unruly mess.

@aiuto Very sorry to bother you, just trying to understand what the current stance on Bazel and licenses is. I see that the legacy logic for this has been / is being removed.
Is there a timeline for adding your proposed new changes?

I expect pieces to start coming out in Q4 through Q2.

On Tue, Oct 8, 2019 at 6:04 AM Florian notifications@github.com wrote:

@aiuto https://github.com/aiuto Very sorry to bother you, just trying
to understand what the current stance on Bazel and licenses is. I see that
the legacy logic for this has been / is being removed.
Is there a timeline for adding your proposed new changes?

@aiuto very cool thanks! Awaiting it eagerly.

Related: #7281 - makes licenses attribute not visible from Starlark.

@FWirtz @mzeren-vmw
I have a proposal for the planned work:
https://docs.google.com/document/d/1uwBuhAoBNrw8tmFs-NxlssI6VRolidGYdYqagLqHWt8

The status is:

  • I've done enough prototyping to believe this plan is both sufficient and implementable.
  • It is in internal review at Google
  • Unless we find any truly unforeseen blockers, we are starting work shortly.

The doc is sort of dense, but the highlights are:

  • uniform way to say that an _individual target_ is available under well known license names (e.g. MIT, LGPL_V2)
  • a _central place per workspace_ to hang attributes of each particular kind of license (e.g. notice-only, must-publish-modifications).
  • a capability for users to build whatever compliance mechanisms they need on top of the first two.
  • a new repository (rules_license) to hold the generic license names and software tools
  • a new global attribute in Bazel to replace _licenses_

Following an X Window System principle, this is about mechanism, not policy. We will build some example compliance check tools that have (what I believe are) reasonable defaults, but a key point is that I expect most organizations to create their own checks which are specific to their needs.

Very cool @aiuto, thanks for sharing the design doc. Just read all of it, really interesting.

Really interested in both example use-cases you're explaining as well. Generating the copyright information for shipping and ensuring only a specific subset of licenses is used is exactly what I'm interested in.

The rules (and aspects behind license gathering) will be able to bubble up the set of package copyrights and license texts to various consumer rules. Compliance people will care about an "is this OK to ship" consumer. Rules to make mobile app binaries will want to get to the license texts bundled into a resource.
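
One possible shape for a "bundle the license texts into a resource" consumer, sketched in Python. The input format (a list of dictionaries with `name`, `copyright`, and `license_text` keys), the function name, and the separator are all hypothetical; the sketch only shows grouping packages under a shared license text so each text is emitted once.

```python
def render_notices(packages):
    """Render a single notices document from gathered package metadata.

    Packages that share an identical license text are listed together, and
    that text appears only once in the output.
    """
    by_text = {}
    for pkg in packages:
        by_text.setdefault(pkg["license_text"], []).append(pkg)

    sections = []
    for text, pkgs in by_text.items():
        header = "\n".join(
            f"{p['name']}: {p['copyright']}"
            for p in sorted(pkgs, key=lambda p: p["name"])
        )
        sections.append(f"{header}\n\n{text}")
    return "\n\n-----\n\n".join(sections)
```

An "is this OK to ship" consumer would take the same gathered input but return a pass/fail result instead of text.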

Maven has a plugin that can be a good reference in this context.
https://www.mojohaus.org/license-maven-plugin/index.html

Thanks. I have not read through their use cases yet. Do they have a
specific capability that my proposal is missing?

On Sun, Apr 19, 2020 at 8:26 AM SreeV notifications@github.com wrote:

maven has a plugin that can be a good reference in this context.
https://www.mojohaus.org/license-maven-plugin/index.html

Is work on this proposal moving forward?

Yes. There is some tooling being delivered to
github.com/bazelbuild/rules_license
I expect that sometime in Q3 or Q4 we will move Bazel's internal checks
over to those tools, which will provide better examples.

On Mon, Jul 27, 2020 at 12:09 PM Joseph Lisee notifications@github.com
wrote:

Is work on this proposal moving forward?

@aiuto are there any tracking bugs beyond this that I can follow for that progress?

Thanks so much!

This is the tracking bug. I have not replicated my TODO list of Google internal bugs to github because they are mostly about cutting over legacy systems which are not part of the new scheme.

Are there any updates about this that are going to be presented at BazelCon? If so, which talk should I tune into?

No updates for BazelCon. It's been slower going than I would have liked.

On Thu, Nov 12, 2020 at 12:57 PM Andrew Z Allen notifications@github.com
wrote:

Are there any updates about this that are going to be presented at
BazelCon? If so, which talk should I tune into?
