Berry: [Feature] Add support for package.json script variables

Created on 31 Jul 2020  路  14Comments  路  Source: yarnpkg/berry

  • [ ] I'd be willing to implement a fix

Describe the bug

When using "yarn pack" in Yarn v1, package.json variables (such as npm_package_version) are honoured. This seems to not be th case in yarn v2.

To Reproduce

I have a package.json which defines a "pack" script like so:

  "scripts": {
    "pack": "yarn pack --filename my-file-v$npm_package_version.tar.gz",
  }

The error when I run "yarn run pack" is Internal Error: Unbound variable "npm_package_version"

The _minimal_ information needed to reproduce your issue (ideally a package.json with a single dep). Note that bugs without minimal reproductions might be closed.

Environment if relevant (please complete the following information):

  • OS: Linux
  • Node version 10.22.0
  • Yarn version 2.1.1
enhancement help wanted

All 14 comments

That's not a bug, we just don't support it. Whether they are really worth it is still up in the air for me ... they always were pretty fringe, but on the other hand the snippet you share looks an acceptable use case.

I think if we were to support them, I would prefer them to be local variables accessible to the script commands, but not to the spawned processes (ie not exported through environment variables). That looks better encapsulation to me.

Hi @arcanis

Given that the feature was supported in Yarn v1 and is no longer supported, I can鈥檛 easily migrate my project from Yarn v1 to v2. Hopefully support will be added to v2 soon.

@arcanis I disagree with you in your statement about spawned processes. One of the primary purposes of the environments variables are so they're available in spawned processes. This allows processes to know if they were spawned by a script and what package spawned them including the version. This is highly useful information if you don't have access to the package.json which spawned applications may not especially if the code needing access to this information is in a dependent library. It's one way I'm able to include the application name and version in log messages without storing in alternative locations or having to read the filesystem (which either pauses the entire application using synchronous file system reads or requires asynchronous execution which may not be available where it's needed). Our organization has an internal library that configures logging for all our microservices. It uses the environment variables to determine which service is running and the version so it can be included in the logs.

Yarn 2.0 already supports npm_lifecycle_event as an environment variable available in spawned processes which already tells you which script spawned the process. Why not these others?

Additionally, if they were supported in Yarn 1.0, I'm confused as to why Yarn 2.0 would not support them.

If it's a conscious decision and you absolutely will not be supporting this, it should be listed in the Yarn 2.0 migration guide as something that will not be supported that was available in 1.0 and honestly it should be listed somewhere in general as this is an incompatibility between npm and yarn. This exact issue caused our production logging systems to stop logging to our central error reporting system and it was not caught through automated tests because it did not affect the actual functionality of the application. When no "npm_package_name" was found, the logging subsystem was unable to determine the component it was logging and assumed it was running locally.

My other issue was closed as a duplicate which included more detailed information including reproduction steps and links to supporting documentation so I'll just add it here.

See:
https://docs.npmjs.com/misc/scripts#packagejson-vars
https://github.com/yarnpkg/yarn/blob/master/src/util/execute-lifecycle-script.js

To Reproduce

Create a new directory with package.json with the following contents and run yarn set version berry then yarn:

{
  "name": "npm_package_test",
  "version": "0.0.0",
  "main": "index.js",
  "license": "MIT",
  "config": {
    "port": 8080
  },
  "scripts": {
    "test:name": "node -e 'console.log(process.env.npm_package_name)'",
    "test:version": "node -e 'console.log(process.env.npm_package_version)'",
    "test:config": "node -e 'console.log(process.env.npm_package_config_port)'"
  }
}

Run: yarn test:name
Expected: prints npm_package_test
Actual: prints undefined

Run: yarn test:version
Expected: prints 0.0.0
Actual: prints undefined

Run: yarn test:config
Expected: prints 8080
Actual: prints undefined

Running the same test with npm run --silent test:name, npm run --silent test:version, and npm run --silent test:config all match the expected values. Yarn 1.0 also works correctly, like npm.

@yinzara You don't have to rely on package manager to do what you want:

{
  "name": "npm_package_test",
  "version": "0.0.0",
  "main": "index.js",
  "license": "MIT",
  "config": {
    "port": 8080
  },
  "scripts": {
    "test:name": "node -e 'console.log(require(__dirname + `/package.json`).name)'",
    "test:version": "node -e 'console.log(require(__dirname + `/package.json`).version)'",
    "test:config": "node -e 'console.log(require(__dirname + `/package.json`).config.port)'"
  }
}

I think you might not have read what I wrote. I said that it could be done through a synchronous file read or an asynchronous operation if the location of the package.json file is known. What you just did was a synchronous file read of the package.json based on the location of the script itself which would freeze the entire application while its reading the file system. This is duplicating work the package manager has to do anyway and still doesn't solve my problem. In fact, you did it through a require which is a dangerous way of reading a JSON file anyway. I might not even know the location of the package.json file while I'm in a dependent library.

I'm not sure why there is so much trepidation about this feature. If npm_lifecycle_event is already implemented in Yarn 2.0, why the hesitation on other environment variables that are properly working in Yarn 1.0?

I'm not sure why there is so much trepidation about this feature. If npm_lifecycle_event is already implemented in Yarn 2.0, why the hesitation on other environment variables that are properly working in Yarn 1.0?

Because those are two completely different things, as one is exposing an internal information in a way that wouldn't be available otherwise, whereas the other is merely duplicating information in a pretty ugly way (not only bad as in "it's not clean", but also as in "it can be downright broken on some pathological cases").

Features are integrated in Yarn when we think they are stable enough to be worth the maintenance cost (which you won't pay, I suspect), and in this case, at this point in time, I'm simply not convinced the bar is met. You have the right to disagree, but in this case I think it's fair to ask you to implement this (discouraged) behavior yourself through a plugin. It's fairly easy - Yarn exposes the setupScriptEnvironment hook just for this kind of purposes.

Hmmmm, :-) "discouraged" is pretty serious word. I have difficulty calling a feature that's a documented official feature of npm, included in yarn 1.0 and relatively benign (i.e. it does not affect any other functionality other than itself) as "discouraged".

But I'll agree to your point, it is duplicating information you could somehow figure out (not necessarily easily) while npm_lifecycle_event does not and yes I also agree it could be added in a plugin.

Since it's a feature of Yarn 1.0 and documented official feature of npm, would you accept it as an official plugin?

Maybe some "npm compatibility" plugin (I mean, we do have the node_modules plugin) where we can put other stuff like this that you disagree with npm's support of but is still official functionality of npm and included in yarn 1.0 so that those like me who are converting their projects have an official option?

I'm happy to create it myself and I'll respond to any issues that involve code in it. We could have a special "npmCompatibility" section in the .yarnrc.yml with a "enablePackageEnvironment" (or something to that effect, maybe even default it to true if you have the plugin installed).

We could officially restrict the purpose of the plugin to only include features that are official features of npm but not supported by Yarn 2.0 by default to limit the amount of maintenance it would need.

I think you might not have read what I wrote. I said that it could be done through a synchronous file read or an asynchronous operation if the location of the package.json file is known. What you just did was a synchronous file read of the package.json based on the location of the script itself which would freeze the entire application while its reading the file system.

You can make an asynchronous read of package.json as well. "Freeze" of the entire application is a stretch of mind. You need to "freeze" your application each time you "wait" for something. You should wait not only for package.json reads. To deliver your application npm_ env variables by your own logic Yarn needs to freeze itself until it reads package.json.

This is duplicating work the package manager has to do anyway and still doesn't solve my problem. In fact, you did it through a require which is a dangerous way of reading a JSON file anyway.

We are exchanging examples here. There is your use case where it can be dangerous maybe and there are other use cases where it will be not.

@larixer

The package.json is already being read and parsed by Yarn whenever you run a script. The whole reason I am suggesting this is because from the point of view of Yarn, this is only making a change to environment variables based on data yarn already has in memory (i.e. it should require basically 0 processing power or time).

You make an assumption that I can make an asynchronous read of package.json in the code where I need it. That is not necessarily the case and is not the case in my situation. The code that needs this is in a dependent library that handles logging. It has no real knowledge of the app its being used in. The library is depended on by a microservice. When the service starts the logging library is required, it prints to the log the package name and version of the app and the time it took to load.

The logger is only required and used synchronously as there was never a need for it to be asynchronous. There is no asynchronous "init" or "get logger" function it. To create such a thing I would have to completely redesign how the library works in a breaking manner and change 30+ microservices. I need the package name and version available in active memory that I can read synchronously when the app starts without calling an asynchronous function.

None of your arguments address my primary concern. This is an official documented feature of NPM and thus implemented in Yarn 1.0. Changing to use Yarn 2.0 now means that I lose official documented functionality that could break my application and I had no idea that I could have lost this functionality without digging through the current active issues.

You can debate the efficacy of the functionality, but in the end it is official and documented and Yarn 2.0 needs to make a conscious decision on how it will handle this situation and if not, it needs official documentation as to why and what the effects of that decision there are.

@larixer

The package.json is already being read and parsed by Yarn whenever you run a script.

The reading and parsing package.json is not a big deal, its just JSON.parse(fs.readFileSync(manifestPath)). Or you can make the same in asynchronous way. Duplicating the information from package.json, polluting environment variables, writing the code to test and maintain this behaviour for all the cases when Yarn run scripts is a totally different story.

The whole reason I am suggesting this is because from the point of view of Yarn, this is only making a change to environment variables based on data yarn already has in memory (i.e. it should require basically 0 processing power or time).

No, the statement that this change should require 0 processing power is not true. Yarn might read package.json from a stripped down copy, stored efficiently in the cache and due to this feature it will had to keep full package.json copy in the cache, which will make installs slower and disk size consumption higher. Adding this feature might add other constraints to Yarn 2 codebase, so we should be picky about features we want to add and maintain.

You make an assumption that I can make an asynchronous read of package.json in the code where I need it.

I do not make assumptions about your code. You have provided one example where npm_package_foo class of environment variables might be needed and I have provided counter-example, where there is an alternative way to do the same without this feature being added to Yarn. Thats all. I'm not in position to make assumptions about your code, especially when I cannot see it.

That is not necessarily the case and is not the case in my situation. The code that needs this is in a dependent library that handles logging. It has no real knowledge of the app its being used in. The library is depended on by a microservice. When the service starts the logging library is required, it prints to the log the package name and version of the app and the time it took to load.

The logger is only required and used synchronously as there was never a need for it to be asynchronous. There is no asynchronous "init" or "get logger" function it. To create such a thing I would have to completely redesign how the library works in a breaking manner and change 30+ microservices. I need the package name and version available in active memory that I can read synchronously when the app starts without calling an asynchronous function.

The directory location of the package.json having the script being run is recoded into process.env.INIT_CWD by npm, Yarn 1, Yarn 2 and pnpm. You can use any way Node allows you to load and parse the JSON data stored at this location at your preference.

None of your arguments address my primary concern. This is an official documented feature of NPM and thus implemented in Yarn 1.0. Changing to use Yarn 2.0 now means that I lose official documented functionality that could break my application and I had no idea that I could have lost this functionality without digging through the current active issues.

You can debate the efficacy of the functionality, but in the end it is official and documented and Yarn 2.0 needs to make a conscious decision on how it will handle this situation and if not, it needs official documentation as to why and what the effects of that decision there are.

The major releases are expected to break some existing workflows. Some of these breaking changes are documented and some are not, because of the nature of radical changes. You can help the project by documenting revealed breaking changes that you think impact many users and worth note in documentation. You can write a plugin for Yarn 2 to turn this behaviour back on, provided you are going to support this plugin (which is the case based on your messages). However, there should be great interest in the community to make this plugin bundled by default and the plugin itself should not impose unacceptable constraints on the Yarn 2 codebase.

No, the statement that this change should require 0 processing power is not true. Yarn might read package.json from a stripped down copy, stored efficiently in the cache and due to this feature it will had to keep full package.json copy in the cache, which will make installs slower and disk size consumption higher. Adding this feature might add other constraints to Yarn 2 codebase, so we should be picky about features we want to add and maintain.

I think you're going a bit too far to say that the package.json would have to be held in totality to achieve this.

The package name and version would need to be available (I'm pretty sure yarn is already reading and caching these for their internal use). So the primary use case (the one I would need it for the most) would definitely not require additional processing power or storage (at least not anything worth mentioning).

The only possible additional processing is for the "npm_package_config_" variables. We would probably have to read and store the "config" property when we do not currently, so yes that would be extra. There would also be additional processing indexing and slicing of some strings and allocating new objects to store them. So I can see an argument to not implement that or to require that it be implemented in a plugin with an option to enable/disable it. There is also an argument to do it in the core as the package.json has already been read from disk so this entire operation would still be in memory so would be much faster. Since the 'config' attribute is unlikely to be seen in almost any public package, if it does exist, more than likely the developer intended it to be read and used. However, even knowing that, I still agree with you on the premise that the additional processing for the vast majority of users isn't worth it.

The directory location of the package.json having the script being run is recoded into process.env.INIT_CWD by npm, Yarn 1, Yarn 2 and pnpm. You can use any way Node allows you to load and parse the JSON data stored at this location at your preference.

Yes, I can use any method NodeJS has available to me both of which either require asynchronous execution or synchronous disk IO and loading the entire package.json into memory which has already been done at least once.

The major releases are expected to break some existing workflows. Some of these breaking changes are documented and some are not, because of the nature of radical changes. You can help the project by documenting revealed breaking changes that you think impact many users and worth note in documentation. You can write a plugin for Yarn 2 to turn this behaviour back on, provided you are going to support this plugin (which is the case based on your messages). However, there should be great interest in the community to make this plugin bundled by default and the plugin itself should not impose unacceptable constraints on the Yarn 2 codebase.

There is a difference between a breaking change in a workflow and completely removing a feature that previously existed and not documenting ways to work around it; especially a feature that is officially supported and documented by the other tool yarn is based on. If this was just something I thought should be added to berry because it would improve it, I wouldn't be making this argument. As you've stated, there are other ways to get at this data (not that I really love them). But this is specifically choosing to not implement a feature that both npm and yarn 1.0 support because maintainers disagree with the intent, usefulness, or value/cost ratio of the feature.

I think all I'm asking if I created an official "npm-compatibility" plugin that implemented these features and submitted a PR for that, would it be considered?

I'm also asking, if I submitted a separate PR that made only the npm_package_name and npm_package_version available in the running environment variables, would it be considered? What if I made it available only in the local script environment like @arcanis suggested? Would that be considered?

I'm also asking, if I submitted a separate PR that made only the npm_package_name and npm_package_version available in the running environment variables, would it be considered?

I'd prefer to keep it outside for the time being. Not everything belongs to core, or this repo.

[...] and not documenting ways to work around it

We'd however merge a PR mentioning that in the migration guide.

We totally need npm_package_name. It is not too much to ask. Just one variable with a value you calculate anyway.

Obviously, we will totally respect your decision, but could you give a clear answer? Choices: NO; yes, but it belongs to a different project (a plugin?); yes, patch it away and we'll review.

In any case, I do not expect a ton of extra code to set a variable, or committing the project to some burdens that prevent future developments. I just don't see the risk involved. Care to share?

The npm team abandoned the old way where any value of package.json was mapped to an environment variable. Implementing 2 (two) environment variables will suddenly make both package managers compatible for scripts. Even 1 (one) will help a lot. And it is not like I can go to the npm team and ask them to ... what exactly? Please help people migrate from npm v7 and the previous versions — I am afraid that without a clear migration path fewer people will attempt to migrate existing projects. Maintaining two different versions for different package managers is untenable, I am saying it as a maintainer.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

joshmeads picture joshmeads  路  4Comments

Bessonov picture Bessonov  路  4Comments

bradleyayers picture bradleyayers  路  3Comments

larixer picture larixer  路  4Comments

thealjey picture thealjey  路  4Comments