It would be great to have a standardised way of specifying metadata in a package's Project.toml. The following is heavily inspired by npm.
- keywords/tags -> array of strings (could be autopopulated from GitHub's tags by some third-party utility)
- description/tagline -> string with a short (one-line) description of the package
- license -> string with the license name
- version (I know this is controversial, but it would still be nice to have...)
- issue-url -> URL where the user can report issues with the package

For documentation, we need to support multiple cases:
- built-docs-url -> URL to a git repo + branch (+ directory) that contains the built documentation
- hosted-docs-url -> URL to hosted docs
- docs-directory -> path to the directory containing the package's Documenter.jl source; falls back to docs/

Those three options should span all sane use cases: a default where the package author doesn't have to worry about docs deployment at all (because we'll build them), the current standard where you set up Documenter.jl on Travis yourself and have it push to a gh-pages branch somewhere, and an option that allows you complete freedom with your self-hosted docs.
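To make the three documentation cases concrete, here is a rough sketch of how a tool might choose between them. It assumes the `[metadata.documentation]` table has already been parsed into a plain dict; the function name `resolve_docs` and the precedence order (hosted, then prebuilt, then build from source) are illustrative assumptions, not something the proposal specifies.

```python
def resolve_docs(documentation):
    """Pick a docs strategy from a parsed [metadata.documentation] table.

    `documentation` is a plain dict, as a TOML parser would return it.
    The precedence below is an assumption made for this sketch.
    """
    if "hosted-docs-url" in documentation:
        # Author hosts the docs themselves: just link to them.
        return ("hosted", documentation["hosted-docs-url"])
    if "built-docs-url" in documentation:
        # Docs are already built and pushed to a repo/branch (e.g. gh-pages).
        return ("prebuilt", documentation["built-docs-url"])
    # Default: build from the Documenter.jl source, falling back to docs/.
    return ("build", documentation.get("docs-directory", "docs/"))


# No keys at all: the tool would build from the default docs/ directory.
default_case = resolve_docs({})
# The gh-pages case from the proposal.
prebuilt_case = resolve_docs(
    {"built-docs-url": "https://github.com/JuliaDiffEq/DiffEqDocs.jl.git#gh-pages"}
)
```

With an empty table the author has nothing to configure, which matches the "we'll build them" default described above.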
So for DifferentialEquations.jl the following metadata could be supplied:
[metadata]
keywords = ["differential-equations", "julia", "ode", "sde", "pde", "dae"]
description = "Julia suite for high-performance solvers of differential equations."
license = "MIT"
issue-url = "https://github.com/JuliaDiffEq/DifferentialEquations.jl/issues/new"
version = "v6.3.0"
[metadata.documentation]
built-docs-url = "https://github.com/JuliaDiffEq/DiffEqDocs.jl.git#gh-pages"
keywords, description, license and one of the docs keys are the most important for discoverability and usability, imho, so we should heavily encourage people to fill those in, e.g. when registering a package. It would also be sensible to sanity-check the different fields.
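A sanity check at registration time could look roughly like the sketch below. It works on the `[metadata]` table after parsing (a plain dict); the function name `check_metadata`, the exact rules, and the wording of the messages are assumptions for illustration only, since no schema has been agreed on.

```python
from urllib.parse import urlparse

# Keys the proposal calls most important (assumed "recommended" here).
RECOMMENDED = ("keywords", "description", "license")
DOCS_KEYS = ("built-docs-url", "hosted-docs-url", "docs-directory")


def check_metadata(metadata):
    """Return a list of human-readable problems found in a [metadata] table."""
    problems = []
    for key in RECOMMENDED:
        if key not in metadata:
            problems.append(f"missing key: {key}")

    keywords = metadata.get("keywords", [])
    if not isinstance(keywords, list) or not all(isinstance(k, str) for k in keywords):
        problems.append("keywords must be an array of strings")

    description = metadata.get("description", "")
    if not isinstance(description, str) or "\n" in description:
        problems.append("description must be a one-line string")

    issue_url = metadata.get("issue-url")
    if issue_url is not None:
        if not isinstance(issue_url, str) or urlparse(issue_url).scheme not in ("http", "https"):
            problems.append("issue-url must be an http(s) URL")

    docs = metadata.get("documentation", {})
    if not any(key in docs for key in DOCS_KEYS):
        problems.append("no documentation key set")
    return problems


# The DifferentialEquations.jl example from above, as a parsed dict.
example = {
    "keywords": ["differential-equations", "julia", "ode", "sde", "pde", "dae"],
    "description": "Julia suite for high-performance solvers of differential equations.",
    "license": "MIT",
    "issue-url": "https://github.com/JuliaDiffEq/DifferentialEquations.jl/issues/new",
    "version": "v6.3.0",
    "documentation": {
        "built-docs-url": "https://github.com/JuliaDiffEq/DiffEqDocs.jl.git#gh-pages"
    },
}
problems = check_metadata(example)
```

Returning a list of problems rather than raising on the first one lets a registration bot report everything in a single comment.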
The [metadata] section could of course contain arbitrary information (maybe a DOI or whatever).
Oh wow, this was next on my todo list and I had planned to do it this week. Also discussed this with Kristoffer yesterday :smile:.
To elaborate: my plan was to simply document that you can put whatever you want under [metadata] and Pkg will leave it alone. Any other fields Pkg is free to ignore/overwrite/whatever.
From the things you list above I think that version should still be top-level (or possibly under a [package] section, see #179) since it is pretty important, and should be required. Other non-essential things can go under [metadata].
Some context about why this came up now. @pfitzseb and @SimonDanisch are working on DocumentationGenerator and we want to be able to search packages based on metadata in addition to data (package code and documentation content). To facilitate that, we want to have a standard for structured content and verify that it's followed, e.g. at package registration time. So Pkg itself doesn't need to care about the structure of this, but I think that we do want to have a well-defined schema for it and enforce it for registered packages at least. That way you'll be able to do things like search for only MIT/BSD-licensed packages with a certain number of GitHub stars, etc.
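As a rough illustration of the kind of query this would enable, the sketch below filters package records by license and star count. Everything here is hypothetical: the record layout, the `search` function, the package names, and the `stars` field (which would have to come from the GitHub API, not from Project.toml).

```python
def search(packages, licenses=None, min_stars=0, keyword=None):
    """Return names of packages whose metadata matches the given filters.

    `packages` is a list of dicts with hypothetical keys:
    "name", "stars", and "metadata" (a parsed [metadata] table).
    """
    results = []
    for pkg in packages:
        meta = pkg.get("metadata", {})
        if licenses is not None and meta.get("license") not in licenses:
            continue
        if pkg.get("stars", 0) < min_stars:
            continue
        if keyword is not None and keyword not in meta.get("keywords", []):
            continue
        results.append(pkg["name"])
    return results


# Made-up records, purely for illustration.
packages = [
    {"name": "ExampleA.jl", "stars": 120, "metadata": {"license": "MIT", "keywords": ["ode"]}},
    {"name": "ExampleB.jl", "stars": 15, "metadata": {"license": "BSD-3-Clause", "keywords": ["plots"]}},
    {"name": "ExampleC.jl", "stars": 300, "metadata": {"license": "GPL-3.0", "keywords": ["ode"]}},
]
permissive = search(packages, licenses={"MIT", "BSD-3-Clause"}, min_stars=100)
```

A query like this only works reliably if the `license` strings are normalised, which is one reason to enforce a schema at registration time.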
I brought this issue up a while ago in https://github.com/JuliaLang/julia/issues/27567. In my own research, I have had to develop a few tools to understand various OSS ecosystems and have run into many edge cases. For example, CRAN metadata is towards the high-quality end of the spectrum, but it still has more than a couple of bad entries, conflicting information, etc. I have used the GitHub API (both v3 and v4) to gather statistics on Julia packages and have developed tools for other measures of interest (e.g., DependenciesParser.jl, JuliaEcosystem). I think standardizing the metadata would be a huge step towards having a better ecosystem infrastructure.