Currently the user experience for publishing crates that have invalid or non-conformant keywords is less than ideal.
error: api errors: invalid upload request: invalid value: string "static compress", expected a valid keyword specifier at line 1 column 8372
error: api errors: invalid upload request: invalid length 9, expected at most 5 keywords per crate at line 1 column 8440
cargo publish should probably do some basic linting on the keywords field of Cargo.toml, and report more user-friendly errors than it does today. The errors returned by the server are useful for someone hacking away at cargo, not for a developer using cargo: they refer to the payload generated by the cargo publish command and sent to the server, so those line and column numbers have no meaning to the end user.
I think even cargo check and cargo package should do so, because otherwise the problem surfaces after all the config is committed into the repository and publishing is in progress.
Hi @carols10cents, I would be happy to help fix this issue. Is there anyone who can mentor me on what needs to be done?
@behnam @mattgathu @carols10cents :wave: Hello from CodeTriage!
This is still a problem in version 1.33.0 (f099fe94b 2019-02-12).
We could fix this issue by duplicating the validation logic from crates.io:src/models/keyword.rs on cargo check/package/publish. Does that seem like a valid solution?
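As a rough illustration of what such a client-side check might look like, here is a minimal sketch. It is not the crates.io code: the function names are hypothetical, the limit of 5 keywords comes from the server error quoted above, and the character rule (ASCII alphanumeric first character, then ASCII alphanumerics, `_`, or `-`) and the 20-character limit are assumptions about the crates.io rules rather than something stated in this thread.

```rust
/// Maximum number of keywords, taken from the server error quoted in this thread.
const MAX_KEYWORDS: usize = 5;
/// Assumed maximum keyword length; not stated in this thread.
const MAX_KEYWORD_LEN: usize = 20;

/// Returns true if `name` starts with an ASCII alphanumeric character and
/// otherwise contains only ASCII alphanumerics, `_`, or `-` (assumed rule).
fn is_valid_keyword(name: &str) -> bool {
    let mut chars = name.chars();
    match chars.next() {
        Some(first) if first.is_ascii_alphanumeric() => {
            chars.all(|c| c.is_ascii_alphanumeric() || c == '_' || c == '-')
        }
        // Empty string or a non-alphanumeric first character.
        _ => false,
    }
}

/// Collects human-readable problems for a manifest's `keywords` list.
/// An empty result means nothing was flagged.
fn lint_keywords(keywords: &[&str]) -> Vec<String> {
    let mut problems = Vec::new();
    if keywords.len() > MAX_KEYWORDS {
        problems.push(format!(
            "too many keywords: {} (at most {} allowed)",
            keywords.len(),
            MAX_KEYWORDS
        ));
    }
    for kw in keywords {
        if !is_valid_keyword(kw) {
            problems.push(format!("invalid keyword: {:?}", kw));
        }
        let len = kw.chars().count();
        if len > MAX_KEYWORD_LEN {
            problems.push(format!("keyword too long ({}): {:?}", len, kw));
        }
    }
    problems
}
```

Returning a list of messages instead of failing on the first problem would let cargo check, cargo package, and cargo publish report every issue at once, in terms of the manifest rather than the upload payload.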
Thanks, @mattgathu and @damon-myers for following up.
From what I understood and remember, the problem is that the range of some values, such as package keywords, is defined by the package registry, and may in fact differ from one to another.
That, together with the existence of private registries that may rely on different keyword rules, makes me wonder whether there's anything we can do here that would be scalable.
One option that comes to my mind is to special-case crates.io as a core package registry, and maintain a copy of some of its policies, like the format of keyword values.
Another option could be to expect the registry to either have an API for this (which cargo package and such can use per call), or have an API to give us a rule, like a list of values or a RegEx pattern, to be cached and used for verifying the related package attributes.
I haven't been active with Cargo for a while, so these suggestions may be a bit off, or some of it may already exist. I leave it for the leads to clarify and make actionable suggestions.
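To make the second option more concrete, here is a minimal sketch of a rule that a registry could publish and cargo could cache. Everything in it is hypothetical: no such registry API exists as far as this thread says, and the `KeywordPolicy` type, its fields, and the `key=value` text format are invented for illustration. It uses a small set of allowed characters instead of a RegEx pattern so that it needs only the standard library.

```rust
/// Hypothetical keyword policy that a registry could serve and cargo could cache.
#[derive(Debug, Clone, PartialEq)]
struct KeywordPolicy {
    max_keywords: usize,
    max_len: usize,
    /// Non-alphanumeric characters allowed in a keyword.
    extra_chars: Vec<char>,
}

impl KeywordPolicy {
    /// Parses a tiny `key=value` format, one pair per line (format is hypothetical).
    /// Returns None if a line is malformed or a required key is missing.
    fn parse(text: &str) -> Option<Self> {
        let mut max_keywords: Option<usize> = None;
        let mut max_len: Option<usize> = None;
        let mut extra_chars: Vec<char> = Vec::new();
        for line in text.lines() {
            if line.trim().is_empty() {
                continue;
            }
            let (key, value) = line.split_once('=')?;
            match key.trim() {
                "max_keywords" => max_keywords = value.trim().parse().ok(),
                "max_len" => max_len = value.trim().parse().ok(),
                "extra_chars" => extra_chars = value.trim().chars().collect(),
                _ => {} // Ignore keys this client does not understand.
            }
        }
        Some(KeywordPolicy {
            max_keywords: max_keywords?,
            max_len: max_len?,
            extra_chars,
        })
    }

    /// Checks a keyword list against the policy, returning the first problem found.
    fn check(&self, keywords: &[&str]) -> Result<(), String> {
        if keywords.len() > self.max_keywords {
            return Err(format!(
                "expected at most {} keywords per crate, found {}",
                self.max_keywords,
                keywords.len()
            ));
        }
        for kw in keywords {
            let starts_ok = kw
                .chars()
                .next()
                .map_or(false, |c| c.is_ascii_alphanumeric());
            let chars_ok = kw
                .chars()
                .all(|c| c.is_ascii_alphanumeric() || self.extra_chars.contains(&c));
            if !starts_ok || !chars_ok || kw.chars().count() > self.max_len {
                return Err(format!("invalid keyword {:?}", kw));
            }
        }
        Ok(())
    }
}
```

With a policy like this cached per registry, a private registry with looser or stricter rules would only need to serve different values, and cargo would not have to hard-code the crates.io rules.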
I think we intend to share some of the validation somehow.
Out of curiosity, I scanned crates.io for any crates with keywords that wouldn't pass validation today. I found some; presumably crates.io validation has changed over time.
keyword errors
"krust/krust-0.0.1/Cargo.toml" too many keywords: 6
"apply_pub/apply_pub-0.0.2/Cargo.toml" too many keywords: 6
"google-geo/google-geo-0.1.0/Cargo.toml" invalid keyword: "location infomation"
"sntp_client/sntp_client-1.2.0/Cargo.toml" invalid keyword: "command line"
"shm/shm-0.1.0/Cargo.toml" invalid keyword: "shared memory"
"owned-fd/owned-fd-0.1.0/Cargo.toml" invalid keyword: "file descriptor"
"scell/scell-1.0.0/Cargo.toml" invalid keyword: "smart cell"
"scell/scell-1.0.0/Cargo.toml" too many keywords: 11
"nl80211rs/nl80211rs-0.1.0/Cargo.toml" invalid keyword: "nl80211.h"
"dumbmath/dumbmath-0.2.2/Cargo.toml" too many keywords: 7
"routeros_rust/routeros_rust-0.0.21/Cargo.toml" invalid keyword: "Router Os"
"routeros_rust/routeros_rust-0.0.21/Cargo.toml" invalid keyword: "Router Os API"
"strtod/strtod-0.0.1/Cargo.toml" invalid keyword: "floating point"
"pairing-heap/pairing-heap-0.1.0/Cargo.toml" invalid keyword: "priority queue"
"comcart/comcart-0.1.0/Cargo.toml" invalid keyword: "common cartridge"
"packagemerge/packagemerge-0.1.0/Cargo.toml" too many keywords: 8
"alpaca/alpaca-0.1.0/Cargo.toml" invalid keyword: "Variant Calling"
"ithos/ithos-0.0.0/Cargo.toml" invalid keyword: "access control"
"cryptosphere/cryptosphere-0.0.0/Cargo.toml" too many keywords: 6
"snzip/snzip-0.1.0/Cargo.toml" too many keywords: 9
"meta_diff/meta_diff-0.0.1/Cargo.toml" invalid keyword: "machine learning"
"rustyham/rustyham-0.0.1/Cargo.toml" invalid keyword: "hamming code"
"carto/carto-0.1.0/Cargo.toml" invalid keyword: "text editor"
"jwk/jwk-0.1.0/Cargo.toml" invalid keyword: "RFC 7517"
"fixed_circular_buffer/fixed_circular_buffer-0.2.2/Cargo.toml" too many keywords: 7
"nickel_macros/nickel_macros-0.1.0/Cargo.toml" invalid keyword: "web server"
"message-format/message-format-0.0.1/Cargo.toml" too many keywords: 8
"nailgun/nailgun-0.1.0/Cargo.toml" too many keywords: 8
"tagua-parser/tagua-parser-0.1.0/Cargo.toml" invalid keyword: "virtual machine"
"tagua-parser/tagua-parser-0.1.0/Cargo.toml" too many keywords: 6
"kalman/kalman-0.0.0/Cargo.toml" invalid keyword: "Kálmán filter"
"sdp/sdp-0.1.0/Cargo.toml" invalid keyword: "Session Description Protocol"
"sdp/sdp-0.1.0/Cargo.toml" keyword too long (28): "Session Description Protocol"
"way-cooler-ipc/way-cooler-ipc-0.0.0/Cargo.toml" too many keywords: 6
"ghlabel/ghlabel-0.1.0/Cargo.toml" invalid keyword: "github issues"
"ghlabel/ghlabel-0.1.0/Cargo.toml" invalid keyword: "pull requests"
"netcdf/netcdf-0.1.0/Cargo.toml" too many keywords: 7
"humanity/humanity-0.1.0/Cargo.toml" invalid keyword: "humans.txt"
"switchboard/switchboard-0.1.0/Cargo.toml" invalid keyword: "state machine"
"smbclient-sys/smbclient-sys-0.1.0/Cargo.toml" too many keywords: 6
"has/has-0.1.0/Cargo.toml" invalid keyword: "has a"
"boehm_gc/boehm_gc-0.0.1/Cargo.toml" invalid keyword: "garbage collector"
"orc/orc-0.0.1/Cargo.toml" invalid keyword: "garbage collector"
"orc/orc-0.0.1/Cargo.toml" invalid keyword: "reference counting"
"ikura/ikura-0.0.1/Cargo.toml" invalid keyword: ""
"tg-labstatus/tg-labstatus-0.1.0/Cargo.toml" invalid keyword: "openlab augsburg"
"dual_quaternion/dual_quaternion-0.1.0/Cargo.toml" invalid keyword: "dual quaternion"
"i18n/i18n-0.0.1/Cargo.toml" too many keywords: 11
"bytereader/bytereader-0.1.0/Cargo.toml" too many categories: 7
"network-constants/network-constants-0.0.1/Cargo.toml" invalid keyword: "tcp/ip"
"network-constants/network-constants-0.0.1/Cargo.toml" too many keywords: 8
"rust-gm-paillier/rust-gm-paillier-0.1.0/Cargo.toml" too many keywords: 6
"findup/findup-0.1.0/Cargo.toml" too many keywords: 6
"swagger_to_md/swagger_to_md-1.0.0/Cargo.toml" too many keywords: 6
"currency/currency-0.4.0/Cargo.toml" too many keywords: 12
"hashindexed/hashindexed-0.1.1/Cargo.toml" too many keywords: 7
"checked_int_cast/checked_int_cast-1.0.0/Cargo.toml" too many keywords: 6
"uwp/uwp-0.0.0/Cargo.toml" invalid keyword: "Universal Windows Platform"
"uwp/uwp-0.0.0/Cargo.toml" keyword too long (26): "Universal Windows Platform"
"libmultilog/libmultilog-0.1.0/Cargo.toml" too many keywords: 7
"silverknife-pangocairo-sys/silverknife-pangocairo-sys-0.1.0/Cargo.toml" too many keywords: 8
"ipecho/ipecho-0.0.1/Cargo.toml" invalid keyword: "public ip"
I personally wouldn't be opposed to always imposing these restrictions regardless of the registry. Also, as noted in #4377, these would probably be warnings.
The keyword format doesn't seem to be documented anywhere? I had to read the source code linked above to figure out what makes a keyword valid.