There has been frustration among Tokio users regarding the number of crates pulled in when depending on Tokio. This RFC is an opportunity to discuss an alternative strategy; it also gives users who are happy with the current situation a chance to say so.
Do not maintain tokio-* sub crates. Instead, all Tokio code will exist in a single tokio crate, and components are enabled or disabled using feature flags.
For example, depending on only the timer functionality could be done with:
tokio = { version = "0.2.0", default-features = false, features = [ "timer" ] }
By default, tokio would have the same components enabled as it does today.
Maintaining a large number of crates comes with an increased maintainership burden. Maintaining correct dependencies between crates is complex. Users feel that a large number of dependencies == bloat. Additional rationale can be found here.
Tokio must maintain semver stability of its core APIs. This includes traits as well as some types, such as TcpStream. Tokio would like to be able to release breaking changes to less fundamental APIs without having to break the entire Tokio ecosystem.
Currently, Tokio achieves this goal by breaking up all the various components into individual crates. Doing this allows less stable components to release breaking changes without touching stable components. However, this strategy has drawbacks (see Motivation section).
In this proposal, all Tokio components would be moved into a single crate. Each component would have an associated feature flag, similar to how Tokio does it today.
Not much would change for application developers: they would still just depend on tokio and enable / disable feature flags as needed. Library developers would no longer depend on sub crates. Instead, they would depend on tokio and only pull in the features that they need.
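As a rough illustration of what "each component would have an associated feature flag" could look like inside a single crate, here is a minimal sketch. The module names `timer` and `tcp` and their contents are placeholders chosen to mirror the feature names mentioned in this thread; this is not Tokio's actual source layout.

```rust
// Hypothetical component modules. Each one is compiled only when the
// matching Cargo feature is enabled; the bodies are placeholders.
#[cfg(feature = "timer")]
pub mod timer {
    pub fn describe() -> &'static str {
        "timer component"
    }
}

#[cfg(feature = "tcp")]
pub mod tcp {
    pub fn describe() -> &'static str {
        "tcp component"
    }
}

/// Lists which of the (hypothetical) components this build was compiled with.
pub fn enabled_components() -> Vec<&'static str> {
    let mut enabled = Vec::new();
    // `cfg!` evaluates to a compile-time boolean for the given feature.
    if cfg!(feature = "timer") {
        enabled.push("timer");
    }
    if cfg!(feature = "tcp") {
        enabled.push("tcp");
    }
    enabled
}

fn main() {
    println!("enabled components: {:?}", enabled_components());
}
```

With this layout, a library that only enables one feature never compiles the other module at all, which is the property the proposal relies on.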
Core types can maintain stability between breaking semver releases. For example, if the TcpStream type does not change between Tokio version 0.2 and Tokio version 0.3, then the following steps would be taken to release 0.3:
- Release tokio 0.3.
- Update tokio 0.2 to depend on tokio 0.3.
- Implement TcpStream in 0.2 by re-exporting the implementation from 0.3.
- Re-export the TcpStream type from 0.3.

By doing this, TcpStream from 0.2 and 0.3 are the same type.
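The re-export idea can be sketched in a single file by letting two modules stand in for the two published versions. Everything here is an assumption made for illustration: `v0_2` and `v0_3` are plain modules rather than real crate versions, and `fake_connect` and `peer_of` are hypothetical helpers, not Tokio APIs.

```rust
use std::any::TypeId;

// Stand-in for the newer release, which owns the real definition.
mod v0_3 {
    #[derive(Debug)]
    pub struct TcpStream {
        pub peer: String,
    }

    impl TcpStream {
        // Hypothetical constructor; no network I/O happens here.
        pub fn fake_connect(peer: &str) -> TcpStream {
            TcpStream { peer: peer.to_string() }
        }
    }
}

// Stand-in for the older release. It no longer defines its own TcpStream;
// it only re-exports the one from the newer release.
mod v0_2 {
    pub use super::v0_3::TcpStream;
}

// A library written against the older release.
fn peer_of(stream: &v0_2::TcpStream) -> &str {
    &stream.peer
}

fn main() {
    // A value created through the newer path is accepted by code that
    // names the older path, because both paths refer to one type.
    let stream = v0_3::TcpStream::fake_connect("127.0.0.1:8080");
    println!("peer: {}", peer_of(&stream));
    println!(
        "same type: {}",
        TypeId::of::<v0_2::TcpStream>() == TypeId::of::<v0_3::TcpStream>()
    );
}
```

In a real release the older crate would reach the newer one through a Cargo dependency instead of `super::`, but the type-identity argument is the same.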
Continue to release new crates for each component.
In the Reddit thread on actix-web that probably prompted this, you said:
Tokio itself is split into many crates specifically to allow libs to pick and choose :) Any lib can depend on exactly the components they need and no more.
Now this sure sounds like a good thing to me, but is it known if any projects only pull in a small fraction of the tokio-* crates? It's always been my impression from looking at dependency graphs that they don't. I'm hoping someone has a good idea on how to collect data for this apart from manually auditing dependency graphs, which is the best I know how to do.
This seems generally reasonable, but I think we do need to figure out what to do with tokio-io. In 0.1, it's kind of "independent" of tokio in that it doesn't have any hard dependencies on the reactor, etc. Libraries like hyper and tokio-postgres use it as an interface for non-tokio runtimes.
In the new world, I think we will want some independent, small crate that defines AsyncRead/AsyncWrite/etc. Maybe that's just futures-io (there is another issue open for this)?
Overall I think merging the crates will lead to a better experience for consuming crates.
Breaking changes will require a lot more effort to re-export things than bumping individual crates, but in my experience, there are very few "leaf" tokio crates that can be easily bumped without having to bump other crates as well. So, net-net, I don't think we'll lose much flexibility in practice.
Also, feature gating everything could make it easy to accidentally include too much (à la default features), but the remedy is a quick Cargo.toml edit rather than a code refactor, which is a better experience.
A single tokio crate would make it a lot easier to maintain tokio in distros that package individual crates, like Debian and Fedora.
I think using a single crate is also a good idea if Rust is ever going to do dynamic linking per crate.
I think that merging all of the crates into a single crate may be mostly just sweeping the perceived bloat under the rug.
A quick perusal of cargo tree --no-dev-dependencies in reqwest turns up dependencies that I would expect not to be needed, such as a full tokio runtime, including a threadpool. reqwest provides a synchronous API for simplicity (it also provides an async API, but I feel like that should be an optional feature since many users are only using it for the simple synchronous API).
I think that hyper and/or reqwest include a threadpool to be able to do asynchronous name resolution when using getaddrinfo, but if you're just providing a simple blocking interface anyhow, it seems reasonable to also block on getaddrinfo. I feel like a lot could be done to reduce bloat if there were some optional simpler single-threaded runtime for use cases like this that didn't have to pull in crossbeam, parking_lot, num_cpus, and a bunch of other things that make sense for a full-featured asynchronous runtime with a threadpool for CPU-bound or otherwise blocking operations, but don't make sense when just trying to use a synchronous API for a network protocol.
Another source of extra crates seems to be rand, which is depended on in a few places. For some inexplicable reason, it has non-optional dependencies on a number of different random number generation algorithms, despite most users not caring and just wanting an easy and dependable way to get random numbers.
reqwest also has a few mandatory dependencies that should probably be made optional; they could be turned on by default for convenience, but I think that if you want to avoid bloat you should be able to turn them off. For instance, it uses serde for convenience methods for parsing json from requests and building queries from URL pairs, but serde is a somewhat big dependency and these couple of convenience features could be made optional.
There are a number of dependencies in reqwest's dependency tree that I can nod along to and say "yeah, I can see why a convenient library for sending HTTP requests and parsing replies would need that": URL parsing, which then depends on IDNA, which depends on various unicode crates; a cookie store; some encoding libraries; http, hyper, h2; and then a collection of basic ecosystem libraries like log, bytes, and so on (though there are also a couple of other basic ecosystem needs for which there isn't a settled answer, like error handling, leading to both error-chain and failure showing up in the dependency tree). But there are also a lot that seem extra, and even when you take into account that a general-purpose async HTTP implementation needs some kind of runtime to run it while providing a synchronous API, this particular runtime seems to be way bigger and to pull in more dependencies than is really necessary for the purpose.
I think that there are also some real issues of reqwest indirectly depending on many more things than it really should, and merging back into one crate may make that a bit less visible, whereas the current layout can make it a little easier to find and address those issues one at a time, if anyone is motivated to do so.
There are some places where merging might make sense, such as the crates in the rt-all feature of Tokio which seem to be very frequently used together and I'm unclear if they can really be meaningfully used independently, but I think that merging everything into one crate would just obscure some of the sources of pulling in too many dependencies.
Also, while the original comment in this thread indicates that the tokio crate is just intended to be used by applications, and sub-crates by libraries, it looks like both hyper and reqwest are libraries depending on tokio, so it looks like either that intent hasn't been communicated or there's some issue with using the sub-crates independently; they also happen to depend on a number of sub-crates, so I don't know why without further investigation they also have to depend on tokio itself.
Finally, I think one of the bigger wins may be some better way of counting or visualizing dependencies, which takes into account sets of dependencies that all come from the same source. If some piece of code is coming from a feature or a sub-crate developed in the same repository, the only main difference is that it can be more easily used and semantically versioned independently if it's a separate crate; but it will show up quite differently in a dependency graph, leading to the feeling of bloat; and some of what I think feels like the perceived bloat is also the perceived number of sources that need to be audited or people and organizations that need to be trusted.
If the notion of multiple sub-crates actually all coming from the same parent project/workspace/repository were more prominent in some of these tools like cargo tree or other tools used to count dependencies, it might not over-count in places like this where the separate crates are really just a more convenient way of organizing code within a single logical project, rather than separate projects which need to be each evaluated independently.
Since this was long, in summary:
With feature flags there might be a risk that you don't include the ones you need, because other dependencies of yours happen to include them, leading to possible breakage down the road when your dependencies change the flags they require. Maybe that's not a big problem since it's easy enough to fix after it happens? But it would be more of a problem for newer coders who might not understand why they're getting errors.
Another concern is forking and [patch]/[replace]-ing Tokio components.
With a single crate, all of Tokio needs to be replaced instead of, e.g., just tokio-tcp.
I understand the maintainability concerns, but I don't get the issue with the number of dependencies. Currently someone can depend on tokio, which re-exports common crates, so one does not have to manually deal with many dependencies. In general I appreciate the modularity of the ecosystem, and I feel using features to achieve the same end is less elegant and less practical/discoverable for users.
Now, the development burden is a good reason to go for a single crate, but as a Tokio user I feel the current solution is practical. It lets libraries depend only on relevant crates and binaries can easily pull tokio.
Regarding @kpcyrd's comment on packaging tokio in Linux distros, I don't really see how this helps until Rust has a fixed ABI, and it seems current decisions shouldn't be based on such long-term prospects. I also don't see Linux distros packaging Rust libraries like they package C/C++ lib headers or Python modules, because Cargo handles that much better.
I'm not sure if Cargo features are robust enough for this.
For example, in practice default-features = false is almost impossible to use, because any dependency anywhere in the dependency tree that just includes tokio = "1" will silently bring all the default features back.
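To make the mechanism concrete, here is a small model of how feature requests get combined: for a given package, the enabled set is the union of what every dependent asks for, so one dependent that leaves defaults on turns them on for everyone. This is a simplified sketch, not Cargo's real resolver; the `Request` type, the `unify` function, and the list of default features are all assumptions made for illustration.

```rust
use std::collections::BTreeSet;

/// One crate's request for the shared dependency (hypothetical model).
struct Request<'a> {
    default_features: bool,
    features: &'a [&'a str],
}

/// Union of all requested features; defaults are added as soon as any
/// single request leaves `default_features` switched on.
fn unify(requests: &[Request<'_>], defaults: &[&str]) -> BTreeSet<String> {
    let mut enabled = BTreeSet::new();
    for request in requests {
        if request.default_features {
            enabled.extend(defaults.iter().map(|f| f.to_string()));
        }
        enabled.extend(request.features.iter().map(|f| f.to_string()));
    }
    enabled
}

fn main() {
    // Assumed default feature set, for illustration only.
    let defaults = ["rt-full", "tcp", "timer"];

    // My crate: default-features = false, features = ["timer"].
    let mine = Request { default_features: false, features: &["timer"] };
    // Some transitive dependency: a plain version requirement, defaults on.
    let transitive = Request { default_features: true, features: &[] };

    let enabled = unify(&[mine, transitive], &defaults);
    println!("features actually compiled: {:?}", enabled);
}
```

In this model, opting out only has an effect if every crate in the tree that depends on tokio also opts out.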
Rust gives poor error messages when a user forgets to add a feature flag. It just prints that the thing doesn't exist, but the docs say it exists! Super confusing.
> number of crates pulled in when depending on Tokio
In what way? I can see two sides:
User has to add multiple dependencies, so it's a chore to add multiple entries to Cargo.toml, open multiple docs pages, etc.
Compilation lists lots of stuff, so it feels "bloated"
The first case could be fixed by still having separate crates, but also offering a top-level crate that groups and re-exports all of them. Users would add tokio-kitchen-sink to their projects and use all components from there.
The second is not a real problem IMHO, but merely a perception of a problem. The amount of code compiled will be similar either way (or even worse, given default-features=false unusability and limited parallelism in rustc).
I've got a feeling that there's a group of new Rust users who come from languages with either a huge stdlib (so nobody needs to use dependencies) or painful dependency management (so everybody avoids using dependencies), so they're shocked by how nonchalantly Rust/Cargo uses deps. But for Rust that's fine, so the real problem is communicating to users that they shouldn't be worried when the compilation step prints many lines of "Compiling X".
> Compilation lists lots of stuff, so it feels "bloated"
I think users (including me) are frustrated by how fast compile time grows as we add dependencies, and the size of the dependency graph is an easy target for complaint. The presence of all the tokio crates and multiple versions of all the rand crates in an ostensibly single-threaded program quickly adds credibility to blaming the compilation time on the size of the dependency tree. It would be interesting to assess how end-user compilation time is altered by the suggested changes.
I think this is an XY problem. The perceived bloat is solved by binary packages in cargo/crates.io and caching it close to the CI instance.
I'm not talking about CI.
I am personally in the "happy with the current situation" boat. The arguments against many dependencies usually boil down to three metrics:
If maintaining all the smaller tokio-* crates has proved to be a challenge, and merging them benefits maintainers and maintenance, I'd be highly in favour, but I personally disagree with the other reasons.
Note: if managing the separate crate versions and their interdependencies is indeed the main motive for this change, there are automation tools that alleviate or minimize that burden while keeping all the benefits of the current approach.
@saethlin, my point was more general to the issue being addressed by the RFC: users perceive bloat in tokio and the RFC here is attempting to mitigate it by making an uber crate that wraps everything up. Aside from creating a new project, CI for a project depending on tokio, and actually working on tokio itself, when do you need to build tokio?
> Regarding @kpcyrd's comment on packaging tokio in Linux distros, I don't really see how this helps until Rust has a fixed ABI, and it seems current decisions shouldn't be based on such long-term prospects. I also don't see Linux distros packaging Rust libraries like they package C/C++ lib headers or Python modules, because Cargo handles that much better.
Those are two separate issues. If you use micro libraries, dynamic linking would imply loading >100 .so's into the process, which is a non-zero-cost abstraction. With "C sized" crates the unused code would be LTO'd anyway. This is unrelated to distros.
The very real problem with distros is the review process for new packages. If rand decides it's going to need 5 more rand-* crates we need to get them all reviewed and approved. To upload the new crates we need to update rand-core first which breaks the existing dependency tree. Updating rand is generally a non-trivial effort that takes multiple weeks (up to months).
> Now this sure sounds like a good thing to me, but is it known if any projects only pull in a small fraction of the tokio-* crates? It's always been my impression from looking at dependency graphs that they don't. I'm hoping someone has a good idea on how to collect data for this apart from manually auditing dependency graphs, which is the best I know how to do.
Starting at https://crates.io/crates/tokio/reverse_dependencies, I've listed the number of reverse dependencies on crates.io.
Hope this helps.
| crate (tokio or tokio-*) | reverse dependencies |
| ------------ | ------------- |
| tokio | 608 |
| core | 381 |
| buf | 6 |
| codec | 114 |
| current-thread | 22 |
| executor | 51 |
| fs | 23 |
| io | 318 |
| reactor | 43 |
| signal | 22 |
| sync | 16 |
| tcp | 59 |
| threadpool | 36 |
| timer | 112 |
| tls | 61 |
| udp | 12 |
| uds | 42 |
Some additional random thoughts.
One thing I forgot to consider earlier: can cargo handle different feature flags across dependencies and dev-dependencies? (at least it wasn't able to in the past...)
A common pattern I've seen is libraries depending on the minimal tokio functionality they need, while pulling in all bells and whistles during testing. If cargo cannot support enabling extra dependency features during testing, crates may end up depending on the entirety of tokio.
> - Too much code, takes too long to compile: This is strictly worsened by monolithic dependencies due to advantages of parallel compilations being evaporated.
@Mathspy This is not necessarily true: rustc itself does parallelize compilation within a single crate, while cargo doesn't (yet) compile dependency chains in parallel. So depending on the crate graph things could go either way; it would have to be benchmarked, and it will change over time as the tools improve.
But this is still a relevant point: one of the reasons people complain about the number of dependencies is that it's a proxy for "compilation is slow," and simply merging them will not really change things there. The only way to fix that is to compile less, and simpler, code.
And we will get a bloat of dependencies if you use tokio for codecs only in the library and tcp/udp in tests, because cargo will combine two features together:
[dependencies.tokio]
version = "0.3"
default-features = false
features = ["codec"]
[dev-dependencies.tokio]
version = "0.3"
default-features = false
features = ["codec", "tcp", "rt-full"]
It will compile tcp and the runtime even for non-test builds for the end users of a library.
> It will compile tcp and runtime even for a non-test builds for the end users of a library.
Not if they are pulling this library from crates.io; dev-dependencies features are only merged in when a crate is inside the current workspace or a path dependency.
> dev-dependencies features are only merged in when a crate is inside the current workspace or a path dependency.
Good to know. Thanks
cargo-crev reviews and cargo-audit security advisories are per crate, and don't take features into account. If tokio keeps crates separate, and there's an issue with one of the less often used components, these will affect fewer users.
@rpjohnst Oh! I see, thank you for the clarification!
> cargo-crev reviews and cargo-audit security advisories are per crate, and don't take features into account. If tokio keeps crates separate, and there's an issue with one of the less often used components, these will affect fewer users.
That trades runtime overhead for a tooling problem, though. It doesn't improve security: in both cases, only binaries that actually run the vulnerable code are affected.
To me, the most important factors are:
It seems that this change makes the first one harder, and allows the latter to grow with lesser negligence than the current arrangement.
Why should monolith v0.2 depend on monolith v0.3 instead of them both depending on unchanged-type v0.1?
The superficial issue of "bloat" in a number of tiny packages from the same reputable organization is replaced by the more real issue of "churn" of unchanged types and code being republished over and over again.
There are valid arguments for and downsides to both approaches.
Since many of the comments have been pro multi-crate, I'll add some for the single crate approach (which is my preferred solution).
I work with multiple companies that require each dependency to be reviewed. Each version must be signed off by an employee for production use. Splitting libraries like tokio into multiple crates makes this a lot more work. The impact is smaller on the initial review because you need to look at all code anyway, but jumping around between different crates still makes this harder than with a single crate. You need more understanding of the architecture and boundaries between the crates, and just need to keep more things in your head.
Constant version bumps across many different crates also increase the workload here and just generally drive up the complexity of the process, leading to annoyed maintainers.
I also know companies that have policies like "All dependency releases must be screened for bug fixes and fixed vulnerabilities within 1 working day." Doing this across many crates with constant releases is definitely more taxing and time consuming. (Reading 1 changelog vs 7 ...)
Even without mandatory review requirements as above, more crates invariably lead to a higher velocity of change. More releases, more CHANGELOGs to read, more version bumps, more chances for the inevitable bugs to cause a problem, and more friction in the entire ecosystem.
Breaking changes are especially bad in this context.
While this also makes it easier to get changes out instead of consolidating to a single-crate release, I'm wondering if this kind of velocity is actually desirable for such an important low-level building block, assuming a certain stability and maturity of the codebase.
It also increases the chance of multiple versions of a crate sneaking into your build, which is always suboptimal (build time, inconsistencies, ...) and leads to the often annoying process of finding out why and fixing it, usually with a PR for another dependency.
@qm3ster mentioned that a single crate would make contributing harder. I'm curious why that is.
Personally, I've always found it much easier to understand and contribute to a single crate, rather than something that is split across many crates.
It leads to jumping around multiple crates and following re-exports to gain a full understanding.
It also leads to potentially having to change multiple crates and having to be more aware of the public API boundaries of each crate + extra effort to avoid a breaking change.
A single crate is much better for understanding the code and first time contributing IMO.
There have been multiple claims for better build performance with multiple crates, due to parallel compilation. This claim really needs some substantiation with measurements.
I'd imagine that the benefit is limited to the first check/incremental build run. Multiple crates might very well create more work for the final build steps and LLVM vs a single crate. The first build is important for CI and things like cargo install, but I'd argue that the final build is much more important for daily dev workflow.
Subjectively, I remember build times being better before the split up in tokio and futures. This of course might just be because the stack has grown in complexity and gained more features.
But the point is: including performance in a decision would need some validating benchmarks.
@AZon8 posted some numbers for how tokio is used on crates.io.
Most applications using tokio are private and not on crates.io, so crates.io data likely overstates how often sub-crates are depended on directly compared with real-world use, because crates.io skews toward libraries and building blocks rather than full applications.
I think it makes sense to optimize for the most common use case, assuming that more selective usage is still possible.
This has been implemented.