Fake motivation:
Associative array data structures like some skip lists, B-trees, and radix/compact prefix trees require little to no randomness to achieve similar speed with lower memory overhead. Ignoring compression, lower memory use means:
And there are the problems/bikesheds of hashing/collision strategies and entropy-source quality that I'd prefer not to leave to a standard library.
Real motivation:
I recently learned about a better crit-bit tree called a qp trie. Some people have measured speedups over hash maps (more for writes) and much better memory usage per key. There are a few Rust ports.
Caveats:
I think we should definitely add some more data structures in general, as right now we only have the following:
Some extra things that may be useful to include.
Given that the stdlib will probably be heavily revised before 1.0 we can always experiment and figure out what works now with no huge worry. It would be important to clearly document the tradeoffs for each data structure to the user if there is potential overlap.
Aside: I have read that argument on SipHash. It seemed fairly flimsy, though: if an attacker has sufficient access to read the secret key out of process memory, I think you have bigger issues to worry about.
Just want to note that Boost has shown that circular buffers (fixed-size ring buffers) are amazingly powerful for queues and stacks, providing almost vector/ArrayList-like speed rather than the often 10x-slower speed of the linked deques they are commonly implemented with.
Practical restriction would be to allow only fully documented data structures.
Simple types like 2D and 3D point should be added, so people are not tempted to create their own.
Simple types like 2D and 3D point should be added, so people are not tempted to create their own.
I disagree. Aren't they mostly for graphics, windows, games, and scientific computing? And people should be tempted to make their own in this case - a Point is like the simplest struct ever and has no clear associated functions to make a real module out of.
@tiehuis
that argument on SipHash. It seemed fairly flimsy
I don't personally care about SipHash (except for its speed when good mixing isn't needed), but I do care that a server using a probabilistic data structure like a hash map has to deal with (adversarial) worst-case bucket balancing, blocking/crappy OS RNGs, etc.
Constant time operations are a whole 'nother matter, though.
Speaking of crypto, what about HTTP/2, encoding/decoding, regex, and command-line parsing? Certain data structures may be necessary just to implement them, but on the other hand, nontrivial algorithms exposed through the standard library increase the maintenance burden of Zig's std, if only through something as simple as discussing API tradeoffs in issue trackers.
@0joshuaolson1
Simple types available in the standard library may result in compatible libraries. In C, every graphics library (like Cairo) has its own point and rectangle. A similar mess happened with the boolean type in C, and (until stdint.h) with sized integers.
I hope Zig's comptime can overcome C/C++/Rust's problem where people need to reimplement parts of the standard library with macros/templates.
A little list of what I think should be in Zig in terms of data structures:
Just felt I should add my two cents
@BraedonWooding
BigInt (I guess)
Here is a detailed tutorial (consisting of 10 parts) on how to implement multiple-precision arithmetic, usable even for cryptographic applications.
Here is a quick vector thing I drafted up when I had a few hours:
https://github.com/BraedonWooding/zig/blob/Vec/std/math/vector.zig
To me, 300 lines still feels a little small to completely warrant a standard library entry; however, I think the whole "keep it consistent" business is important :).
"What data structures should belong in std?"
IMHO none. Why? Because the idea behind Zig is to be a more ergonomic/safe C, or really a more ergonomic/safe/portable assembly. The data structures should belong in a library.
I love data structures: kd-trees, R-trees, and friends, among others. Why not create a project outside Zig that has everything one could dream of, in various modules: geometric data structures, probabilistic data structures, string-processing data structures?
@fithisux: projects outside would be suspected of being abandoned or obsolete. Being part of the distribution gives some assurance that the bloody thing may actually work.
Red-black trees were just merged into master.
Yep. To further clarify:
Nearly anything that is generally useful, and has minimal or no OS dependencies is welcome into the stdlib pre-1.0.0. This helps act as a test bed for the language, and helps give us real APIs to get working with each other as we work on the language and standard library.
Then, when we are nearing 1.0.0 and the language is stable, we'll have a comprehensive std lib audit. Everything gets fully documented. Many APIs are removed from std. Volunteers are welcome to take ownership of the removed code in separate repositories. APIs that remain are made to be consistent with each other in naming, style, and patterns. The overarching goal here is going to be for std to be as minimal as possible, except where such minimalism would harm the community or introduce dependency issues. Everyone should be happy with the zig package manager before we do this audit.
As a language user I prefer a usable and clean standard library to a featureful and spaghettified one. I find it unlikely that things introduced will later be removed once people depend on them as the language grows.
Consider the enormous size of the Java standard library for instance.
I find it unlikely that things introduced will later be removed once people depend on them as the language grows.
If it's difficult to update imports when an std package is moved out, that's a language/tooling problem.
EDIT: And if imports worked a little more like IPFS (for example), nobody would even notice (besides maybe network fetching with version upgrades).
First of all, you do not have control over imported packages. I cannot force someone to update their library to stop using _someone else's_ library which uses deprecated stdlib features. Difficulty with deprecation is a very real problem.
As to API stability, let's not forget the impact of Rust's API stability issues on its community.
I'm not clear on your IPFS comparison. Are you suggesting that deprecated stdlib imports should alias to remotely resolved packages at build time?
Libraries shipped with the compiler are expected to be usable. Random code somewhere on the internet is seen as crap to be avoided.
What I'd like as the ideal solution:
With this the programmer won't be overloaded and would have a chance to customize stdlib code. Proposed band-aid features like struct extensions (#1170) won't be needed.
you do not have control over imported packages.
Neither can you control which Zig versions or platforms libraries are written for, but they're not always crap, as PavelVozenilek put it. The problems of std API changes and of library authors changing things are similar.
I'm not clear on your IPFS comparison. Are you suggesting that deprecated stdlib imports should alias to remotely resolved packages at build time?
I'd imagine package management can be done separately from building.