Fake motivation:
Associative array data structures like some skip lists, B-trees, and radix/compact prefix trees require little to no randomness to achieve similar speed with lower memory overhead. Ignoring compression, lower memory use means:
And there are the problems/bikesheds of hashing/collision strategies and entropy-source quality that I'd prefer not to leave to a standard library.
Real motivation:
I recently learned about a better crit-bit tree called a qp trie. Some people have measured speedups over hash maps (more for writes) and much better memory usage per key. There are a few Rust ports.
Caveats:
I think we should definitely add some more data structures in general, as right now we only have the following:
Some extra things that may be useful to include.
Given that the stdlib will probably be heavily revised before 1.0 we can always experiment and figure out what works now with no huge worry. It would be important to clearly document the tradeoffs for each data structure to the user if there is potential overlap.
Aside: I have read that argument on SipHash. It seemed fairly flimsy, though: if an attacker has sufficient access to read the secret key out of process memory, I think you have bigger issues to worry about.
Just want to note that Boost has shown that circular buffers (fixed-size ring buffers) are amazingly powerful for queues and stacks, providing almost vector/ArrayList-like speed rather than the often 10x-slower speed of the linked deques they are commonly implemented with.
Practical restriction would be to allow only fully documented data structures.
Simple types like 2D and 3D point should be added, so people are not tempted to create their own.
Simple types like 2D and 3D point should be added, so people are not tempted to create their own.
I disagree. Aren't they mostly for graphics, windows, games, and scientific computing? And people should be tempted to make their own in this case - a Point is like the simplest struct ever and has no clear associated functions to make a real module out of.
@tiehuis
that argument on SipHash. It seemed fairly flimsy
I don't personally care about SipHash (except for its speed when good mixing isn't needed), but I do care that a server using a probabilistic data structure like a hash map has to deal with (adversarial) worst-case bucket balancing, blocking/crappy OS RNGs, etc.
Constant time operations are a whole 'nother matter, though.
Speaking of crypto, what about HTTP/2, encoding/decoding, regex, and command-line parsing? Certain data structures may be necessary just to implement them, but on the other hand, nontrivial algorithms exposed through the standard library increase the maintenance burden of Zig's std, if only through something as simple as discussing API tradeoffs in issue trackers.
@0joshuaolson1
Simple types available in the standard library may result in compatible libraries. In C, every graphics library (like Cairo) has its own point and rectangle. A similar mess happened with the boolean type in C, and (until stdint.h) with sized integers.
I hope Zig's comptime can overcome C/C++/Rust's problem where people need to reimplement parts of the standard library with macros/templates.
A little list of what I think should be in Zig in terms of data structures:
Just felt I should add my two cents
@BraedonWooding
BigInt (I guess)
Here is a detailed tutorial (consisting of 10 parts) on how to implement multiple-precision arithmetic, usable even for cryptographic applications.
Here is a quick vector thing I drafted up when I had a few hours:
https://github.com/BraedonWooding/zig/blob/Vec/std/math/vector.zig
To me, 300 lines still feels a little small to completely warrant a standard library entry; however, I think the whole "keep it consistent" business is important :).
"What data structures should belong in std?"
IMHO none. Why? Because the idea behind Zig is to be a more ergonomic/safe C, or really a more ergonomic/safe/portable assembly. The data structures should belong in a library.
I love data structures: kd-trees, R-trees, and friends, among others. Why not create a project outside Zig that has everything one could dream of, in various modules: geometric data structures, probabilistic data structures, string-processing data structures?
@fithisux: projects outside would be suspected of being abandoned or obsolete. Being part of the distribution gives some assurance that the bloody thing may actually work.
Red-black trees were just merged into master.
Yep. To further clarify:
Nearly anything that is generally useful, and has minimal or no OS dependencies is welcome into the stdlib pre-1.0.0. This helps act as a test bed for the language, and helps give us real APIs to get working with each other as we work on the language and standard library.
Then, when we are nearing 1.0.0 and the language is stable, we'll have a comprehensive std lib audit. Everything gets fully documented. Many APIs are removed from std. Volunteers are welcome to take ownership of the removed code in separate repositories. APIs that remain are made to be consistent with each other in naming, style, and patterns. The overarching goal here is going to be for std to be as minimal as possible, except where such minimalism would harm the community or introduce dependency issues. Everyone should be happy with the zig package manager before we do this audit.
As a language user I prefer a usable and clean standard library to a featureful and spaghettified one. I find it unlikely that things introduced will later be removed once people depend on them as the language grows.
Consider the enormous size of the Java standard library for instance.
I find it unlikely that things introduced will later be removed once people depend on them as the language grows.
If it's difficult to update imports when an std package is moved out, that's a language/tooling problem.
EDIT: And if imports worked a little more like IPFS (for example), nobody would even notice (besides maybe network fetching with version upgrades).
First of all, you do not have control over imported packages. I cannot force someone to update their library to stop using _someone else's_ library which uses deprecated stdlib features. Difficulty with deprecation is a very real problem.
As to API stability, let's not forget the impact of Rust's API stability issues on its community.
I'm not clear on your IPFS comparison. Are you suggesting that deprecated stdlib imports should alias to remotely resolved packages at build time?
Libraries shipped with the compiler are expected to be usable. Random code somewhere on the internet is seen as crap to be avoided.
What I'd like as the ideal solution:
With this the programmer won't be overloaded and would have a chance to customize stdlib code. Proposed band-aid features like struct extensions (#1170) won't be needed.
you do not have control over imported packages.
Neither can you control which Zig versions or platforms libraries are written for, but they're not always crap, as PavelVozenilek put it. The problems of std API changes and of library authors changing things are similar.
I'm not clear on your IPFS comparison. Are you suggesting that deprecated stdlib imports should alias to remotely resolved packages at build time?
I'd imagine package management can be done separately from building.