Zig: Wasm u8 char

Created on 25 Dec 2019 · 6 Comments · Source: ziglang/zig

I have been searching for months and have asked on Discord. What I am looking to do is pretty simple.

I have compiled a simple Wasm binary that exports a function, stream. Ideally it would take a JavaScript Uint8Array; failing that, the bytes are looped over in JavaScript and passed one at a time to my Zig function. How can I convert the UTF-8 bytes to chars? Is there a Zig std lib function that builds a string from UTF-8 u8 values?

I am writing a parser and all I want for Christmas is an answer to this.

Edit: for clarification:

In JavaScript, if I use TextEncoder to encode the char codes of a string into UTF-8, how do I get this into Zig as a decoded string?

https://developer.mozilla.org/en-US/docs/Web/API/TextEncoder

question

adam-cyclones

All 6 comments

> How can I convert the Utf8 bytes to chars?

utf8 bytes already are chars? Your question is not clear.

daurnimator on 26 Dec 2019

@daurnimator I have updated the question for clarification. Thank you for such a fast reply.

adam-cyclones on 26 Dec 2019

Why do you need to decode the bytes?

If that's really what you need, have a look at std.unicode.

andrewrk on 26 Dec 2019

👍1

Hello @andrewrk (big fan of Zig! 100% the easiest language to get working with WASM). I cannot pass strings into WASM modules at this time, as only numeric types (integers and floats) can be passed into exported functions.

https://github.com/WebAssembly/interface-types/blob/master/proposals/interface-types/Explainer.md

When this proposal lands, it may be possible to pass higher-level types into a module, similar to how wasm-bindgen for Rust and embind for Emscripten C++ work, except that this layer will wrap a WASM module instead of being packed as JS glue code.

As for why do I need to decode the bytes at all, interesting question.
Simply, I am not confident enough to write a Lexer Parser that deals with just bytes although I sort of see how you could map bytes to tokens. I would have thought it would be simpler for a novice like me to just deal with chars.

Anyway, thank you for the hint (I found that the docs are still lacking from a beginner's standpoint).

adam-cyclones on 26 Dec 2019
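Since only numbers cross the boundary, a common workaround is to copy the UTF-8 bytes into the module's linear memory and pass two integers: an offset and a length. The following is a minimal TypeScript sketch of the JavaScript side only. It assumes a standalone WebAssembly.Memory as a stand-in for the memory a real module would export; the offset 1024, the helper names writeUtf8 and readUtf8, and the export name `parse` mentioned in a comment are all hypothetical.

```typescript
// Copy the UTF-8 encoding of `text` into linear memory at `ptr`.
// Returns the number of bytes written.
function writeUtf8(memory: WebAssembly.Memory, ptr: number, text: string): number {
  const bytes = new TextEncoder().encode(text);
  new Uint8Array(memory.buffer, ptr, bytes.length).set(bytes);
  return bytes.length;
}

// Read `len` bytes at `ptr` back as a string (only needed on the JS side).
function readUtf8(memory: WebAssembly.Memory, ptr: number, len: number): string {
  return new TextDecoder().decode(new Uint8Array(memory.buffer, ptr, len));
}

// Stand-in for the module's exported memory; one page is 64 KiB.
const wasmMemory = new WebAssembly.Memory({ initial: 1 });

// Hypothetical offset; a real module decides where input may be written,
// for example through an exported allocator.
const inputPtr = 1024;
const inputLen = writeUtf8(wasmMemory, inputPtr, "let π = 3");

// Both values are plain integers, so they fit an exported function's
// signature, e.g. a hypothetical `instance.exports.parse(inputPtr, inputLen)`.
```

On the Zig side, the two integers would be treated as a pointer and a length, giving a byte slice over the same memory, so no string type has to cross the boundary.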

> Simply, I am not confident enough to write a Lexer Parser that deals with just bytes although I sort of see how you could map bytes to tokens. I would have thought it would be simpler for a novice like me to just deal with chars.

Chars are bytes... do you mean Unicode code points?
And generally no: working at the code point level is not easier. You often want to "read until delimiter". See also std.mem.tokenize.

daurnimator on 26 Dec 2019

👍2
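The difference between bytes and code points raised above can be made concrete. The sketch below, in TypeScript rather than Zig, shows roughly what a UTF-8 decoder does: it groups one to four bytes into a single code point. The function name decodeCodePoints is hypothetical, this is not the std.unicode API, and the code assumes well-formed input, whereas a real decoder must also reject invalid sequences.

```typescript
// Decode UTF-8 bytes into Unicode code points.
// Assumes well-formed input; no validation is performed.
function decodeCodePoints(bytes: Uint8Array): number[] {
  const out: number[] = [];
  let i = 0;
  while (i < bytes.length) {
    const lead = bytes[i];
    let extra = 0;   // number of continuation bytes that follow the lead byte
    let cp = lead;   // single-byte (ASCII) case: the byte is the code point
    if (lead >= 0xf0) { extra = 3; cp = lead & 0x07; }
    else if (lead >= 0xe0) { extra = 2; cp = lead & 0x0f; }
    else if (lead >= 0xc0) { extra = 1; cp = lead & 0x1f; }
    for (let k = 1; k <= extra; k++) {
      // Each continuation byte contributes its low 6 bits.
      cp = (cp << 6) | (bytes[i + k] & 0x3f);
    }
    out.push(cp);
    i += extra + 1;
  }
  return out;
}

// "aé€" encoded as UTF-8: 1 + 2 + 3 = 6 bytes, but only 3 code points.
const sampleBytes = new Uint8Array([0x61, 0xc3, 0xa9, 0xe2, 0x82, 0xac]);
const sampleCodePoints = decodeCodePoints(sampleBytes); // [0x61, 0xe9, 0x20ac]
```

Six bytes become three code points, which is why indexing by "character" needs a decoding pass while indexing by byte does not.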

Hi @adam-cyclones, welcome to the community.

> Simply, I am not confident enough to write a Lexer Parser that deals with just bytes although I sort of see how you could map bytes to tokens. I would have thought it would be simpler for a novice like me to just deal with chars.

It may sound counter-intuitive, but my advice to a novice would be to have your zig code accept UTF-8 encoded data, and never decode it. UTF-8 is a brilliantly designed data format; you can do most operations you need to on it without decoding. For example you can look for the byte ' ' (space) to separate tokens, and that will work, and be correct with respect to Unicode.

I'm going to close this issue since there's nothing to do here to solve it, but please feel free to continue the discussion in one of the community gathering places.

andrewrk on 31 Dec 2019
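The advice above, splitting on the space byte without decoding, works because every byte of a multi-byte UTF-8 sequence is 0x80 or higher, so the byte 0x20 can only ever be an actual space. Here is a minimal sketch of that idea in TypeScript (a Zig version would have the same shape); the function name splitOnSpace and the sample text are made up for illustration.

```typescript
// Split UTF-8 bytes on the space byte (0x20) without decoding them.
// Returns views into the input, so no bytes are copied.
function splitOnSpace(bytes: Uint8Array): Uint8Array[] {
  const tokens: Uint8Array[] = [];
  let start = 0;
  for (let i = 0; i <= bytes.length; i++) {
    if (i === bytes.length || bytes[i] === 0x20) {
      if (i > start) tokens.push(bytes.subarray(start, i)); // skip empty runs
      start = i + 1;
    }
  }
  return tokens;
}

// The sample contains 2-byte (ï, é) and 4-byte (🙂) sequences and a double space.
const sourceBytes = new TextEncoder().encode("naïve  café 🙂");
const tokenViews = splitOnSpace(sourceBytes);

// Decoding happens here only to display the result; the split itself never decodes.
const decodedWords: string[] = [];
for (let i = 0; i < tokenViews.length; i++) {
  decodedWords.push(new TextDecoder().decode(tokenViews[i]));
}
```

Each token is still valid UTF-8 on its own, so a lexer can compare tokens byte for byte against keywords and only decode when it has to show text to a user.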
