Emscripten: Module.print lacking newline characters

Created on 28 May 2016 · 17 Comments · Source: emscripten-core/emscripten

It seems that Module.print is passed a string without newlines (and potentially some other special characters) that were present in the program's output, instead being called once for each "line". This means that it's not possible to differentiate between the case of a string with a trailing newline being printed, and the case of a string without a trailing newline being flushed to the buffer. This is problematic for use cases in which the exact output of the program matters (as is the case in the project I'm working on). As a "test case" of sorts to demonstrate this, a single printf in main without a trailing newline in the format string will print with a trailing newline in emscripten's output.

Is there a more refined version of Module.print which gets such raw information, essentially capturing the exact stdout stream of the program somewhat similarly to Module.stdin?

wontfix

joesavage

Most helpful comment

Another issue is that some programs print to stdout using \r to display progress-style information. The stdout is updated but no newline is generated, so it is impossible to capture and interpret the progress.

videoconverter.js is one such project: ffmpeg prints progress information using the method above, but because output is held until a newline arrives before Module.print is called, the progress information only becomes available after completion, which makes it useless.

@kripken - the proposal with rawPrint still sounds good, but to be useful it should be called on any change, not only on newlines.
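
To illustrate the \r progress case described above, here is a minimal sketch of how an application might consume raw, unbuffered output if such a callback existed. The names makeProgressTracker and onRawOutput are hypothetical and are not part of Emscripten; the sketch only assumes that chunks arrive with \r and \n intact.

```javascript
// Hypothetical consumer for a raw output callback (not an Emscripten API).
// A carriage return means "the next text overwrites the current line".
function makeProgressTracker() {
  let current = "";          // text of the line currently being drawn
  let pendingReturn = false; // true after '\r' until the next character
  const finished = [];       // lines terminated by '\n'
  return {
    onRawOutput(chunk) {
      for (const ch of chunk) {
        if (ch === "\r") {
          pendingReturn = true;
        } else if (ch === "\n") {
          finished.push(current);
          current = "";
          pendingReturn = false;
        } else {
          if (pendingReturn) {
            current = "";    // start overwriting the line
            pendingReturn = false;
          }
          current += ch;
        }
      }
    },
    currentLine: () => current,
    finishedLines: () => finished.slice(),
  };
}

const tracker = makeProgressTracker();
tracker.onRawOutput("frame=10\r");
tracker.onRawOutput("frame=20");
console.log(tracker.currentLine()); // frame=20
```

With a line-buffered print, neither update would be visible until a newline finally arrived.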

iongion on 11 Aug 2016

👍2

All 17 comments

Oops -- it seems that, in fact, one cannot flush to stdout without a newline (as outlined in #2770), thus the situation in the comment above cannot actually occur in practice. Is there any way to get around this? Differentiating between the two cases outlined in my previous comment is actually very important to my use case, and as I use a custom print implementation my output doesn't have to be line-based like it might for a simple console.log.

EDIT: It seems that this is a far more trivial change than I imagined and that changing the put_char function in library_tty.js of both default_tty_ops and default_tty1_ops to something of the following general form works great:

if (val) tty.output.push(val);
Module['print'](UTF8ArrayToString(tty.output, 0));
tty.output = [];

While I was expecting the above to actually print too often (flushing the buffer on every character), it seems that due to the way input is handled internally it works exactly as I would like (flushing on fflush or a newline character). For the project as a whole, perhaps it's worth providing this as an option for those of us who are using non-line-based output mechanisms? If you think that would be valuable I would be more than happy to create a pull request.

joesavage on 28 May 2016

The underlying issue is that console.log and other JS printing mechanisms all behave like puts: they print a full line with a newline at the end. So we buffer until we see a newline, then send the line (without the newline) to be printed. Not sure what else we can do here?
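
As a rough model of the buffering described here (not Emscripten's actual source), the sketch below shows why the two cases from the original report look identical to print. makeLineBufferedSink and capture are hypothetical helper names introduced only for this illustration.

```javascript
// Hypothetical stand-in for a line-buffered output path.
// The names and shape are illustrative, not Emscripten's API.
function makeLineBufferedSink(print) {
  let buffer = [];
  return {
    // Called once per output byte.
    putChar(code) {
      if (code === 10) {
        // '\n': emit the buffered line, dropping the newline itself
        print(String.fromCharCode(...buffer));
        buffer = [];
      } else if (code !== 0) {
        buffer.push(code);
      }
    },
    // Called on an explicit flush: emit whatever is pending.
    flush() {
      if (buffer.length > 0) {
        print(String.fromCharCode(...buffer));
        buffer = [];
      }
    },
  };
}

// Feed a string through the sink and collect what print() receives.
function capture(text, flushAtEnd) {
  const received = [];
  const sink = makeLineBufferedSink((s) => received.push(s));
  for (const ch of text) sink.putChar(ch.charCodeAt(0));
  if (flushAtEnd) sink.flush();
  return received;
}

console.log(capture("hello\n", false)); // print saw ["hello"]
console.log(capture("hello", true));    // print saw ["hello"] as well
```

Both runs hand print the same string, so a custom print implementation cannot tell a trailing newline from a flush.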

kripken on 31 May 2016

Yeah, I understand why this would be necessary for the default output mechanism for sure. Some users (such as myself) who specify custom print implementations that are not constrained in this way, however, may find it useful to be able to get access to the real raw data being output.

joesavage on 1 Jun 2016

👍1

Do you mean the raw, unbuffered stream of bytes? I'm ok with adding an option for that. Maybe module.rawPrint and module.rawPrintErr?

kripken on 1 Jun 2016

I was actually thinking more about a buffered stream that just included newline characters, but a raw, unbuffered stream would do mostly the same job (and is something I could see being useful for other purposes too).

The problem with a fully unbuffered output is that for users (such as myself) trying to create something which behaves almost identically to native output channels, the impact of functions like fflush won't be detectable (hence the changes I've made on my local copy to fix this simply result in a buffered channel that gets flushed on newlines or fflush).

joesavage on 1 Jun 2016

How would a buffered stream with newlines work?

kripken on 1 Jun 2016

Newline characters are simply included in (rather than excluded from) the tty.output array (a trivial change), but print only gets called on encountering a newline or fflush call (like usual, and unlike an unbuffered stream). This actually seems to be the functionality I have achieved by making the changes detailed here (much to my surprise, as outlined in the comment, as the immediate logic of the code doesn't read that way to me).
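
A minimal sketch of the buffered-with-newlines idea, assuming a per-byte putChar entry point and a separate flush entry point. makeNewlinePreservingSink is a hypothetical name, and this is not the code from library_tty.js.

```javascript
// Hypothetical sink: newlines stay in the buffer, and print is called only
// when a newline arrives or flush() is invoked explicitly.
function makeNewlinePreservingSink(print) {
  let buffer = [];
  function emit() {
    if (buffer.length === 0) return;
    print(String.fromCharCode(...buffer));
    buffer = [];
  }
  return {
    putChar(code) {
      if (code === 0) return;    // ignore NUL, as the snippets in this thread do
      buffer.push(code);         // keep the newline in the output
      if (code === 10) emit();   // flush on '\n'
    },
    flush: emit,                 // flush on an explicit request
  };
}

const chunks = [];
const sink = makeNewlinePreservingSink((s) => chunks.push(s));
for (const ch of "a\nb") sink.putChar(ch.charCodeAt(0));
sink.flush();
console.log(chunks); // chunks holds "a\n" followed by "b"
```

Here the receiver can distinguish a completed line ("a\n") from flushed partial output ("b").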

joesavage on 1 Jun 2016

if (val) tty.output.push(val);
Module['print'](UTF8ArrayToString(tty.output, 0));
tty.output = [];

Hmm, if val then it pushes, then it prints, and then the array is cleared. So the array is at most of length 1? I.e. this will call print with an empty array, or with an array of size 1?

kripken on 1 Jun 2016

It seems that way, but that's not how the code behaves on my machine (though, as an aside, that whole block should probably be wrapped in the if, oops). Instead, print is called as I described, only after a newline or fflush. I have no idea why this is, but it seems to work for me.

joesavage on 1 Jun 2016

Very strange. Maybe some other change you made somewhere else? If that code is all there is, it seems the array must be size 0 or 1...

kripken on 2 Jun 2016

It looks like those are the only changes I've made; perhaps this is the result of some odd characteristic of my specific codebase though (I haven't tried any simpler test cases). Regardless, I imagine such behaviour should be implementable using more sane logic.

EDIT: FWIW, I'm finding that with -s ASYNCIFY=1 the behaviour I was observing holds somewhat for simple test cases, and isn't equivalent to a simple unbuffered stream. Particularly, the stream is flushed on newlines, though in my test cases it doesn't seem to be affected by fflush. From this I'm guessing that the behaviour I was observing is due to the specifics of the way this async mechanism works. As I said though, the details of my crummy implementation shouldn't matter too much anyway.

joesavage on 2 Jun 2016

@kripken - the proposal with rawPrint still sounds good, but to be useful it should be called on any change, not only on newlines.

iongion on 11 Aug 2016

👍2

Coming back to this issue, it seems like rawPrint is definitely a good solution, allowing the application to do buffering however it wishes. Then, to accompany this — and allow applications to buffer text just like native programs — it would be great to let the user define a custom fflush hook on the Module such that this can influence (e.g. flush, in the common case) their custom buffer.
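
A sketch of how the two proposed hooks could fit together on the application side. Neither rawPrint nor a flush hook exists on Module; rawPrint is the name proposed in this thread, and onFlush, makeAppBuffer and FakeModule are hypothetical names used purely for illustration.

```javascript
// Hypothetical application-side buffer driven by two proposed hooks:
// rawPrint (every chunk of output) and onFlush (an fflush notification).
function makeAppBuffer(write) {
  let pending = "";
  return {
    rawPrint(chunk) {
      pending += chunk;          // the application decides how to buffer
    },
    onFlush() {
      if (pending) {
        write(pending);          // hand the exact bytes to the display
        pending = "";
      }
    },
  };
}

const shown = [];
const appBuffer = makeAppBuffer((s) => shown.push(s));

// Stand-in for the real Module object, which has no such properties.
const FakeModule = { rawPrint: appBuffer.rawPrint, onFlush: appBuffer.onFlush };

FakeModule.rawPrint("50%\r");
FakeModule.rawPrint("100%");
FakeModule.onFlush();
console.log(shown); // one entry: "50%\r100%"
```

The point of the split is that the runtime would only report events, while flushing policy stays entirely in application code.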

It seems like the only slightly difficult part here might be actually getting the flush functionality to work properly with the emterpreter / “asyncify” functionality. At least, this is the part I've been having trouble with in my basic efforts (hence my really nutty temporary workaround that doesn't always work properly).

joesavage on 5 Nov 2016

I think rawPrint would probably also be helpful for this issue https://github.com/kripken/emscripten/issues/4437

Or maybe exposing putChar on Module?

rajsite on 2 Dec 2016

👍1

Hi, just encountered this myself ... the reason why @joesavage's code does not get called for every character is simply because line buffering is handled further up in libc. So TTY->write is only ever called when a linefeed is output, or when fflush is called (assuming you're using buffered stdout functions of course).

FWIW I'm working on getting synchronous stdin/stdout working using pthreads and xterm.js. It's almost working, but the code's a bit rough around the edges still. The part pertinent to this issue is to put Joe's code at the top of put_char like this ...

put_char: function(tty, val) {
  if (typeof Module['put_char'] != 'undefined') {
    if (val != 0) tty.output.push(val);
    Module['put_char'](UTF8ArrayToString(tty.output, 0));
    tty.output = [];
    return;
  }
  [...]

Where Module.put_char is then simply calling xterm.js's write function.

eclecticdave on 11 Dec 2017

Ok, so what I said above wasn't entirely correct ...

While TTY->write is only called when the buffer contains a whole line, it does then enter a loop and call put_char for each character to be printed. The current implementation then proceeds to reassemble the line before calling Module.print ... which seems a little inefficient. Maybe a function could be added to stream_ops which takes a whole line at a time?
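
A simplified model of the call pattern described above, under the assumption that libc hands the TTY layer a whole buffer at once and the TTY layer then forwards it byte by byte. ttyWrite and makeImmediateTty are hypothetical names and do not mirror Emscripten's real function signatures.

```javascript
// Simplified model: the write step receives a whole buffer and calls
// putChar once per byte, as described in the comment above.
function ttyWrite(tty, bytes) {
  for (const b of bytes) tty.putChar(b);
}

// A putChar that forwards every byte immediately, similar in spirit to the
// modified put_char discussed earlier in the thread.
function makeImmediateTty(print) {
  return {
    putChar(code) {
      if (code !== 0) print(String.fromCharCode(code));
    },
  };
}

const calls = [];
const tty = makeImmediateTty((s) => calls.push(s));

// Nothing reaches putChar until the (modelled) libc buffer is handed over,
// which is why output appeared only on a newline or fflush.
ttyWrite(tty, Array.from("hi\n", (c) => c.charCodeAt(0)));
console.log(calls.length); // 3
```

So print is still invoked once per character, just in a burst each time libc flushes its own buffer.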

eclecticdave on 18 Dec 2017

This issue has been automatically marked as stale because there has been no activity in the past year. It will be closed automatically if no further activity occurs in the next 7 days. Feel free to re-open at any time if this issue is still relevant.