Is this a technical limitation of streams themselves, or has it simply not been implemented yet?
Read the docs: https://github.com/sindresorhus/got#retry
Note: When using streams, this option is ignored. If the connection is reset when downloading, you need to catch the error and clear the file you were writing into to prevent duplicated content.
@szmarczak This question came up when I was reading that very paragraph in the docs. Is this a technical limitation of streams themselves, or has it simply not been implemented yet?
I'll repeat this yet again:
If the connection is reset when downloading, you need to catch the error and clear the file you were writing into to prevent duplicated content.
Please read the Node.js docs about streams. Here's the simplest example I can think of: imagine you're downloading "abc" and processing each letter takes one second. In a flawless scenario, this is what happens:
a
// 1 second passes
b
// 1 second passes
c
Now consider this scenario:
a
// connection drops, a new one is created
a
// 1 second passes
b
// 1 second passes
c
This results in "aabc", which is incorrect. You cannot reuse the same stream, because you would get duplicated data as described above. Instead, you reset the file you were piping to and create a new Got stream.
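A minimal sketch of that reset-and-retry approach, assuming a simulated download instead of a real Got stream. makeFlakySource and download are hypothetical helpers, and a string stands in for the output file. Each attempt restarts from the first chunk, as a new HTTP request would, so a first attempt that fails after "a" followed by a successful one produces "aabc" when nothing is reset and "abc" when the output is cleared on error.

```typescript
import { Readable } from "node:stream";

// Hypothetical stand-in for a download: every call is a fresh "request" that
// starts from the first chunk. It fails with a fake reset after `failAfter`
// chunks, or never fails when `failAfter` is null.
function makeFlakySource(failAfter: number | null): Readable {
  const chunks = ["a", "b", "c"];
  let index = 0;
  return new Readable({
    objectMode: true,
    read() {
      if (index === failAfter) {
        this.destroy(new Error("ECONNRESET (simulated)"));
        return;
      }
      this.push(index < chunks.length ? chunks[index++] : null);
    },
  });
}

// Runs one attempt per entry in `plan`. `output` plays the role of the file
// being written; `resetOnError` decides whether it is cleared before retrying.
async function download(plan: Array<number | null>, resetOnError: boolean): Promise<string> {
  let output = "";
  for (const failAfter of plan) {
    try {
      for await (const chunk of makeFlakySource(failAfter)) {
        output += chunk;
      }
      return output; // the stream ended normally
    } catch {
      // A new stream is created on the next loop iteration; the old one is never reused.
      if (resetOnError) output = ""; // discard partial data from the failed attempt
    }
  }
  throw new Error("all attempts failed");
}
```

With the plan [1, null], download(plan, false) resolves to "aabc" and download(plan, true) resolves to "abc".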
@szmarczak Thanks! I don't see why reusing the same stream couldn't be a valid approach, though. Imagine this scenario, in which you're writing everything that comes from a stream to memory (say, an in-memory array):
a
// connection drops, user receives an "error" message and then resets whatever they were outputting to
// a new connection is created
a
// 1 second passes
b
// 1 second passes
c
In this case, once the user receives an "error" event from the stream, they can clear the in-memory array, and the final array will still correctly contain a, b, c.
I'm asking because in my case I ended up implementing my own retry mechanism with exponential backoff that wraps the creation of the Got stream, but it would be nice to use Got's built-in retry options instead. Do you think it's possible (and technically advisable) for Got to behave this way, i.e. after an "error" event is emitted, it retries and keeps sending data?
Emitting an error without destroying the stream is incorrect behavior. That said, we could (though we don't at the moment) check whether the user attached a reset event handler, and in that case emit a new stream. Reusing the stream is not a good idea because we would need to reset downloadProgress and uploadProgress, which would be very confusing; people could mistake it for a bug.
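For context, a small sketch of how Node.js streams treat errors, assuming a plain Readable rather than a Got stream (observeDestroyWithError is a hypothetical helper). After destroy(err), the stream emits "error" and then "close", never "end", and is marked as destroyed. A consumer that has seen "error" has no contract saying more data will follow, which is why a retry would need a new stream rather than the old one.

```typescript
import { Readable } from "node:stream";

interface Observation {
  events: string[];
  destroyed: boolean;
}

// Destroys a stream with an error and records which lifecycle events fire.
function observeDestroyWithError(): Promise<Observation> {
  return new Promise((resolve) => {
    const events: string[] = [];
    const stream = new Readable({ read() {} }); // produces no data on its own
    stream.on("end", () => events.push("end")); // expected never to fire
    stream.on("error", (error) => events.push(`error: ${error.message}`));
    stream.on("close", () => {
      events.push("close");
      resolve({ events, destroyed: stream.destroyed });
    });
    stream.destroy(new Error("ECONNRESET (simulated)"));
  });
}
```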
Not to mention that you can pipe into the Got stream. In that case retrying should be disabled too. I'm not sure what the current behavior is, though; I'll check.
It's hard for me to see the bigger picture. What's your use case here?
Yeah, that explains why we shouldn't reuse the stream.
The bigger picture is that I have a piece of code that creates a Got stream (https://github.com/cisagov/crossfeed/blob/retries/backend/src/tasks/censysIpv4.ts#L36), but there's no easy way to use Got's built-in retry mechanism when the connection ends in the middle of the stream.
Instead, I had to implement exponential backoff with a separate library, p-retry (see https://github.com/cisagov/crossfeed/blob/retries/backend/src/tasks/censysIpv4.ts#L153-L166).
It would be ideal if Got had a feature that didn't require pulling in p-retry just for this. Perhaps the reset solution you suggested would work. Or do you have any other thoughts?
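For reference, a rough sketch of the kind of wrapper described above, written without p-retry. This is a hand-rolled exponential backoff loop, not p-retry's implementation, and retryWithBackoff and its option names are illustrative. In the Got case, the task would (hypothetically) clear the output, create a fresh stream, and resolve when that stream ends.

```typescript
// A hand-rolled stand-in for a p-retry style wrapper (not p-retry's code).
interface BackoffOptions {
  retries: number; // extra attempts allowed after the first one
  minTimeoutMs: number; // delay before the first retry
  factor: number; // the delay is multiplied by this after every failure
}

const sleep = (ms: number): Promise<void> => new Promise((resolve) => setTimeout(resolve, ms));

async function retryWithBackoff<T>(
  task: (attempt: number) => Promise<T>,
  { retries, minTimeoutMs, factor }: BackoffOptions,
): Promise<T> {
  let delayMs = minTimeoutMs;
  for (let attempt = 1; ; attempt++) {
    try {
      return await task(attempt);
    } catch (error) {
      if (attempt > retries) throw error; // out of retries: surface the last error
      await sleep(delayMs);
      delayMs *= factor; // e.g. 1x, 2x, 4x, ... of minTimeoutMs for factor = 2
    }
  }
}
```

With retries: 2 and factor: 2, the task runs at most three times, waiting minTimeoutMs and then 2 × minTimeoutMs between attempts. The sketch leaves out refinements such as capping the delay or adding jitter.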
Yeah, but why did you decide to go with streams rather than the promise API?
Mainly because I'm downloading large files (hundreds of megabytes) and need to keep only a small fraction of the lines in each file. Using streams to extract the lines I need seemed more memory-efficient than loading the entire response into memory.
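A minimal sketch of that line-filtering approach, assuming Node's built-in readline module. It splits a stream into lines as chunks arrive, so apart from the kept lines, memory use is roughly the stream's buffer plus the current partial line rather than the whole body. Since a Got stream is readable, it could in principle be passed as input; here Readable.from stands in for it, and filterLines and the sample lines are made up for illustration.

```typescript
import { createInterface } from "node:readline";
import { Readable } from "node:stream";

// Reads `input` line by line and keeps only lines for which `keep` returns true.
// Lines split across chunk boundaries are reassembled by readline.
async function filterLines(input: Readable, keep: (line: string) => boolean): Promise<string[]> {
  const kept: string[] = [];
  const lines = createInterface({ input, crlfDelay: Infinity });
  for await (const line of lines) {
    if (keep(line)) kept.push(line);
  }
  return kept;
}
```

Error handling and retries are left out here; they would wrap the creation of the input stream as discussed earlier in the thread.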
Do you think promises would be better in this case, though?
Definitely not. IMO the proper solution would be to emit a retry event with another Got stream. If you have any other ideas, feel free to tell me :)
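A rough sketch of what such a "retry" event could look like, assuming simulated downloads. RetryingDownload, its "retry" and "failed" events, fakeDownload, and collect are all hypothetical and not part of Got's API. On failure, it hands the listener a brand-new stream instead of reusing the old one, and only does so when a "retry" listener is attached; the consumer discards the failed attempt's partial output before reading the new stream.

```typescript
import { EventEmitter } from "node:events";
import { Readable } from "node:stream";

// Simulated request: restarts from the first chunk on every call and fails
// after `failAfter` chunks (never, if null).
function fakeDownload(failAfter: number | null): Readable {
  const chunks = ["a", "b", "c"];
  let index = 0;
  return new Readable({
    objectMode: true,
    read() {
      if (index === failAfter) {
        this.destroy(new Error("ECONNRESET (simulated)"));
        return;
      }
      this.push(index < chunks.length ? chunks[index++] : null);
    },
  });
}

// HYPOTHETICAL API, not part of Got: when an attempt fails, emit "retry" with a
// brand-new stream, but only if someone is listening for "retry".
class RetryingDownload extends EventEmitter {
  private readonly create: (attempt: number) => Readable;
  private readonly limit: number;

  constructor(create: (attempt: number) => Readable, limit: number) {
    super();
    this.create = create;
    this.limit = limit;
  }

  start(attempt = 0): Readable {
    const stream = this.create(attempt);
    stream.once("error", (error: Error) => {
      if (attempt < this.limit && this.listenerCount("retry") > 0) {
        this.emit("retry", this.start(attempt + 1), attempt + 1);
      } else {
        this.emit("failed", error); // no retry handler, or out of attempts
      }
    });
    return stream;
  }
}

// Consumer: on "retry", drop the failed attempt's partial output and read the new stream.
function collect(download: RetryingDownload): Promise<string> {
  return new Promise((resolve, reject) => {
    let output = "";
    const consume = (stream: Readable): void => {
      stream.on("data", (chunk: string) => {
        output += chunk;
      });
      stream.on("end", () => resolve(output));
    };
    download.on("retry", (fresh: Readable) => {
      output = "";
      consume(fresh);
    });
    download.on("failed", reject);
    consume(download.start());
  });
}
```

Because each retry produces a separate stream, per-stream state such as progress starts fresh on the new object instead of being silently reset on the old one.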