Aspnetcore: "Writing is not allowed after writer was completed", with SignalR 1.1

Created on 15 Jan 2019 · 26Comments · Source: dotnet/aspnetcore

Describe the bug

I'm receiving "Writing is not allowed after writer was completed" on the latest release of SignalR with .NET Core 2.2. We previously used 1.0.0-rc1-final without any issues, and all of a sudden started receiving this issue with 1.1 on the client. We also receive a lot of "Task was cancelled" from the RedisHubLifetimeManager. Attaching screenshots below.

We ran SignalR 1.1 on the backend with 1.0.0-rc1-final on the client without any issues, but when we upgraded our client to 1.1, we started receiving this in our logs.

Another change we did while upgrading was to add "skipNegotiate: true" on the client, as well as forcing websockets. Not sure if this is related. I can't reproduce the error locally, since it doesn't happen often and seems related to only a few of our clients.

Also, the server seems to go down quite often after the upgrade. Can't seem to find any reason for this, but it was stable before we upped our clients.

To Reproduce

Can't reproduce the error locally, we just see it in our production environment logs.

Expected behavior

Expected it to be gone with 1.1.0, since I saw previous reports of the problem on RC versions.

Screenshots

Screenshot of writing is not allowed message:

Screenshot of task cancelled exception:

Additional context

dotnet --info output:

This is on our production environment. Locally we have the 2.2 runtime.

EDIT: If this issue is misplaced or inproper, feel free to close it. I didn't know where to turn.

accepted area-signalr bug

Source

viktorgullmark

👍2

Most helpful comment

This is a benign error as it looks like the client is already disconnected.

We'll look at making this either not try to write after the client is gone, or at a minimum not be an error log.

BrennanConroy on 13 Feb 2019

👍6

All 26 comments

The task canceled exception from Redis occurs when you try to add or remove a connection from a group where the connection either doesn't exist, or is on another server and the other server took too long to respond.

As for the "Writing not allowed" exception, how often are you seeing this? Do the server logs show that a client was disconnected around the same time the exception occurs?

BrennanConroy on 15 Jan 2019

Thanks for the input. We've resolved the Task canceled exception now thanks to your clarification.

We're seeing the "Writing is not allowed after writer was completed" a few times every hour, with around 100 users online. Can't find a disconnect at this timestamp in the logs though 🤔

viktorgullmark on 17 Jan 2019

Could you capture logs around the exception, at Debug level preferably.

BrennanConroy on 29 Jan 2019

Closing this as we haven't heard from you. Please feel free to comment if you're able to get the information we're looking for and we can reopen the issue if we're able to identify the problem.

bradygaster on 6 Feb 2019

Hi,
I'm getting the same error with this scenario:

.NET Core 2.2, SignalR 1.1, scaling using Redis (2.x) and hosting into Docker.

Unfortunately, I'm unable to reproduce the problem and the log isn't so helpful (Writing is not allowed after writer was completed)
Is there anything I can do to provide more information? (add more logging somewhere, try another build or whatever?).

I'm also using Groups so, probably is the same problem of @viktorgullmark but I don't have the issue related to the Task Canceled exception.

Let me know.

imperugo on 6 Feb 2019

We have the same errors, only in production inside Docker behind Nginx.
The messages appeared after upgrade to .NET Core 2.2.

It doesn't matter if we use SignalR 1.1 or 1.0 on the Client Side.

By the way, we don't use Groups or Redis.

codingsteff on 7 Feb 2019

@imperugo Enabling Debug level logs would be nice. I'd like to see the logs around the failure if possible, as they can give context on what is happening on the server.

@codingsteff Can you confirm that switching from SignalR 1.0.0 to 1.1.0 on the server causes the issue? And could you also collect logs please?

BrennanConroy on 7 Feb 2019

@BrennanConroy I'll switch the log only for "Microsoft.AspNetCore.SignalR.*" to debug.

Soon news

imperugo on 8 Feb 2019

I rolled back our server to .NET Core 2.1 and now we have no errors in the log anymore.
Currently all our clients using JavaScript SignalR 1.1.0, the server runs with .Net Core 2.1.1 and Microsoft.AspNetCore.SignalR 1.0.3.

Next week I can switch the production to Microsoft.AspNetCore.SignalR 1.1.0, but because of the dependency to Microsoft.AspNetCore.Connections.Abstractions I need to switch to .NET Core 2.2 as well.

I'll send you some log entries.

codingsteff on 8 Feb 2019

@BrennanConroy here my logs

screenshot 2019-02-08 at 14 05 09

imperugo on 8 Feb 2019

@BrennanConroy here are the logs from the same ConnectionId

And another one

Too many clients, I'm not sure if it's the right ConnectionId.

codingsteff on 13 Feb 2019

We have the same errors here in two applications.

One is running on .NET Core 2.2.1 (deployed as self-contained). The other is running on .NET 4.7.1.

They are built using the SDK 2.2.103 and runtime 2.2.1.

The Microsoft.AspNetCore.SignalR package version is 1.1.0.
The @aspnet/signalr package version is 1.0.4.

We enabled the SignalR log on our servers, then soon I'll be able to share more data about the connections and errors.

Microsoft.AspNetCore.SignalR.HubConnectionContext -> Failed writing message. Aborting connection.
System.InvalidOperationException: Writing is not allowed after writer was completed.
   at System.IO.Pipelines.ThrowHelper.ThrowInvalidOperationException_NoWritingAllowed()
   at System.IO.Pipelines.Pipe.GetMemory(Int32 sizeHint)
   at System.IO.Pipelines.Pipe.DefaultPipeWriter.GetSpan(Int32 sizeHint)
   at System.IO.Pipelines.PipeWriter.WriteAsync(ReadOnlyMemory`1 source, CancellationToken cancellationToken)
   at Microsoft.AspNetCore.SignalR.HubConnectionContext.<TryWritePingSlowAsync>d__49.MoveNext()

fabiano on 13 Feb 2019

This is a benign error as it looks like the client is already disconnected.

We'll look at making this either not try to write after the client is gone, or at a minimum not be an error log.

BrennanConroy on 13 Feb 2019

👍6

@BrennanConroy Alright, thanks for looking into it!

viktorgullmark on 13 Feb 2019

👍1

Is there any progress on this one?
Can anyone confirm that this type of error can be safely ignored by our log filter(as it constantly fire our alerting system based on Error log level) and does not indicate any issues in a running service?

pashchuk on 26 Mar 2019

Probably I can try to fix this and submit a PR if there is no one in the team already on this issue or you have focused on the more priority tasks.

pashchuk on 26 Mar 2019

This is also affecting us. We have roughly 30k connections on each server and we don't use any backplane between them. Might be a coincidence but i noticed an increase in the number of exceptions of this type when i updated the typescript clients from 1.0.0-rc-final to 1.1.4. (The servers are running dotnet 2.2)

Here is an example of 1s of logs from one of the servers:
ubuntu_ip-172-31-54-19___opt_insurads_log

Also, what would be the your recommendation for handling these exceptions?

rfrancisco on 12 Apr 2019

Might be a coincidence but i noticed an increase in the number of exceptions of this type when i updated the typescript clients from 1.0.0-rc-final to 1.1.4

I reverted the client script to the version compiled with 1.0.0-rc1-final and the exceptions disappeared after an hour or so. Moved back to the version compiled with 1.1.4 and the System.InvalidOperationException: Writing is not allowed after writer was completed. exceptions start to occur.

I did this 3 times and the results were always the same.

I also noticed a small drop in the number of connected users when i use the version compiled with 1.1.4. It's hard to know for sure because the number of connected users vary constantly but whenever i published the version that caused the exceptions i would see a drop in connected users after a couple of minutes. In the last test the connections wend down from 216K to 200K in 10m, then went back up when i published the version with 1.0.0-rc-final. But again, this could just be a coincidence.

rfrancisco on 13 Apr 2019

To confirm, is anyone seeing a functional problem here or is it just a spurious exception message in the logs? I just want to make sure to understand if this is actually causing a functional problem. It looks like it's the writing of the "ping" message that's failing, which is fine (though annoying to see in the logs).

We'll look at ensuring the pings are stopped before the pipeline is closed in preview 6.

anurse on 30 Apr 2019

I believe that there is a functional problem. We saw a reduction of connected users, that was the main reason we were forced to revert back to the version compiled with 1.0.0-rc-final.

rfrancisco on 30 Apr 2019

👍1

Hrm, I don't think that would be directly related to the callstack you're seeing though. The ping logic is very carefully designed to be resiliant to failures like this. In fact, if that log message is written it means the exception is not bubbling up, since we catch it and don't re-throw it: https://github.com/aspnet/AspNetCore/blob/0b7458cc984c0c0f9c702f9831c1815a365c07ac/src/SignalR/server/Core/src/HubConnectionContext.cs#L305-L308

The log message is unfortunately unclear about this. It says "Failed writing message. Aborting connection.". This is because we share the same log message in multiple places and in the other places it occurs it causes the connection to terminate. In this specific call stack though, the connection is not terminated. We should fix that message when we make this change (cc @halter73).

The "Failed Connection Handshake" messages (referenced in your other issue https://github.com/aspnet/AspNetCore/issues/9327) are more likely to indicate a problem.

anurse on 1 May 2019

So the message is confusing and unclear, can it please be changed ?

kofifus on 7 May 2019

Yes. that is part of what we're doing.

BrennanConroy on 7 May 2019

Acceptance checklist (check one item)

[ ] We decided not to take this fix.
[ ] The fix is tests-only.
[x] The fix contains product changes (check all items below).
- [x] Relevant XML documentation comments for new public APIs are present.
- [x] Narrative docs (docs.microsoft.com) are updated. (check one item below)
  - [ ] The change requires a new article. An issue with an outline has been filed here: [ISSUE LINK]
  - [ ] The change requires a change to an existing article. A closed docs PR with these changes is linked here: [PR LINK]
  - [x] The change requires no docs changes.
- [x] Verification has been completed. (check one item below)
  - [x] The change is in the shared framework and was verified against the following versions
    - SDK installer: [3.0.100-preview6-012096]
    - ASP.NET Core Runtime: [3.0.0-preview6-19277-03]
    - .NET Core Runtime: [3.0.0-preview6-27727-02]
  - [ ] The change is in an OOB NuGet/NPM/JAR package and was verified against the following version of that package: [PACKAGE ID] [VERSION]

BrennanConroy on 28 May 2019

Hi all, same problem.

This is a benign error as it looks like the client is already disconnected.

No, the client is not disconnected at the time of the error. In our case the client is still waiting for the response, and disconnects after timeout.

Here is exception from the client:

"lg-exception": "System.TimeoutException: Server timeout (30000.00ms) elapsed without receiving a message from the server.",
"lg-callsite": "Microsoft.AspNetCore.SignalR.Client.HubConnection+d__62.MoveNext:0",