Iris: lost connection to ws

Created on 20 Feb 2019  ·  7 comments  ·  Source: kataras/iris

I keep losing the connection to the ws handler; with the following test configuration it returns nothing.

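import (
    "time"

    "github.com/kataras/iris"
    "github.com/kataras/iris/websocket"
)

app := iris.New()
// Test configuration for the websocket server.
ws_core := websocket.New(websocket.Config{
    ReadBufferSize:    1024,
    WriteBufferSize:   1024,
    EnableCompression: true,
    PingPeriod:        10 * time.Second, // how often the server pings the client
    BinaryMessages:    false,
})
//TestSub(app, ws_core)
// ws_handle_external_logic is the reporter's own function (not shown); it registers the connection callbacks.
ws_handle_external_logic(app, ws_core)
app.Get("/ws", ws_core.Handler())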

All 7 comments

The connection is lost in a handler that doesn't trigger Next(). Is there any documentation for handler_execution_rules? How does the handler queue work? I have dug deep into the codebase and still have no clue where the handlers registered with Get are triggered.

Is there any further information on
ctx.Do(n.Handlers) in the source?

Edit:
I later found out that the location is here and here.
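As far as I understand Iris v11's router, ctx.Do(handlers) starts the matched route's chain at index 0, and each handler must call ctx.Next() to pass control on; a handler that never calls Next() ends the chain there. The types below are simplified, hypothetical stand-ins written only to show that control flow, not Iris's real context implementation.

```go
package main

import "fmt"

// Handler and Context are simplified stand-ins, not Iris's real types.
type Handler func(ctx *Context)

type Context struct {
	handlers []Handler
	index    int      // position of the handler currently running
	log      []string // records which handlers ran, for illustration
}

// Do installs a chain and runs its first handler; later handlers run only
// if each one calls Next.
func (c *Context) Do(handlers []Handler) {
	c.handlers = handlers
	c.index = 0
	if len(handlers) > 0 {
		handlers[0](c)
	}
}

// Next advances the index and runs the following handler, if any.
func (c *Context) Next() {
	c.index++
	if c.index < len(c.handlers) {
		c.handlers[c.index](c)
	}
}

func main() {
	ctx := &Context{}
	ctx.Do([]Handler{
		func(c *Context) { c.log = append(c.log, "middleware"); c.Next() },
		func(c *Context) { c.log = append(c.log, "ws upgrade") }, // no Next: the chain stops here
		func(c *Context) { c.log = append(c.log, "never reached") },
	})
	fmt.Println(ctx.log) // [middleware ws upgrade]
}
```

A websocket upgrade handler normally sits last in the chain and never calls Next(), so handlers registered after it simply never run.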

There is a three-way handshake to open a TCP/IP connection, and a four-way handshake to close it. However, once the connection has been established, if neither side sends any data, then no packets are sent over the connection. TCP is an “idle” protocol, happy to assume that the connection is active until proven otherwise.

TCP was designed this way for resiliency and efficiency. This design enables graceful recovery from unplugged network cables and router crashes. For example, a client may connect to a server, an intermediate router may be rebooted, and after the router comes back up, the original connection still exists (as long as no data was sent across the connection while the router was down). This design is also efficient, since no “polling” packets are sent across the network just to check whether the connection is still OK (reducing unnecessary network traffic).

TCP does have acknowledgments for data, so when one side sends data to the other side, it will receive an acknowledgment if the connection is still active (or an error if it is not). Thus, broken connections can be detected by sending out data. It is important to note that the act of receiving data is completely passive in TCP; a socket that only reads cannot detect a dropped connection.

This leads to a scenario known as a “half-open connection”. At any given point in most protocols, one side is expected to send a message and the other side is expecting to receive it. Consider what happens if an intermediate router is suddenly rebooted at that point: the receiving side will continue waiting for the message to arrive; the sending side will send its data, and receive an error indicating the connection was lost. Since broken connections can only be detected by sending data, the receiving side will wait forever. This scenario is called a “half-open connection” because one side realizes the connection was lost but the other side believes it is still active.

I have tested it: the library does not handle half-open detection.
http://blog.stephencleary.com/2009/05/detection-of-half-open-dropped.html

I have gone over the source and discovered a number of data races between the on-event loop and the emit-message loop. I have fixed some of them and am still searching for the breaking point.

Some WIP in connection.go:

func (c *connection) On(event string, cb MessageFunc) {
    var present bool
    c.onEventListenersMu.RLock()
    println("📪 lock R onEventListenersMu", c.id)
    if _, present = c.onEventListeners[event]; !present {
        c.onEventListenersMu.RUnlock()
        println("📭 unlock R onEventListenersMu", c.id)
        c.onEventListenersMu.Lock()
        println("📪 lock onEventListenersMu", c.id)
        // Another goroutine may have registered this event between RUnlock
        // and Lock, so append unconditionally: skipping the append in that
        // case would silently drop cb. append on a nil slice allocates it.
        c.onEventListeners[event] = append(c.onEventListeners[event], cb)
        c.onEventListenersMu.Unlock()
        println("📭 unlock onEventListenersMu", c.id)
    } else {
        c.onEventListenersMu.RUnlock()
        println("📭 unlock R onEventListenersMu", c.id)
        c.onEventListenersMu.Lock()
        println("📪 lock onEventListenersMu", c.id)
        c.onEventListeners[event] = append(c.onEventListeners[event], cb)
        c.onEventListenersMu.Unlock()
        println("📭 unlock onEventListenersMu", c.id)
    }
    /*
    Unlocked version, kept for reference:
    if c.onEventListeners[event] == nil {
        c.onEventListeners[event] = make([]MessageFunc, 0)
    }

    c.onEventListeners[event] = append(c.onEventListeners[event], cb)*/
}

Broken pipe error shortly after a client disconnects.
After a client disconnects by calling c.Disconnect() from websocket2, the server appears to keep the connection, because about a minute later there is a broken pipe error. This happens on connections from an overseas server but not from another domestic server, and it also happens on the client side.

Thanks @jjhesk, I am working on all these and more as we speak. I left you some comments on your PR as well: you have opened a PR that contains a bunch of things that could destroy Iris, e.g. importing a package that works only on Linux, so the caller will deadlock on other platforms like Windows... please try to clean it up so I can review your hot changes.

onEventListeners does not exist in our codebase, as far as I can tell. Please keep to one issue post for the problem you are facing; it is neither nice nor polite to repeat things more than once, and the community will get tired of following up. BTW, did you try it against #1175 (v11.2)?

In short, there is no need to spam code snippets that make no sense to the rest of the GitHub community. You can always join the Iris chat, or even better the websocket room, so we can communicate more directly. As far as I can see, I am currently using a different test suite than you; if I had a stress test that reproduces your problem, I could solve it pretty quickly. I really want to help you as much as I can; Iris will benefit after all :)

Keep the energy up!

Hi @kataras, good news. I just fixed the websocket code, and I have confirmed that the killer is the buffer library: it kills the entire app. The buffer library is the foundation, and it makes a lot of noise for the rest of the process. Many data races are raised from this point, and they spread quickly to the message receive and send functions because of the corrupted incoming data. So I did some reading from https://www.youtube.com/watch?v=N3PWzBeLX2M

After a series of line-by-line battles, I finally managed to confirm the killer's location and switched to another buffer library. I know that library is fast, but it breaks the entire app, so there is no point using it anymore until we find a fix or a better-performing library. For now I use sync.Pool for the conversion; it is slower, but it never breaks anymore.

About the PR #1195: if the ws1m package only works on Linux, we can add a caveat telling developers to check whether they are on Linux, or add that check in the code, but I am not sure how to do that for now. I have read your comments and updated the PR, and I have to leave it to you for now, as I am rushing to optimize the system I sent you by email. I will come back later to check on your progress. Thanks; I have noticed that I may have been too noisy at times because I was under high pressure to find a cure.
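Go offers two ways to add that check. If the Linux-only package does not even compile elsewhere, the usual tool is a build constraint: put the Linux code in a file starting with `//go:build linux` (older toolchains use `// +build linux`) and a fallback in a file with `//go:build !linux`. If it compiles everywhere but only works on Linux, a runtime check on runtime.GOOS is enough, as sketched below. The engine names are hypothetical placeholders, and neither approach by itself addresses the deadlock concern raised about the PR.

```go
package main

import (
	"fmt"
	"runtime"
)

// pickEngine chooses an implementation by operating system. The returned
// names are hypothetical placeholders for a Linux-only package and a
// portable fallback.
func pickEngine(goos string) string {
	if goos == "linux" {
		return "linux-only engine"
	}
	return "portable engine"
}

func main() {
	// Taking GOOS as a parameter keeps pickEngine testable on any platform.
	fmt.Println(runtime.GOOS, "->", pickEngine(runtime.GOOS))
}
```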

onEventListeners was added because I suspected the problems were coming from other data races that might hit the listener parts directly, so I added it for now. I might remove it in the future for performance, but it doesn't break anything for now.

@kataras kind regards,
