Aws-sdk-ios: AWSIoT reconnecting issues

Created on 19 Jun 2018  路  19Comments  路  Source: aws-amplify/aws-sdk-ios

To help us solve your problem better, please answer the following list of questions.

  • What service are you using?

AWSIoT

  • In what version of SDK are you facing the problem?

2.6.21

  • Is the issue limited to Simulators / Actual Devices?

Actual devices

  • Can your problem be resolved if you bump to a higher version of SDK?

No

  • Is this problem related to specific iOS version?

No

  • How are you consuming the SDK? CocoaPods / Carthage / Prebuilt frameworks?

CocoaPods

  • Can you give us steps to reproduce with a minimal, complete, and verifiable example? Please include any specific network conditions that might be required to reproduce the problem.

After reconnecting from an offline state I will occasionally see an error "WebSocket closed with code:1001 with reason:Stream end encountered" show up in the logs. When this happens, I receive no notification that the connection status has changed (AWSIoTMQTTStatus == .connected) and new messages that are sent to subscribed topics are not received by the app. I would expect that after this occurs a reconnect would be attempted to restore the connection.

2018-06-18 16:32:15:975 [13892:4724335] [SR] NSStreamEventHasBytesAvailable <__NSCFInputStream: 0x1d4300bd0>
2018-06-18 16:32:15:975 [13892:4724335] [SR] NSStreamEventEndEncountered <__NSCFInputStream: 0x1d4300bd0>
2018-06-18 16:32:15:975 [13892:4724335] [SR] NSStreamEventEndEncountered <__NSCFOutputStream: 0x1d411fe30>
2018-06-18 16:32:15:975 [13892:4724335] [SR] using _delegateDispatchQueue.
2018-06-18 16:32:15:976 [13892:4724338] WebSocket closed with code:1001 with reason:Stream end encountered
2018-06-18 16:32:15:976 [13892:4724840] -[MQTTDecoder stream:handleEvent:] [Line 55] EventCode:2, stream: <__NSCFInputStream: 0x1d011e720>, Thread: <NSThread: 0x1d0878040>{number = 36, name = (null)}
2018-06-18 16:32:15:976 [13892:4724840] -[MQTTDecoder stream:handleEvent:] [Line 55] EventCode:16, stream: <__NSCFInputStream: 0x1d011e720>, Thread: <NSThread: 0x1d0878040>{number = 36, name = (null)}
2018-06-18 16:32:15:976 [13892:4724840] -[MQTTSession decoder:handleEvent:] [Line 331] eventCode:1
2018-06-18 16:32:15:976 [13892:4724840] MQTT session error, code: 2
2018-06-18 16:32:15:976 [13892:4724840] closing encoder stream.
2018-06-18 16:32:15:976 [13892:4724840] closing decoder stream.
2018-06-18 16:32:16:059 [13892:4724674] ClockTick:24: nothing to republish
2018-06-18 16:32:16:981 [13892:4724840] MQTTSessionDelegate handleEvent: 2
2018-06-18 16:32:16:981 [13892:4724840] MQTT session closed.
2018-06-18 16:32:17:059 [13892:4724674] ClockTick:25: nothing to republish
2018-06-18 16:32:18:055 [13892:4724674] ClockTick:26: nothing to republish
2018-06-18 16:32:19:059 [13892:4724674] ClockTick:27: nothing to republish
2018-06-18 16:32:20:059 [13892:4724674] ClockTick:28: nothing to republish
2018-06-18 16:32:21:058 [13892:4724674] ClockTick:29: nothing to republish
2018-06-18 16:32:22:055 [13892:4724674] ClockTick:30: nothing to republish
2018-06-18 16:32:23:058 [13892:4724674] ClockTick:31: nothing to republish
2018-06-18 16:32:24:056 [13892:4724674] ClockTick:32: nothing to republish
2018-06-18 16:32:25:056 [13892:4724674] ClockTick:33: nothing to republish
2018-06-18 16:32:26:057 [13892:4724674] ClockTick:34: nothing to republish
2018-06-18 16:32:27:057 [13892:4724674] ClockTick:35: nothing to republish
2018-06-18 16:32:28:058 [13892:4724674] ClockTick:36: nothing to republish
2018-06-18 16:32:29:058 [13892:4724674] ClockTick:37: nothing to republish
2018-06-18 16:32:30:059 [13892:4724674] ClockTick:38: nothing to republish
2018-06-18 16:32:31:059 [13892:4724674] ClockTick:39: nothing to republish
2018-06-18 16:32:32:058 [13892:4724674] ClockTick:40: nothing to republish
2018-06-18 16:32:33:059 [13892:4724674] ClockTick:41: nothing to republish

Looking further in the logs, it seems like this might be caused by attempting to reconnect after the connection has already been established. Here you can see that a message was just received and then it immediately attempts to reconnect, even though there is no error indicating that there was anything wrong with the connection.

2018-06-18 16:32:14:220 [13892:4724335] Websocket didReceiveMessage: Received 129 bytes
2018-06-18 16:32:14:220 [13892:4724674] -[MQTTDecoder stream:handleEvent:] [Line 55] EventCode:2, stream: <__NSCFInputStream: 0x1d411f920>, Thread: <NSThread: 0x1d4863240>{number = 32, name = (null)}
2018-06-18 16:32:14:220 [13892:4724674] -[MQTTSession decoder:newMessage:] [Line 351] messageType=3, status=2
2018-06-18 16:32:14:220 [13892:4724674] MQTTSession- newMessage msg type is 3
2018-06-18 16:32:14:220 [13892:4724674] -[MQTTSession handlePublish:] [Line 462] 
2018-06-18 16:32:14:220 [13892:4724674] <<<NSThread: 0x1d4863240>{number = 32, name = (null)}>>Topic: f988a889-a5a9-4736-97df-4d48a029ea9f is matched.
2018-06-18 16:32:14:220 [13892:4724674] <<<NSThread: 0x1d4863240>{number = 32, name = (null)}>>topicModel.callback.
2018-06-18 16:32:14:535 [13892:4724446] Attempting to reconnect.
2018-06-18 16:32:14:535 [13892:4724446] AWSIoTMQTTClient: connecting via websocket. 
2018-06-18 16:32:14:556 [13892:4724446] username is : ?SDK=iOS&Version=2.6.19

2018-06-18 16:32:14:556 [13892:4724446] Metrics collection is: Enabled
2018-06-18 16:32:14:557 [13892:4724446] Websocket is created and opened.

I'm able to regularly reproduce this issue by putting my device in airplane mode, turning off wifi, waiting a few seconds or minutes and then turning wifi back on.

iot requesting info

Most helpful comment

Absolutely. I will circle back on this thread once the fix is out.

All 19 comments

@mattczech
Thank you for reporting this issue. I will look into this and get back to you.

Any news regarding this issue, seems like there still is issues with reconnecting logic?

@mattczech, @aat2703

Just a quick update. I have been able to reproduce the issue on my side and am continuing to work on it. Will post an update once I finish up my analysis.

Cool @cbommas

Could this issue have any relation to https://github.com/aws/aws-sdk-ios/issues/972 ?

@aat2703

Yep, these are related. I have a fix that I am testing on my side and am hoping to include in the next rev of the SDK.

@cbommas thats what i thought hehe.. Will you loop back when you have an scheduled release date with this bug fix? 馃憤

Absolutely. I will circle back on this thread once the fix is out.

@cbommas do you have an eta for the next rev of the SDK?

@aat2703 @mattczech

Version 2.6.23 was released yesterday ( july 5) and contains the fixes for this issue.
Please upgrade and let me know if you are still running into issues.

@cbommas Thanks for looking into this and for the quick turn-around! v2.6.23 seems much more stable when disconnecting/reconnecting.

I'm still seeing an issue where the AWSIoTMQTTClient will attempt to reconnect when the connection status is connected.

Steps to reproduce:

  • Start up an application using AWSIoT. Allow it to connect to the MQTT session.
  • Background the app and lock the device. This seems to close the MQTT session.
  • Unlock the device and open the app.
  • The MQTT session will reconnect and after it's connected, it will reconnect again even though the session has already been connected. This causes a brief connection glitch where the connection status goes from Disconnected -> Connected -> Disconnected -> Connected. The connection eventually becomes connected and remains connected but it'd be preferred to avoid this glitch.

I added a debug statement to [AWSIoTMQTTSession reconnectToSession] that prints out the current AWSIoTMQTTStatus. You can see in the log below Attempting to reconnect. status=2, which corresponds to AWSIoTMQTTStatusConnected. Why would it reconnect if the session is already connected?

2018-07-12 12:25:41:084 [19005:4156973] Websocket didReceiveMessage: Received 5 bytes
2018-07-12 12:25:41:084 [19005:4157015] -[MQTTDecoder stream:handleEvent:] [Line 55] EventCode:2, stream: <__NSCFInputStream: 0x1cc1033c0>, Thread: <NSThread: 0x1cc07b380>{number = 19, name = (null)}
2018-07-12 12:25:41:084 [19005:4157015] -[MQTTSession decoder:newMessage:] [Line 361] messageType=9, status=2
2018-07-12 12:25:41:084 [19005:4157015] MQTTSession- newMessage msg type is 9
2018-07-12 12:25:41:084 [19005:4157015] -[MQTTSession handleSuback:] [Line 444] 
2018-07-12 12:25:41:084 [19005:4157015] Sub Ack messageId 12
2018-07-12 12:25:41:084 [19005:4157015] MQTTSessionDelegate new ack for msgId: 12
2018-07-12 12:25:41:939 [19005:4157015] ClockTick:21: nothing to republish
2018-07-12 12:25:42:487 [19005:4157013] Attempting to reconnect. status=2
2018-07-12 12:25:42:487 [19005:4157013] AWSIoTMQTTClient: connecting via websocket. 
2018-07-12 12:25:42:487 [19005:4156973] [SR] Closing with code 1000 reason (null)
2018-07-12 12:25:42:488 [19005:4156967] [SR] NSStreamEventHasSpaceAvailable <__NSCFOutputStream: 0x1c01045c0>
2018-07-12 12:25:42:536 [19005:4157013] username is : ?SDK=iOS&Version=2.6.19
2018-07-12 12:25:42.536720-0500 [19005:4156973] [MQTT] MQTT connection status changed Connecting

This seems to be caused by a race condition of detecting multiple error conditions around the same time that each initiating a reconnect:

  • Web socket fails, initiate reconnect timer (timer1) started.
  • timer1 expires, reconnectToSession called.
  • MQTTSessionEventConnectionClosed received; initiate reconnect timer (timer2) started (when this happens there's a check on the MQTT status to see if it's connected, but in this case it's still connecting so another timer is started).
  • Connection succeeds from timer1's attempt to reconnect.
  • timer2 expires, reconnectToSession called (MQTT status is connected, but a new connection is going to be attempted anyways!).

It seems like this could be resolved by adding a check on mqttStatus in [AWSIoTMQTTClient reconnectToSession] but I'm open to any insight you have.

Our app is currently on its way to the App Store. I'll request feedback from our customers once it has made its way out, and will circle back on this thread 馃憤

@mattczech
Thank you for your analysis on why the extra connection attempt is happening. I agree with your hypothesis and applied the fix you suggested. It has made it more stable and so far in my testing I haven't seen the extra connect. I will continue to test - Also, I will get this included in the next rev of the SDK. Thank you for your help in making this library better! 馃憤

@mattczech, @aat2703

Rev 2.6.25 released on 7/18 has the fixes for the connectionAge timer and the fix for the extraneous connect. Can you upgrade to the latest and let us know how it goes?

What implications did the "connectionAge timer" bug have?

The IoT client implements an exponential backoff strategy for connection retries. The initial value of the reconnectTime is set to baseReconnectTimeInterval set in the MQTTConfiguration object.

When the IoT client detects that it has lost the connection to the server, it will wait until the reconnectTime to try to reconnect. If the reconnect is unsuccessful, it will double the reconnectTime and wait before reconnecting. The doubling will happen until the maximumReconnectTimeInterval is reached.

From that point on the client will keep trying to reconnect once every reconnectTime seconds ( the reconnectTime will be pinned tomaximumReconnectTimeInterval seconds)

When the IoT client is able to connect successfully, it will wait for minimumConnectionTimeInterval before it resets the reconnectTime back to the baseReconnectTimeInterval.

The bug essentially was that the reconnectTime wasn't being reset. The implication is that if a client gets disconnected, the reconnectTime will max out and never be reset; which will mean that the client will only retry at the maximumReconnectTimeInterval. To the end user, it can appear that the app is taking a "long" time to reconnect ("long" is based on what the max is set to).

In v2.6.25 we are seeing this log messages too.

ClockTick:64: nothing to republish

@karasahinemre

Looks like you've set the logs at the debug level. You can eliminate these log messages by setting the log level of info or higher. I will set these messages to verbose in a future release.

@karasahinemre

The change to move the messages to verbose will be included in the next rev of the SDK.

Closing this issue as we are tracking updates on #1002

Was this page helpful?
0 / 5 - 0 ratings