Aws-sdk-js: DynamoDB write EPROTO

Created on 5 Jan 2016  ·  149 Comments  ·  Source: aws/aws-sdk-js

Node: 4.2.1
AWS-SDK: 2.1.21
From the logs

{ [NetworkingError: write EPROTO]
  message: 'write EPROTO',
  code: 'NetworkingError',
  errno: 'EPROTO',
  syscall: 'write',
  address: undefined,
  region: 'us-east-1',
  hostname: 'dynamodb.us-east-1.amazonaws.com',
  retryable: true,
  time: Tue Jan 05 2016 17:35:05 GMT+0000 (UTC) } 'Error: write EPROTO\n    at Object.exports._errnoException (util.js:874:11)\n    at exports._exceptionWithHostPort (util.js:897:20)\n    at WriteWrap.afterWrite (net.js:763:14)'

Another server, on node 0.12.0, does not have this issue.

If you hit this issue with a current node and sdk, please follow up on the AWS forums:
https://forums.aws.amazon.com/thread.jspa?messageID=694520#694520
https://forums.aws.amazon.com/thread.jspa?messageID=693172#693172


Summary (as of 2016/05/13):

Edit: potential keepAlive errors noted.
Edit: Removed untested status, see @southpolesteve's comment

new AWS.DynamoDB({
  httpOptions: {
    agent: new https.Agent({
      rejectUnauthorized: true,
      keepAlive: true,                // workaround part i. 
                                      // shouldn't be used in AWS Lambda functions
      secureProtocol: "TLSv1_method", // workaround part ii.
      ciphers: "ALL"                  // workaround part ii.
    })
  }
});

Server side mitigation (2016/06/29):

DynamoDB confirmed for me that they have updated to support TLS1.2 everywhere. We're still going to keep this issue open because we want to see the openSSL fix merged as well. That said, the root cause of this issue has been mitigated and users should no longer be affected by this bug.

If you do encounter this error after removing your workarounds, please post here. Knowing the version of node.js, the sdk, and the region the issue was encountered in would be helpful if you see this error again.
@chrisradek's comment


Most helpful comment

How embarrassing. I'm using both AWS.DynamoDB and AWS.DynamoDB.DocumentClient. However, I only implemented the fix for AWS.DynamoDB. Here is my final, working configuration:

const dynamodb = new AWS.DynamoDB({
  httpOptions: {
    agent: new https.Agent({
      ciphers: 'ALL',
      secureProtocol: 'TLSv1_method'
    })
  }
});

const dynamodbDoc = new AWS.DynamoDB.DocumentClient({
  service: dynamodb // <- JUST ADDED TO FIX MY CONFIGURATION
});

As before, I will continue to monitor my system logs. I have a good feeling about this, though.

All 149 comments

@phsstory
How often are you seeing this sort of error? Can you provide any information, like a code snippet that reproduces the issue, or details on where your client is running (i.e. on EC2, local machine, etc)?

Is this only happening with dynamodb or other services as well? Do you have any httpOptions set for the SDK?

Error is intermittent. S3, DynamoDB and STS are only services used, only calls to DynamoDB reported issues. Did not show in logs on a 0.12.0 node server; however the load on that server is significantly less so no calls could have happened during the error windows.

37 Failures 0 Success
Start: Mon Dec 28 2015 21:01:05 GMT+0000 (UTC)
End: Mon Dec 28 2015 21:01:11 GMT+0000 (UTC)
...
hundreds of successes
...
48 Failures 12 Success
Start: Mon Dec 28 2015 21:02:25 GMT+0000 (UTC)
End: Mon Dec 28 2015 21:02:30 GMT+0000 (UTC)
... pattern continues.

Mon Dec 28 2015 21:02:42 GMT+0000 (UTC)
Mon Dec 28 2015 21:02:47 GMT+0000 (UTC)

Mon Dec 28 2015 21:03:13 GMT+0000 (UTC)
Mon Dec 28 2015 21:03:19 GMT+0000 (UTC)

Mon Dec 28 2015 21:03:19 GMT+0000 (UTC)
Mon Dec 28 2015 21:03:29 GMT+0000 (UTC)

Mon Dec 28 2015 21:03:29 GMT+0000 (UTC)
Mon Dec 28 2015 21:03:36 GMT+0000 (UTC)
...
Tue Jan 05 2016 17:33:51 GMT+0000 (UTC)
Tue Jan 05 2016 17:33:53 GMT+0000 (UTC)

Tue Jan 05 2016 17:34:21 GMT+0000 (UTC)
Tue Jan 05 2016 17:41:18 GMT+0000 (UTC)

Our client is running on ElasticBeanstalk but there are reports of this error in other environments (see links to amazon forums)

function awslog(namespace) {
    var debug = require('debug')(namespace);
    return { log: function (msg) { debug(msg); } };
}
var _db = new AWS.DynamoDB({
  credentials: new AWS.TemporaryCredentials({
    RoleArn: REDACTED,
    logger: awslog('phDev:phUser:STS')
  }),
  logger: awslog('phDev:phUser:DynamoDB')
});

_db.query(params, function (err, data) {
  if (err) {
    // Error is defined, app fails, err object failed to log
    /*
      at User.<anonymous> (REDACTED)
      at Request.<anonymous> (/var/app/current/node_modules/aws-sdk/lib/request.js:350:18)
      at Request.callListeners (/var/app/current/node_modules/aws-sdk/lib/sequential_executor.js:100:18)
      at Request.emit (/var/app/current/node_modules/aws-sdk/lib/sequential_executor.js:77:10)
      at Request.emit (/var/app/current/node_modules/aws-sdk/lib/request.js:604:14)
      at Request.transition (/var/app/current/node_modules/aws-sdk/lib/request.js:21:12)
      at AcceptorStateMachine.runTo (/var/app/current/node_modules/aws-sdk/lib/state_machine.js:14:12)
      at /var/app/current/node_modules/aws-sdk/lib/state_machine.js:26:10
      at Request.<anonymous> (/var/app/current/node_modules/aws-sdk/lib/request.js:22:9)
      at Request.<anonymous> (/var/app/current/node_modules/aws-sdk/lib/request.js:606:12)
    */
  }
});

This is close to the extent of information I am able to provide on a public location.

@phsstory
Thanks for the information you've provided, I'll try to reproduce the issue on my end as well. In the meantime, one suggestion I found for EPROTO errors was to set keepAlive on the http agent. The reasoning is that SSL makes each new connection more expensive, so using keepAlive to reuse existing connections cuts down on that work. You might want to give it a try by adding the following to the config passed into new AWS.DynamoDB:

httpOptions: {
  agent: new https.Agent({
    rejectUnauthorized: true,
    keepAlive: true
  })
}

More agent options here: https://nodejs.org/api/http.html#http_new_agent_options

I'm having this same problem.
node: v5.2.0

@chrisradek

As an update, it has been 24 hours since we downgraded node to 0.12.0 with no indication of EPROTO errors, same code and sdk versions. I would guess it has to do with a combination of certain machines behind a load balancer (intermittent, with varying time periods among reported users) and some internal deprecated feature of node or one of its libraries. There is some speculation that it might be related to https://github.com/nodejs/node/issues/2244

Have you had a chance to contact the DynamoDB infrastructure team to see what might have changed on Dec 28th, or whether there are any machines/load balancers in the cluster still serving up RC4-SHA?

Also got the same error using node v4.1.2. Adding what @chrisradek suggested appears to work for me.

From the sounds of it, this is a compatibility issue between the sdk/node and the DynamoDB service clusters; keepAlive only reduces the chances of connecting to a problematic machine/load balancer.

Unfortunately the amazon forums are the worst place to get information about amazon changes/issues since there is never a follow up by the techs.

@chrisradek can you run this through the inter-department channels to see if there are DynamoDB machines/load balancers that attempt to negotiate TLS with RC4-SHA?

@phsstory
I'll bring this up with the DynamoDB team and report back what I learn.

@phsstory
I heard back, and DynamoDB does not make use of RC4.

I have also noticed this issue. And because I set keepAlive to true, the server was unable to serve any requests.

My Beanstalk configuration:

  • 64bit Amazon Linux 2015.09 v2.0.4 running Node.js
  • Node v4.2.1

I was just wondering if we could interfere in choosing the cipher (if that is indeed the case): https://nodejs.org/api/https.html#https_https_request_options_callback

@chrisradek That is unfortunate as it would have been a quick diagnosis to the issue.

As a recap:
Does not appear to affect node 0.12
Does affect 4.1, 4.2, 5.2
Errors appear to be grouped in 5s windows.
keepAlive mitigates the error but is not believed to fix the underlying problem.

It seems to be an issue in node:

nodejs/node#3692

It's happening a lot. I am using only DynamoDB with the latest node (5.4.0) and am having the problem.

Continuing to observe this issue on Node.js v5.4.0, setting keepAlive does not prevent this from occurring:

new AWS.DynamoDB(
{
    httpOptions: {
        agent: new https.Agent(
        {
            rejectUnauthorized: true,
            keepAlive: true
        })
    }
});
{
    "target": {
        "module": "aws-sdk",
        "version": "2.2.28",
        "export": "DynamoDB",
        "method": "putItem",
        "args": [
            {
                "Item": "*REDACTED*"
            }
        ]
    },
    "type": "log",
    "level": "error",
    "message": "failed to put item in DynamoDB",
    "error": {
        "message": "write EPROTO",
        "code": "NetworkingError",
        "errno": "EPROTO",
        "syscall": "write",
        "region": "us-east-1",
        "hostname": "dynamodb.us-east-1.amazonaws.com",
        "retryable": true,
        "time": "2016-01-23T02:56:44.705Z"
    },
    "stack": "Error: write EPROTO\n    at Object.exports._errnoException (util.js:856:11)\n    at exports._exceptionWithHostPort (util.js:879:20)\n    at WriteWrap.afterWrite (net.js:763:14)",
    "timestamp": "2016-01-23T02:56:44.706Z"
}

@chrisradek I still believe this to be an issue with deprecated node functionality and a portion, likely small, of the DynamoDB stack using said encryption.

Since we have no knowledge of the DynamoDB stack architecture, I can only make a wild guess: based on the time windows and the mitigating effect of keepAlive, we will find it in the mechanism that handles dynamic growth while maintaining network connectivity for clients connecting during those brief periods.

In the interim, can you get a list of the available ciphers and options from DynamoDB so we can correlate them with the supported options in node, to restrict the negotiation as mentioned by @awerlang?

I did look into ciphers by refreshing this over and over again (trying to catch different IPs) https://www.ssllabs.com/ssltest/analyze.html?d=dynamodb.us-east-1.amazonaws.com&latest

From my most recent run (54.239.20.144, 54.239.16.203):

TLS 1.0 only

TLS_RSA_WITH_AES_128_CBC_SHA (0x2f) 128
TLS_RSA_WITH_AES_256_CBC_SHA (0x35) 256
TLS_RSA_WITH_3DES_EDE_CBC_SHA (0xa) 112

I forgot an additional detail that I haven't seen reported yet. I happen to be measuring the latency of the HTTPS calls to DynamoDB using aws-sdk, and when I get an EPROTO error back, the latency (except in one case) is 25 seconds, more precisely, most of it is between 25600 and 25700, some between 25700 and 25800 milliseconds. (the one exception was 12915 milliseconds ¯\_(ツ)_/¯)

_(edit: there was no exception to 25 seconds for EPROTO error, my initial search was too inclusive)_

_edit 2: Looks like 25 seconds comes from DynamoDB aws-sdk retry policy https://github.com/aws/aws-sdk-js/blob/master/lib/services/dynamodb.js#L48 which would add 25550 milliseconds of delays across total of 11 attempts it makes_
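The ~25.5 second figure can be reconstructed from that schedule (a sketch, assuming the DynamoDB customization performs 10 retries, with no delay before the first retry and 50 * 2^(i-1) ms before each subsequent retry i):

```javascript
// Sketch of the DynamoDB retry delay schedule described above:
// attempt 1 is the original request, then 10 retries follow.
function retryDelays(retryCount) {
  var delays = [];
  for (var i = 0; i < retryCount; i++) {
    delays.push(i === 0 ? 0 : 50 * Math.pow(2, i - 1));
  }
  return delays;
}

// 0 + 50 + 100 + ... + 12800 = 25550 ms of backoff across 11 total attempts
var totalMs = retryDelays(10).reduce(function (sum, d) { return sum + d; }, 0);
console.log(totalMs); // 25550
```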

Thanks @tristanls,

This is starting to have the signs of a load balancer shuffle with connections being sent to a machine not quite ready to handle load and dumping the connection prematurely causing the TLS endpoint to bail on the client connection mid buffer. The odd ball is node 0.12 not having any issue. I wonder if 0.12 was less strict on this particular protocol error or silently ignored it.

@chrisradek,

you might want to see if the DynamoDB dev env shows this same behavior with one of the reported node version under heavy load by multiple simulated accounts. You might be able to capture the connection and rebuild for protocol analysis.

@phsstory looking at where EPROTO can come out of based on comments in https://github.com/nodejs/node/issues/3692 the paths look (to my untrained eye) quite different.

latest: https://github.com/nodejs/node/blob/master/src/tls_wrap.cc#L593
0.12: https://github.com/nodejs/node/blob/v0.12.7-release/src/tls_wrap.cc#L607

@tristanls 0.12 doesn't throw EPROTO for this issue, it continues on without issue or degraded performance.

We had to downgrade our production servers until this issue can be resolved.

@phsstory understood. I am working on the assumption that this is what's causing the EPROTO issue in v5.5.0 and latest: https://github.com/nodejs/node/blob/master/src/tls_wrap.cc#L592-L593. This code path for throwing EPROTO doesn't seem to exist in 0.12, which could explain the difference.

Well, I haven't reproduced the problem (because in the stack below, v0.12.7 also fails with the EPROTO error). But I'm not gonna be able to get back to this for a bit, so I wanted to post my findings so far.

The c-server below is compiled using gcc server.c -o c-server.
The node-patched is node with this patch https://github.com/nodejs/node/issues/3692#issuecomment-158583358 included (I think... it compiled, and it's the main executable that got built).
node is vanilla v5.5.0.

I'm hoping to keep iterating on server.c until v0.12.7 passes and v5.5.0 fails in order to maybe reproduce eventually. I'm guessing next step will be to include SSL along the lines of http://stackoverflow.com/questions/7698488/turn-a-simple-socket-into-an-ssl-socket

In the meantime, I dumped the existing setup to docker hub https://hub.docker.com/r/tristanls/eproto-plus-patched-node/

[root@8fec5290e6f7 patched]# ll 
total 18740
-rwxr-xr-x 1 root root     9244 Jan 25 03:16 c-server
-rw-r--r-- 1 root root      259 Jan 25 03:16 client.js
-rwxr-xr-x 1 root root 19161989 Jan 24 00:34 node-patched
-rw-r--r-- 1 root root     1576 Jan 25 02:55 server.c
drwxr-xr-x 6 root root     4096 Jan 25 03:16 v0.12.7
[root@8fec5290e6f7 patched]# v0.12.7/bin/node -v
v0.12.7
[root@8fec5290e6f7 patched]# node -v
v5.5.0
[root@8fec5290e6f7 patched]# ./node-patched -v
v5.5.0
[root@8fec5290e6f7 patched]# cat client.js 
"use strict";

var https = require("https");
var options = {
    port: 443
};
var req = https.request(options, function(res)
{
    console.log(res.statusCode);
    res.on("data", function(data)
    {
        process.stdout.write(data);
    });
});
req.end();
[root@8fec5290e6f7 patched]# cat server.c 
/* A simple server in the internet domain using TCP
   The port number is passed as an argument */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

void error(const char *msg)
{
    perror(msg);
    exit(1);
}

int main(int argc, char *argv[])
{
     int sockfd, newsockfd, portno;
     socklen_t clilen;
     char buffer[256];
     struct sockaddr_in serv_addr, cli_addr;
     int n;
     if (argc < 2) {
         fprintf(stderr,"ERROR, no port provided\n");
         exit(1);
     }
     sockfd = socket(AF_INET, SOCK_STREAM, 0);
     if (sockfd < 0)
        error("ERROR opening socket");
     bzero((char *) &serv_addr, sizeof(serv_addr));
     portno = atoi(argv[1]);
     serv_addr.sin_family = AF_INET;
     serv_addr.sin_addr.s_addr = INADDR_ANY;
     serv_addr.sin_port = htons(portno);
     if (bind(sockfd, (struct sockaddr *) &serv_addr,
              sizeof(serv_addr)) < 0)
              error("ERROR on binding");
     listen(sockfd,5);
     clilen = sizeof(cli_addr);
     newsockfd = accept(sockfd,
                 (struct sockaddr *) &cli_addr,
                 &clilen);
     if (newsockfd < 0)
          error("ERROR on accept");
     bzero(buffer,256);
     n = read(newsockfd,buffer,255);
     if (n < 0) error("ERROR reading from socket");
     printf("Here is the message: %s\n",buffer);
     n = write(newsockfd,"I got your message",18);
     if (n < 0) error("ERROR writing to socket");
     close(newsockfd);
     close(sockfd);
     return 0;
}
[root@8fec5290e6f7 patched]# ./c-server 443 &
[1] 20
[root@8fec5290e6f7 patched]# v0.12.7/bin/node client.js 
Here is the message: %
events.js:85
      throw er; // Unhandled 'error' event
            ^
Error: write EPROTO 139816494778176:error:140770FC:SSL routines:SSL23_GET_SERVER_HELLO:unknown protocol:../deps/openssl/openssl/ssl/s23_clnt.c:782:

    at exports._errnoException (util.js:746:11)
    at WriteWrap.afterWrite (net.js:775:14)
[1]+  Done                    ./c-server 443
[root@8fec5290e6f7 patched]# ./c-server 443 &
[1] 26
[root@8fec5290e6f7 patched]# node client.js 
Here is the message: 
events.js:154
      throw er; // Unhandled 'error' event
      ^

Error: write EPROTO
    at Object.exports._errnoException (util.js:856:11)
    at exports._exceptionWithHostPort (util.js:879:20)
    at WriteWrap.afterWrite (net.js:763:14)
[1]+  Done                    ./c-server 443
[root@8fec5290e6f7 patched]# ./c-server 443 &
[1] 36
[root@8fec5290e6f7 patched]# ./node-patched client.js 
Here is the message: 
events.js:154
      throw er; // Unhandled 'error' event
      ^

Error: write EPROTO
    at Object.exports._errnoException (util.js:856:11)
    at exports._exceptionWithHostPort (util.js:879:20)
    at WriteWrap.afterWrite (net.js:763:14)
[1]+  Done                    ./c-server 443

This also started happening in our ELB environments after updating nodejs from 0.12.9 to 4.2.3

{ [NetworkingError: write EPROTO]
message: 'write EPROTO',
code: 'NetworkingError',
errno: 'EPROTO',
syscall: 'write',
address: undefined,
region: 'us-east-1',
hostname: 'dynamodb.us-east-1.amazonaws.com',
retryable: true,

time: Mon Jan 25 2016 16:29:07 GMT+0000 (UTC) } Error
at /var/app/current/shared/data.js:333:16 (error handler throwing it)
at Response. (/var/app/current/shared/dynamodb.js:272:4)
at Request. (/var/app/current/node_modules/aws-sdk/lib/request.js:354:18)
at Request.callListeners (/var/app/current/node_modules/aws-sdk/lib/sequential_executor.js:105:20)
at Request.emit (/var/app/current/node_modules/aws-sdk/lib/sequential_executor.js:77:10)
at Request.emit (/var/app/current/node_modules/aws-sdk/lib/request.js:596:14)
at Request.transition (/var/app/current/node_modules/aws-sdk/lib/request.js:21:10)
at AcceptorStateMachine.runTo (/var/app/current/node_modules/aws-sdk/lib/state_machine.js:14:12)
at /var/app/current/node_modules/aws-sdk/lib/state_machine.js:26:10
at Request. (/var/app/current/node_modules/aws-sdk/lib/request.js:37:9)

Just an update, I have reached out to DynamoDB to see what can be done on their side to mitigate this issue. I'll update as I hear back. From my understanding, this issue has only affected DynamoDB users so far.

In the meantime, it looks like a lot of useful discussion has been taking place here, and there's still activity on the referenced issue on nodejs/node. Thank you so much for the efforts you've taken to investigate so far, you've all been great! We'll continue to monitor this thread and the one referenced on nodejs/node.

We're also seeing this issue, even on single instance setups through Elastic Beanstalk on node versions > 0.12.9 and as a result have downgraded certain setups requiring DynamoDB to node v0.12.9. A last resort might be to migrate to docker and run a custom build of node v5 with the hotfix mentioned in the nodejs repo.

@chrisradek, currently my instance i-d69a5d5f is reliably generating EPROTO error when trying to talk to DynamoDB. My instances i-bd81f834 and i-03e9b1b0 are talking to DynamoDB ok. I'll see what I can investigate from my side, but perhaps there's someone who can look at the network/connectivity/differences and see what's going on? (us-east-1 region)

BOOYAH! :) (hehe.. took a while to track this down)...

Node.js version is latest, (including fix to get meaningful errors from https://github.com/nodejs/node/commit/ff4006c7b05d677f6b63f01ad9c5faf97e0230bd)

{    
    "target": {
        "module": "aws-sdk",
        "version": "2.2.28",
        "export": "DynamoDB",
        "method": "query",
        "args": ["*redacted*"]
    },
    "type": "log",
    "level": "error",
    "message": "failed to query DynamoDB",
    "error": {
        "message": "write EPROTO 140550271170368:error:1408F10B:SSL routines:SSL3_GET_RECORD:wrong version number:../deps/openssl/openssl/ssl/s3_pkt.c:362:\n",
        "code": "NetworkingError",
        "errno": "EPROTO",
        "syscall": "write",
        "region": "us-east-1",
        "hostname": "dynamodb.us-east-1.amazonaws.com",
        "retryable": true,
        "time": "2016-01-29T03:35:04.548Z"
    },
    "stack": "Error: write EPROTO 140550271170368:error:1408F10B:SSL routines:SSL3_GET_RECORD:wrong version number:../deps/openssl/openssl/ssl/s3_pkt.c:362:\n\n    at exports._errnoException (util.js:859:11)\n    at WriteWrap.afterWrite (net.js:763:14)",
    "timestamp": "2016-01-29T03:35:04.549Z"
}

With some additional logging...

    "type": "log",
    "level": "debug",
    "target": "ssl3_get_record",
    "recv": "301",
    "s.v": "303",
    "s.ref": "1"

...it looks like Node.js is expecting (s.v, socket version) TLS 1.2 (0x303) but getting (recv, received) TLS 1.0 (0x301).
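For reference, those hex codes are TLS record-layer version numbers; a quick lookup table (illustrative only) for decoding them:

```javascript
// TLS record-layer version codes as they appear in OpenSSL debug output.
var tlsVersions = {
  0x300: 'SSL 3.0',
  0x301: 'TLS 1.0',
  0x302: 'TLS 1.1',
  0x303: 'TLS 1.2'
};

console.log(tlsVersions[0x301]); // what the server sent
console.log(tlsVersions[0x303]); // what this client expected
```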

Thanks @tristanls
You mentioned you had some instances that were reliably generating these errors. Does that imply you always see that error, or is it sporadic but reproducible given enough time?

@tristanls
Is it also possible for you to determine what cipher is being used when you're encountering these issues?
Edit: I see you posted some ciphers here: https://github.com/aws/aws-sdk-js/issues/862#issuecomment-174219618
Were these taken when an error occurred?

@chrisradek
I meant sporadic but reproducible given enough time.
The ciphers I posted were from the website here: https://www.ssllabs.com/ssltest/analyze.html?d=dynamodb.us-east-1.amazonaws.com. I'm uncertain what ciphers were attempted to be negotiated at the actual time of the error.

@chrisradek, while continuing to troubleshoot this issue, I just had a weird experience that I can't explain.

I set secureProtocol option to a method that is not supposed to be supported by DynamoDB TLSv1_1_method:

[root@920c620688d9 /]# vi test.sh 

#!/bin/bash
NODE_DEBUG=net,tls node -e '
    var c = require("tls").connect(
    {
        host:"dynamodb.us-east-1.amazonaws.com",
        port:443,
        rejectUnauthorized: true,
        secureProtocol: "TLSv1_1_method"
    });
    c.pipe(process.stdout);
    c.write("GET / HTTP/1.1\r\nHost: dynamodb.us-east-1.amazonaws.com\r\n\r\n");
'

What was surprising is that it "worked"?? twice?

[root@920c620688d9 /]# ./test.sh 
NET 584: pipe false undefined
NET 584: connect: find host dynamodb.us-east-1.amazonaws.com
NET 584: connect: dns options { family: undefined, hints: 40 }
NET 584: _read
NET 584: _read wait for connection
NET 584: afterConnect
TLS 584: start
NET 584: _read
NET 584: Socket._read readStart
TLS 584: secure established
NET 584: afterWrite 0
NET 584: afterWrite call cb
NET 584: onread 255
NET 584: got data
HTTP/1.1 200 OK
Server: Server
Date: Sat, 30 Jan 2016 18:29:48 GMT
Content-Length: 42
Connection: keep-alive
x-amzn-RequestId: 4IIH8II2DIOA2V3KS7163570ARVV4KQNSO5AEMVJF66Q9ASUAAJG
x-amz-crc32: 3128867991

healthy: dynamodb.us-east-1.amazonaws.com NET 584: _read
^C
[root@920c620688d9 /]# ./test.sh 
NET 594: pipe false undefined
NET 594: connect: find host dynamodb.us-east-1.amazonaws.com
NET 594: connect: dns options { family: undefined, hints: 40 }
NET 594: _read
NET 594: _read wait for connection
NET 594: afterConnect
TLS 594: start
NET 594: _read
NET 594: Socket._read readStart
TLS 594: secure established
NET 594: afterWrite 0
NET 594: afterWrite call cb
NET 594: onread 255
NET 594: got data
HTTP/1.1 200 OK
Server: Server
Date: Sat, 30 Jan 2016 18:29:50 GMT
Content-Length: 42
Connection: keep-alive
x-amzn-RequestId: C7G7PAHI3UR674G0B5LC3LHH6BVV4KQNSO5AEMVJF66Q9ASUAAJG
x-amz-crc32: 3128867991

healthy: dynamodb.us-east-1.amazonaws.com NET 594: _read
^C
[root@920c620688d9 /]# ./test.sh 
NET 604: pipe false undefined
NET 604: connect: find host dynamodb.us-east-1.amazonaws.com
NET 604: connect: dns options { family: undefined, hints: 40 }
NET 604: _read
NET 604: _read wait for connection
NET 604: afterConnect
TLS 604: start
NET 604: _read
NET 604: Socket._read readStart
{"type":"log","level":"debug","target":"ssl3_get_record","recv":"301","s.v":"302","s.ref":"1"}
{"type":"log","level":"debug","target":"ssl3_get_record","contents":"16, 03, 01, 0a, 94, "}
NET 604: afterWrite -71
NET 604: write failure { Error: write EPROTO 139766103729984:error:1408F10B:SSL routines:SSL3_GET_RECORD:wrong version number:../deps/openssl/openssl/ssl/s3_pkt.c:362:

    at exports._errnoException (util.js:859:11)
    at WriteWrap.afterWrite (net.js:763:14) code: 'EPROTO', errno: 'EPROTO', syscall: 'write' }
NET 604: destroy
NET 604: close
NET 604: close handle
events.js:155
      throw er; // Unhandled 'error' event
      ^

Error: write EPROTO 139766103729984:error:1408F10B:SSL routines:SSL3_GET_RECORD:wrong version number:../deps/openssl/openssl/ssl/s3_pkt.c:362:

    at exports._errnoException (util.js:859:11)
    at WriteWrap.afterWrite (net.js:763:14)

My hypothesis: Some DynamoDB servers do not enforce the published restriction on only using TLSv1 and not using TLSv1_1 and TLSv1_2. edit: Some DynamoDB servers communicate using TLSv1_1 and TLSv1_2 while others restrict themselves only to TLSv1

Question: Is there a subset of dynamodb.us-east-1.amazonaws.com servers that accept TLSv1_1 and/or TLSv1_2 connections in addition to accepting TLSv1 connections? edit: Is it true that some dynamodb.us-east-1.amazonaws.com servers only accept TLSv1 connections, while others accept TLSv1, TLSv1_1, and TLSv1_2 or some combination thereof?

I started seeing this error yesterday for dynamo in us-west-2 (no problems until yesterday - and we've been on node v4.2 for months). Downgraded to node 0.12.x.

I'm also seeing this error quite often with DynamoDB in us-west-2. Whilst they are mitigating this for node LTS (https://github.com/nodejs/node/issues/3692), it might be worth AWS investigating any consistency issues with cryptography across DynamoDB servers.

I too am hitting this error in us-west-2 with v2.2.33 of aws sdk and node v4.1.2.

Our first observation of the problem in us-west-2 was Thursday 1/28 at 5:37pm Pacific. The problem persisted for 3 hours and then went away, recurring almost exactly 24 hours later on 1/29 at 5:17.

Timing is perhaps interesting for event correlation.

Disabling SSL appears to fix the issue - not much of a solution - but it confirms the diagnosis in this thread.

A little more color: the issue was sporadic; at its worst we'd see 100% failure, but more often 1 in 6. We confirmed that SSL DynamoDB connections from the same box via the CLI every 500ms performed well during the event.

Confirm that it works after downgrading from Node 5.5.0 to 0.12.9

@chrisradek

Some more info for the bug hunt, 0.12.x is statically linked against OpenSSL v1.0.1 whereas v4.x and v5.x depend on OpenSSL v1.0.2.

With the reported times of this behavior appearing in us-west, have the DynamoDB guys identified common release components by comparing release schedules?

I noticed the error on jan 12 while running on us-east-1.

From all of the investigations, this looks to be an issue between nodejs TLS module and DynamoDB. I have updated the issue title to reflect this assumption.

@phsstory setting connection keepAlive to true is not a successful mitigation

Setting secureProtocol https option to TLSv1_method will ensure that no TLSv1_1 or TLSv1_2 connections are attempted:

var options = {
  secureProtocol: "TLSv1_method"
}

@tristanls is this a full mitigation aka workaround?

@phsstory I believe it is. The cause looks to be as follows:

There is a server pool, however, only some of these servers support TLS1.2 and they also generate sessions for clients. What happens next is that node.js connects to a TLS1 server and attempts to reuse the session provided by a TLS1.2 server. Thus, for some reason OpenSSL expects the server to be at least TLS1.2, and is surprised to find out that it is not.
-- from https://github.com/nodejs/node/issues/3692#issuecomment-177300756

Only using TLS1 will never cache a session that is TLS1.2.

@brandonros also suggests ciphers: "ALL" in https://github.com/nodejs/node/issues/3692#issuecomment-177294788

This commit just landed in Node.js master https://github.com/nodejs/node/commit/165b33fce2ed26e969beed3d3f7796708f0743e1. It will mitigate the issue at a cost of a bit extra work in case an EPROTO (or any other error occurs) because of retries, but should allow us to use default settings and expect SDK to succeed by default since it uses retries.

@tristanls and @phsstory and everyone else that posted,
Thanks for your hard work troubleshooting this issue. Great to see some pull requests were made to node.js and openssl to resolve this issue. I was able to confirm with DynamoDB that your hypothesis stated above is accurate, so these changes should resolve the errors users have been seeing.

At this point I think it is safe to close this issue, but everyone should feel free to continue commenting if they have more to add.

Since the nodejs retries fix is not yet released and our untested workaround requires creating the SDK object with non-default values, I feel this ticket should not be closed until such time as the SDK can function with default values on latest nodejs release.

+1 to keeping this issue open until the SDK works on stable nodejs.

Also, a massive thank you to the crazy amount of work put in by folks in finding and fixing this problem. It's been causing me very frustrating intermittent production issues since Christmas and I had no idea why.

I updated my DynamoDB config to:

const dynamodb = new AWS.DynamoDB(
{
  httpOptions: {
    agent: new https.Agent(
    {
      ciphers: 'ALL',
      secureProtocol: 'TLSv1_method'
    })
  }
});

However, I'm still seeing the same error(s):

{
   "stack": "Error: write EPROTO\n    at Object.exports._errnoException (util.js:855:11)\n    at exports._exceptionWithHostPort (util.js:878:20)\n    at WriteWrap.afterWrite (net.js:763:14)",
   "message": "write EPROTO",
   "code": "NetworkingError",
   "errno": "EPROTO",
   "syscall": "write",
   "region": "us-west-2",
   "hostname": "dynamodb.us-west-2.amazonaws.com",
   "retryable": true,
   "name": "NetworkingError",
   "time": "2016-02-04T17:10:25.509Z"
}

I'm running on Node.js 5.3.0. Do I also need to wait until a new, stable version of Node.js is released and use that?

I'll reopen the ticket for now until the fixes are confirmed to work. Given the reach of this issue it's probably a good idea to keep this visible as well.

Looks like the same thing that I'm doing (that is working):

var dynamodb = new AWS.DynamoDB({
    httpOptions: { agent: new https.Agent({ secureProtocol: "TLSv1_method", ciphers: "ALL" }) }
});

@klinquist Which version of node are you having success with in regards to the https.Agent options?

@magic53 4.2.6

@westy92 :(

@indutny provided a reduced test case for the issue here, which will throw EPROTO when run: https://gist.github.com/indutny/a021cca1711aa92e96a9

This test case, which adds secureProtocol: "TLSv1_method" no longer results in EPROTO error: https://gist.github.com/tristanls/8721d6a25af0b37bac07

I'm continuing to observe my deployment.

@tristanls I restarted my Node server and I haven't been able to reproduce it again. I know I was using the right version of my application because it is inside a specific Docker container. I'll continue to monitor as well and report if it comes up again.

I just experienced this EPROTO error while using this solution stack:
64bit Amazon Linux 2015.09 v2.0.6 running Node.js
Node version: 4.2.3

{ [NetworkingError: write EPROTO]
  message: 'write EPROTO',
  code: 'NetworkingError',
  errno: 'EPROTO',
  syscall: 'write',
  address: undefined,
  region: 'us-west-2',
  hostname: 'dynamodb.us-west-2.amazonaws.com',
  retryable: true,
  time: Thu Feb 04 2016 21:26:06 GMT+0000 (UTC) }

Though obviously not best practice, in a safe environment I believe you can mitigate the issue completely by turning off SSL for communication with the AWS API.

require('aws-sdk').config.update({sslEnabled: false});

@petemill sorry, but this is pretty terrible advice.

@indutny I said it's not best practice, but if you know what you're doing, and you need to keep your service running, and you're hosted _inside_ AWS anyway, I don't think it's so terrible if that's what you have decided to do.

@tielur just to confirm, you had secureProtocol: "TLSv1_method" set on the client?

I have deployed fix like secureProtocol: "TLSv1_method", waiting for results.

@tristanls no I did not. I'm currently only using new AWS.DynamoDB();. I'm trying to find out what the actual problem is, it seems to be intermittent? I'll look into adding the three config settings(keepAlive, secureProtocol, ciphers)

maxCachedSessions: 0 could probably help too.

@tielur The problem IS intermittent - some Dynamo servers are TLS1.0, others are 1.2. The problem is when node/openssl opens a session to 1.0 and then gets routed to a server using 1.2 for a future request.
My code block above is working great and in production

Adding

httpOptions: {
             agent: new https.Agent({
                 secureProtocol: "TLSv1_method",
                 ciphers: "ALL"
             })
         } 

seems to fix this issue on our end

For those that reported it being fixed, how are you testing? Just letting it run for a while, killing/swapping out EC2 instances?

@tielur The problem, while intermittent, showed up all the time across our 30+ node services.

We are using the same internal SDK with dynamo connecting code on all of them, so I updated the SDK and did tail -f *.log | grep EPROTO for 12+ hours, no lines returned :).

@klinquist :+1: Good to hear! I'm mid hot-patch push now.

After allowing my Node server to run overnight, I did see the error pop up quite a few times across multiple instances. I verified that I am using the latest aws-sdk as well, 2.2.33. I had the "fix" in place that I mentioned above.

Some more information on my environment: I'm using Docker with a base image of node:5.3.0.

Also getting the same error "NetworkingError: write EPROTO" using NodeJS 4.2.3 with AWS Elastic Beanstalk (Amazon Linux 64bit Amazon Linux 2015.09). Error occurs periodically when making calls to DynamoDB with no obvious pattern.

Edit: Same NodeJS code was previously used as an AWS Lambda function and the issue never appeared. Code was moved to Elastic Beanstalk, due to limitations with Lambda, and error started appearing after the move.

@jmt0806 did you have secureProtocol: "TLSv1_method" set in your DynamoDB options?

@tristanls No, going to test and see if anything changes.

How embarrassing. I'm using both AWS.DynamoDB and AWS.DynamoDB.DocumentClient. However, I only implemented the fix for AWS.DynamoDB. Here is my final, working configuration:

const dynamodb = new AWS.DynamoDB({
  httpOptions: {
    agent: new https.Agent({
      ciphers: 'ALL',
      secureProtocol: 'TLSv1_method'
    })
  }
});

const dynamodbDoc = new AWS.DynamoDB.DocumentClient({
  service: dynamodb // <- JUST ADDED TO FIX MY CONFIGURATION
});

As before, I will continue to monitor my system logs. I have a good feeling about this, though.

@westy92 not embarrassing at all :) that's great news, thank you for the update. This way we don't have an unaccounted for failure case.

Workaround by @westy92 works for us too on ElasticBeanstalk for node v4.2.3

Just started getting this error today. It started when deploying a new app version to ElasticBeanstalk running node v4.2.3 that depended on [email protected]. In this configuration the error was basically 100% consistent. I rolled back to a previous app version that used [email protected] and the error rate went way down, but still happened intermittently. I also tried the https.Agent config recommended by @klinquist and @westy92 but that did not help. Finally changed NodeVersion to 0.12.9 and the error went away.

@tristanls Error message has not reappeared since implementing the suggested fix.

@westy92 Solution works for us as well on ElasticBeanstalk using node v.5.5. Will keep monitoring and report back if it reappears.

Stable Node.JS has been updated to v5.6.0. Things may work again without this workaround.

@westy92 is it "should work" or "will work"? @indutny said in PR 4982 that it's just a mitigation, not a complete fix.

@Unterdrucker it is a mitigation that will cost a retry because if EPROTO occurs, session will be dropped. Previously, retry would fail because session was kept and was reused, this time, retry is expected to succeed.

LTS has been updated to v4.4.0 including a potential fix. Would be great to hear if people are seeing improvements

Hi, I'm not using Dynamo but I'm facing the same error by trying to connect to a secure ftp server.
Unfortunately, I've tried all the SSL_METHOD without success.

Perhaps not the same problem, but I have tested with v4.4.0 as well with the same result; don't know if it answers your question @TheAlphaNerd

An additional patch that I am working on is https://github.com/openssl/openssl/pull/852

Quite a long time for this issue to continue. Thank you guys for providing the information to resolve this. I have had various issues with SSL / Amazon specifically so when this came up you can imagine my worry!

I am also hit by this issue, using node 4.2.6 (MacPorts) and the us-west-2 region. I am doing scan operations to back up my database, and I have seen the error several times already, in almost a third of my executions.

The fix is already in OpenSSL master, working on backporting it to 1.0.2 and then to the node.js master.

Fantastic! Thank you for the quick reply!

@indutny which version of Node do you think this will be in?

I would say node >= v4 will get this update. However, the relevant patch should land in OpenSSL 1.0.2 branch first.

@jedi4ever Since this issue is practically caused by heterogeneous server pools, a simple catch-retry strategy around the failing call has a good chance of working around it in the meantime (for me, it does).
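A catch-retry strategy of that sort might look like the sketch below; retryOnEproto is an illustrative helper, not part of the aws-sdk API:

```javascript
// Hedged sketch of a catch-retry wrapper around a failing SDK call.
// retryOnEproto is an illustrative name, not part of the aws-sdk API.
function retryOnEproto(fn, retries, cb) {
  fn(function (err, data) {
    if (err && err.errno === 'EPROTO' && retries > 0) {
      // The broken TLS session is dropped after the failure, so an
      // immediate retry has a good chance of hitting a fresh handshake.
      return retryOnEproto(fn, retries - 1, cb);
    }
    cb(err, data);
  });
}
```

With the SDK it might be used as `retryOnEproto(function (cb) { dynamodb.scan(params, cb); }, 3, done)`.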

we fixed it with the secureProtocol: 'TLSv1_method' setting.
Although we wonder if SQS and SNS are not equally affected.

fyi: same problem with new node version on aws lambda :-(

Could there be any reason to retry requests on errors?

The option below is not working for me; when using it I can't even initialize the db instance. The dynamodb variable is null.

dynamodb = new AWS.DynamoDB({
  httpOptions: {
    agent: new https.Agent({
      keepAlive: true,
      ciphers: 'ALL',
      secureProtocol: 'TLSv1_method'
    })
  }
});

Now I'm trying to disable SSL for the db connection (not tested yet, will test later):
AWS.config.update({sslEnabled: false});

I am on the Nodejs 4.3 (aws lambda version). The following does not solve the timeout issue

var dynamoDb = new AWS.DynamoDB({
    apiVersion: '2012-08-10',
    httpOptions: { 
        agent: new https.Agent({
            rejectUnauthorized: true,
            keepAlive: true, 
            ciphers: 'ALL', 
            secureProtocol: 'TLSv1_method' 
        }) 
    } 
})

var dynamo = new AWS.DynamoDB.DocumentClient({service: dynamoDb})

If I try to add AWS.config.update({sslEnabled: false}) then everything times out, every time.

We are having success with:

const dynamodb = new AWS.DynamoDB({
  httpOptions: {
    agent: new https.Agent({
      ciphers: 'ALL',
      secureProtocol: 'TLSv1_method'
    })
  }
});

However, we are not using DocumentClient

@magic53 That worked, apparently those extra parameters I was using were getting in the way.

Does this work?

aws.config.update({
    httpOptions: {
        agent: new https.Agent({
            ciphers: 'ALL',
            secureProtocol: 'TLSv1_method'
        })
    }
});

@Thaina I am not using aws.config.update, I am simply initializing my AWS.DynamoDB instance with those options, exactly how @magic53 is doing it. This has completely resolved the issue.

I should note that I am also not using DocumentClient.

@tyrsius Timeouts would not necessarily be caused by this, unless your timeouts are very short.

Do you happen to have any open database connections in your Lambda function (RDS, ElastiCache, etc)? If so, you should make sure to use context.callbackWaitsForEmptyEventLoop = false; in your Lambda function entrypoint. Open db connections stay on the event loop, so your functions will time out.

@thedevkit I have tried setting context.callbackWaitsForEmptyEventLoop = false, it had no effect.

The timeouts seem to have been caused by this, as the workaround above solved the problem:

awsOptions.httpOptions = {
  agent: new https.Agent({
    rejectUnauthorized: true,
    keepAlive: true,
    secureProtocol: "TLSv1_method"
  })
};

Solved my problem with nodejs version 4.3.0

@indutny
Thanks for all of your work so far, working with both node.js and openSSL. Has there been any progress on porting the patch you made in openSSL from master to the 1.0.2 branch? Is there anything we can do to help that effort?

@chrisradek I'm afraid that the review process is going a bit slower than I thought it would be. You may try adding 👍 reaction on this PR: https://github.com/openssl/openssl/pull/918 , or something else to get their attention. Maybe it will help?

Sorry for delay!

What @enGMzizo suggested solved my problem on 4.3.2 (Lambda). I also contacted AWS support to verify:

Until an upgraded version of nodeJS is offered within Lambda the recommended method to eliminate this error is to configure the following "httpOptions" setting: 

secureProtocol: "TLSv1_method" 

Here is an example: 
---------------------- 

var dynamoDB = new AWS.DynamoDB({
  httpOptions: {
    agent: new https.Agent({
      rejectUnauthorized: true,
      keepAlive: true,
      ciphers: 'ALL',
      secureProtocol: 'TLSv1_method'
    })
  }
});

---------------------- 

As noted on the github thread, this is a known issue with some versions of nodeJS. More specifically this issue is caused by the client authentication handshakes resuming a TLS session using an id they got from a TLS 1.2 endpoint. 

Newer versions of nodeJS have the tls_wrap fix commit that has made it upstream. 

https://github.com/nodejs/node/pull/4885 

Node 6.0.0-pre has been confirmed working without the workaround above.

Anyone know if other services than DynamoDB are affected? A few replies suggest using AWS.config.update rather than only configuring the DynamoDB client, which would make sense if every service sits behind the same type of encryption layer. I'm guessing that's not the case though.

@phsstory please mention in the header that keepAlive: true should not be used with AWS Lambda, because Lambda can put the node process to sleep when idle, and the keepAlive connection gets disconnected. After the process wakes up it throws an exception that's impossible to handle and gets killed:

Error: read ECONNRESET 
at exports._errnoException (util.js:870:11) 
at TLSWrap.onread (net.js:544:26)

@rma4ok Then why did @davidvanleeuwen's support contact at AWS suggest using it? I kind of assumed that they had basically copied some code from the AWS SDK source and added the secureProtocol option.

@timdp keepAlive: true is an awesome option if you run node on a server. My statement only relates to AWS Lambda. In the Lambda case, the container with node gets put to sleep between requests.

I think @phsstory should keep it as part of the fix with a comment:

new AWS.DynamoDB({
  httpOptions: {
    agent: new https.Agent({
      rejectUnauthorized: true,
      keepAlive: true, // shouldn't be used in AWS Lambda functions
      secureProtocol: "TLSv1_method", // workaround part ii.
      ciphers: "ALL"                  // workaround part ii.
    })
  }
});

@timdp I see what you mean. This exception was not happening on every request, only sometimes. My lambda gets about 20k hits a day, so if there is not much traffic it's a time bomb.

@davidvanleeuwen check your lambda logs for

Error: read ECONNRESET 
at exports._errnoException (util.js:870:11) 
at TLSWrap.onread (net.js:544:26)

If there are any, it's caused by keepAlive: true

@phsstory the workaround is now pretty well tested? 😉

// untested workaround

@tristanls

I personally have not completed full testing as our app is network isolated and able to use the older node version until this is fixed upstream without a code workaround.

If someone wants to take on the mantle of test blessing, I'll update the summary to reflect the endorsement.

@phsstory @tristanls I can confirm this work around has resolved problems in multiple production systems for us. When using with Lambda I agree that keepAlive: true should not be used.

Here is a graph of errors/hour across all our lambda functions. You can see where the fix happened :)
image

as requested, summary updated.

@indutny Looks like the OpenSSL issue has been patched? Does this mean 4.3 et al are now working properly? Can we re-upgrade past 0.10 again now?

@tylermakin we are waiting for the backport commit to be landed so we can float the patch or get it in an update to openssl

--> https://github.com/openssl/openssl/pull/918

@tylermakin that was a Pull Request for OpenSSL's master, which is still unstable and not used by node.js yet.

Half a year and this junk still can't be fixed officially.

Meanwhile they already plan to drop support for node 0.10 in October and suggest we use this garbage. We can't migrate our production code and the end is near. All this while we have to pay to use this service.

What the hell amazon?

@Thaina
I talked with the DynamoDB team, and they have made a change on their end that should resolve this issue if the workarounds are removed. Are you currently still seeing this issue come up? If so, can you share what region you are encountering the error in?

@chrisradek Testing it last month in the tokyo region, it still did not work. And most importantly, there has been no official announcement that it was fixed at all. Do you think we could trust our production code with this?

@chrisradek I would also be very interested in a formal announcement regarding any fixes on behalf of the dynamodb team regarding this issue.

We finally got a lgtm on the openssl backport. Hopefully we can get this in 4.x very soon!

@TheAlphaNerd did we, though?

I apologize if I misunderstood. I thought https://github.com/openssl/openssl/pull/918#issuecomment-228344451 was a sign off.

Sorry for the confusion.

Sorry for the lack of communication here. From our side, we were still waiting for the openSSL bug to be fixed before officially reporting the issue as resolved.

If you are still encountering issues without using the workarounds, please let us know. We would want to know which regions the issues are occurring in specifically.

@TheAlphaNerd that was just an encouraging comment :) Hopefully it will help, though!

@chrisradek The problem now is that you state you need to wait for a resolution, which means it's not really fixed officially. Communicated or not, it cannot be trusted in production.

Meanwhile you still plan to drop support for the old version, which is more trusted right now, and that happens in October. Six months spent fixing this, with the old version dropped after ten; for over half that time we couldn't migrate to the new version.

Not to mention it wastes our time to stay on an old version of the platform that is no longer developed.

Another problem: I don't know what you are doing, but the old version still works. How do you push something that cannot work into production and then not roll that component back?

Half a year of workarounds from customers, and we have to pay to use this product.

So please stop this and just let lambda use docker instead of a fixed language and version.

Thaina this isn't a new problem. We await an official resolution.


@thedevkit A new problem is here

http://docs.aws.amazon.com/lambda/latest/dg/programming-model.html

Important
The v0.10.42 runtime will be unavailable to create new functions beginning on October 2016, given the end-of-life announcement for this version. We recommend that you use v4.3

And 4.3 is a minefield that is still not fixed, but they will force us to use it.

https://github.com/nodejs/node/issues/3692#issuecomment-177300756

There is a server pool; however, only some of these servers support TLS1.2, and they also generate sessions for clients. What happens next is that node.js connects to a TLS1 server and attempts to reuse the session provided by a TLS1.2 server. Thus, for some reason OpenSSL expects the server to be at least TLS1.2, and is surprised to find out that it is not.

DynamoDB confirmed for me that they have updated to support TLS1.2 everywhere. We're still going to keep this issue open because we want to see the openSSL fix merged as well. That said, the root cause of this issue has been mitigated and users should no longer be affected by this bug.

If you do encounter this error after removing your workarounds, please post here. Knowing the version of node.js, the sdk, and the region the issue was encountered in would be helpful if you see this error again.

I experience the same problem with DynamoDBLocal. None of the workarounds above help.
I wonder if anyone has a solution for a locally running Dynamo?
Couldn't google it out.

Thanks people.

@koresar how do you connect to local?

const aws = require('aws-sdk');
const https = require('https');

const dynamo = new aws.DynamoDB({
    region: 'foo-west-1',
    apiVersion: '2012-08-10',
    accessKeyId: 'bar',
    secretAccessKey: 'baz',
    endpoint: new aws.Endpoint('localhost:8000'), // tried all kinds of uris
    httpOptions: {
        agent: new https.Agent({ // tried all combinations
            rejectUnauthorized: true,
            keepAlive: true,
            secureProtocol: 'TLSv1_method', // tried all the openssl supported methods
            ciphers: 'ALL'
        })
    }
});

dynamo.describeTable({TableName: 'my-table'}, callback); // get error after a timeout

Error:

  [NetworkingError: write EPROTO 140735265157120:error:1408F10B:SSL routines:SSL3_GET_RECORD:wrong version number:../deps/openssl/openssl/ssl/s3_pkt.c:362:
   ]
     message: 'write EPROTO 140735265157120:error:1408F10B:SSL routines:SSL3_GET_RECORD:wrong version number:../deps/openssl/openssl/ssl/s3_pkt.c:362:\n',
     code: 'NetworkingError',
     errno: 'EPROTO',
     syscall: 'write',
     region: 'foo-west-1',
     hostname: 'localhost',
     retryable: true,
     time: Fri Jul 15 2016 10:00:08 GMT+1000 (AEST)

However, there is a workaround - use non-encrypted connection.
I.e. use http.Agent instead of https.Agent.

Problem solved (until you need TLS).

@koresar DynamoDBLocal listens for HTTP on port 8000.
All this https.Agent fixing was needed for the actual AWS DynamoDB service, not for local, and it seems the problem is already fixed on their side.
For local you should use just this:

const dynamo = new aws.DynamoDB({
    region: 'foo-west-1',
    apiVersion: '2012-08-10',
    accessKeyId: 'bar',
    secretAccessKey: 'baz',
    endpoint: 'http://localhost:8000'
});

Has anyone noticed that while the issue is "fixed" with dynamo, it is still there for all the other amazon services? Namely if I want to invoke Lambdas from within Lambda...?

@bradennapier
Have you experienced this issue with other services? We haven't received any reports about getting this error when invoking Lambda functions from within Lambda.

The issue was specific to DynamoDB, since their servers were configured in such a way that they were affected by a bug in openssl.

If you are seeing this issue with another service, please let us know.

Yes, I am currently experiencing the problem on one function. I am only assuming it is due to Lambda because all my other functions appear to be fine. I am not sure if this is the exact same issue, but it always shows up in the same way: functions start with a second or so left and I get charged the entire timeout period for every call until I re-upload the function.

2016-07-22T01:53:46.098Z    1aedc582-4faf-11e6-8110-8350aa252064    Starting System Info 
2016-07-22T01:53:46.098Z    1aedc582-4faf-11e6-8110-8350aa252064    Remaining Time: 1016 
2016-07-22T01:53:46.150Z    1aedc582-4faf-11e6-8110-8350aa252064    Correct Keys

My functions are getting called with 1 second to spare... 10 second function window... you charge me for it all...

2016-07-22T01:52:51.150Z    f9c946bb-4fae-11e6-a4b6-ff7eb659c9ce    Starting System Info 
2016-07-22T01:52:51.150Z    f9c946bb-4fae-11e6-a4b6-ff7eb659c9ce    Remaining Time: 354 
2016-07-22T01:52:51.211Z    f9c946bb-4fae-11e6-a4b6-ff7eb659c9ce    Correct Keys 
2016-07-22T01:52:51.293Z    f9c946bb-4fae-11e6-a4b6-ff7eb659c9ce    Token is Valid Current: 1469152371251 Expires 1469757102163

2016-07-22T01:52:59.988Z    ffdf5964-4fae-11e6-9319-61a75113d4f2    Starting System Info 
2016-07-22T01:53:00.008Z    ffdf5964-4fae-11e6-9319-61a75113d4f2    Remaining Time: 1710 
2016-07-22T01:53:00.029Z    ffdf5964-4fae-11e6-9319-61a75113d4f2    Correct Keys 
2016-07-22T01:53:00.070Z    ffdf5964-4fae-11e6-9319-61a75113d4f2    Token is Valid Current: 1469152380068 Expires 1469757097928 
2016-07-22T01:53:00.071Z    ffdf5964-4fae-11e6-9319-61a75113d4f2    The Calls Required Parameters are: 

START RequestId: 07ef8f10-4faf-11e6-bc0e-7f5932792b4b Version: $LATEST 
2016-07-22T01:53:13.949Z    07ef8f10-4faf-11e6-bc0e-7f5932792b4b    Starting System Info 
2016-07-22T01:53:13.991Z    07ef8f10-4faf-11e6-bc0e-7f5932792b4b    Remaining Time: 1253 
2016-07-22T01:53:13.995Z    07ef8f10-4faf-11e6-bc0e-7f5932792b4b    Correct Keys 
2016-07-22T01:53:14.031Z    07ef8f10-4faf-11e6-bc0e-7f5932792b4b    Token is Valid Current: 1469152394030 Expires 1469757105179 
2016-07-22T01:53:14.088Z    07ef8f10-4faf-11e6-bc0e-7f5932792b4b    The Calls Required Parameters are: 
2016-07-22T01:53:14.990Z    07ef8f10-4faf-11e6-bc0e-7f5932792b4b    Received Table Data: Dash_Systems

For example, the above is a function which has 15 second timeout - the first call I make is to log remaining time.

That sounds like a timeout issue, which is not what we experience with this bug.

Have you tried turning off keepalive?
https://github.com/aws/aws-sdk-js/issues/862#issuecomment

@phsstory are you saying to do that for dynamo? My other functions seem fine, so I believe this one is due to lambda, but I can't be sure. It's a simple function, so it's fairly annoying.

const Doc = documentPromised({
    region: 'us-west-2',
    httpOptions: { agent: new https.Agent({  ciphers: 'ALL', secureProtocol: 'TLSv1_method' }) }
})

So I should turn that into

const Doc = documentPromised({
    region: 'us-west-2',
    httpOptions: { agent: new https.Agent({  keepAlive: false, ciphers: 'ALL', secureProtocol: 'TLSv1_method' }) }
})

in my lambda function?

(FYI Yes I tried without the promised with the same result)

@bradennapier
You can actually get rid of the custom https.Agent now.

Am I understanding correctly that your Lambda functions are failing to complete within a given amount of time? Are you getting an error? If you extend your Lambda function to allow it to run longer, do your functions complete?

Hello

Just tried this code with and without the httpOptions

var aws = require('aws-sdk');
var https = require('https');

exports.handler = function(event, context) {
  var dynamo = new aws.DynamoDB({
    region: 'eu-west-1',
    httpOptions: {
      agent: new https.Agent({
        rejectUnauthorized: true,
        keepAlive: false,
        ciphers: 'ALL',
        secureProtocol: 'TLSv1_method'
      })
    }
  });

  dynamo.listTables(function(err, data) {
    console.log('inside listTables');
    if (err)
      console.log(JSON.stringify(err, null, 2));
    else
      console.log(data.TableNames);
  });
};

And I get a timeout. No problem calling the AWS database locally.

Without the httpOptions specified, can you share the exact error you're getting? If it's not the one that's specified at the top of this thread, can you open a new issue?

@benoittgt you need to add maxRetries: 8 to your options. Otherwise the DynamoDB client keeps retrying, your lambda exceeds the 10 sec timeout, and you don't get the real error message.

 var dynamo = new aws.DynamoDB({
    region: 'eu-west-1',
    maxRetries: 8
  });

PS make sure that you have a 10 sec timeout for your lambda, because it's 6 sec by default

@chrisradek could you guys put this in the docs everywhere? People never get the real dynamo error messages if the error is retry-able, and the DynamoDB client has hardcoded retry logic that is not configurable like other AWS-SDK service clients.
I was able to determine the number of retries for the most common cases:
maxRetries: 7 for lambda with a 6 sec timeout (the default)
maxRetries: 8 for lambda with a 10 sec timeout (10 seconds is much better than 6)
maxRetries: 12 for lambda with a 300 sec timeout (for lambda not behind APIG, like a DynamoDB stream worker)
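A rough way to sanity-check retry budgets like these: the SDK waits with exponential backoff between attempts (per the aws-sdk-js docs the base delay defaults to 50 ms for DynamoDB and 100 ms for other services; older releases may differ and may add jitter, so treat the base as an assumption), and the worst-case time spent purely waiting over n retries is base * (2^n - 1):

```javascript
// Sketch: worst-case cumulative backoff for n retries with delay base * 2^i.
// The 50 ms base is an assumed DynamoDB default; verify for your SDK version.
function worstCaseBackoffMs(base, retries) {
  var total = 0;
  for (var i = 0; i < retries; i++) {
    total += base * Math.pow(2, i);
  }
  return total;
}

console.log(worstCaseBackoffMs(50, 8)); // 12750
```

Note that the backoff alone for 8 retries can exceed 10 s, so request time itself also matters when picking maxRetries against a Lambda timeout.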

cheers and happy coding

Thanks to both of you for the fast answers.

With :

var aws = require('aws-sdk');
var https = require('https');

exports.handler = function(event, context) {
  var dynamo = new aws.DynamoDB({
    region: 'eu-west-1',
    maxRetries: 8
  });

  dynamo.listTables(function(err, data) {
    console.log('inside listTables');
    if (err)
      console.log(JSON.stringify(err, null, 2));
    else
      console.log(data.TableNames);
  });
};

I get

START RequestId: 6ebafb49-59b1-11e6-b05b-f79f63e71369 Version: $LATEST
END RequestId: 6ebafb49-59b1-11e6-b05b-f79f63e71369
REPORT RequestId: 6ebafb49-59b1-11e6-b05b-f79f63e71369  Duration: 10000.24 ms   Billed Duration: 10000 ms   Memory Size: 128 MB Max Memory Used: 24 MB  
2016-08-03T19:35:38.781Z 6ebafb49-59b1-11e6-b05b-f79f63e71369 Task timed out after 10.00 seconds

@benoittgt
Are you using node.js 0.10.x in Lambda?
Your function may also be timing out due to not calling context.succeed and context.fail. Can you replace (or add) those where you've made your console.log calls in your callback function?

I did the test with context on both Node version available with the same errors posted on https://github.com/aws/aws-sdk-js/issues/862#issuecomment-237348407.

var aws = require('aws-sdk');

exports.handler = function(event, context) {
  var dynamo = new aws.DynamoDB({
    region: 'eu-west-1',
    maxRetries: 8
  });

  dynamo.listTables(function(err, data) {
    if (err) {
      context.fail(err.stack)
    } else {
      context.succeed('Function Finished! Data :' + data.TableNames);
    }
  });
};

@benoittgt
I created #1086 so we can discuss your issue there. It's not the same error as reported in this thread so I don't want to continue it here.

The NodeJS commit referenced in https://github.com/aws/aws-sdk-js/issues/862#issuecomment-178333617 has been present in all releases since 6.0.0, and DynamoDB has updated their servers to use TLS 1.2 everywhere, so I don't believe customers are continuing to see this issue.

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs and link to relevant comments in this thread.
