Mongoose: findById/findOne response after 10 seconds delay randomly

Created on 26 Jun 2020  路  20Comments  路  Source: Automattic/mongoose

Node: 12.x (AWS Lambda)
mongodb: 3.5.9
mongoose: 5.9.20
Database: MongoDB v4.2.8 (MongoDB Atlas)

I have a warmer running to keep the Lambda warm and hence reusing the DB connection.

I have the following setup to connect DB once during Lambda cold start:

let isConnected;

const connectToDatabase = () => {
  if (isConnected) {
    console.log('connectToDatabase> using EXISTING database connection...');
    return Promise.resolve();
  }

  mongoose.set('useCreateIndex', true);
  mongoose.set('useFindAndModify', false);
  mongoose.set('useUnifiedTopology', true);

  const options = {
    useNewUrlParser: true,
    bufferCommands: false, // Disable mongoose buffering
    bufferMaxEntries: 0 // and MongoDB driver buffering
  };

  return mongoose.connect(DATABASEURL, options).then((db) => {
    isConnected = db.connections[0].readyState;
    console.log(`connectToDatabase> MongoDB - isConnected: ${isConnected}`);
  });
};

for some reason randomly, if the Lambda is not accessed by clients (and hence hitting the DB) after a period of time (say half an hour to few hours) but kept alive by the warmer. Calling findById/findOne on a Mongoose object will only response after a 10 seconds delay.

I've just started testing with additional options:
options.keepAlive = true;
options.keepAliveInitialDelay = 300000;

Any suggestions on what else to try/change if the keepAlive setting doesn't help?

Most helpful comment

@timeswind not yet, it should be fixed with the next mongodb driver release. We've been running @newbaluku 's repro for weeks and haven't seen the issue with a patched version, so we're fairly certain that once this patch is released this issue will be fixed. Sorry for the delay!

All 20 comments

tried turning on debug and got the following output from one of the query that took 10 seconds

2020-06-26T12:41:42.680Z 991a9c44-efa5-49c3-93b7-b3c6c5c8cdac INFO [0;36mMongoose: [0m users.findOne({ [32m'tokens.token' [39m: [32m'eyJhbGciOiJIUzI1NiI.....XXXXXXXXX......' [39m, [32m'tokens.access' [39m: [32m'auth' [39m, uid: ObjectId("5ee0826aaa95df000895fadc")}, { projection: { password: [33m0 [39m, [32m'headMaps.blob' [39m [33m0 [39m, tokens: [33m0 [39m, qrTokens: [33m0 [39m }})

and this is the debug log for a normal response time (typically around 10 ms):

2020-06-26T12:54:18.193Z 945ff20e-b389-4c09-96cc-49152f51c095 INFO [0;36mMongoose: [0m users.findOne({ [32m'tokens.token' [39m [32m'eyJhbGciOiJIUzI1NiI......XXXXXXXX......' [39m, [32m'tokens.access' [39m: [32m'auth' [39m, uid: ObjectId("5ee0826aaa95df000895fadc")}, { projection: { password: [33m0 [39m, [32m'headMaps.blob' [39m [33m0 [39m, tokens: [33m0 [39m, qrTokens: [33m0 [39m }})

are there any other ways to help with debugging?

Experiencing a similar issue.

Node: 10.x (AWS Lambda)
mongoose: 5.9.19
Database: MongoDB v4.2.8 (MongoDB Atlas)

Unable to find the root cause of the issue

Hard to say exactly without being able to repro this consistently on my end. But I have a few ideas:

1) There's a race condition in your code: if two Lambdas call connectToDatabase() at the same time then they'll both call mongoose.connect(). I'd recommend something like this instead:

let conn = null;

const connectToDatabase = () => {
  if (conn != null) {
    console.log('connectToDatabase> using EXISTING database connection...');
    return Promise.resolve();
  }

  mongoose.set('useCreateIndex', true);
  mongoose.set('useFindAndModify', false);
  mongoose.set('useUnifiedTopology', true);

  const options = {
    useNewUrlParser: true,
    bufferCommands: false, // Disable mongoose buffering
    bufferMaxEntries: 0 // and MongoDB driver buffering
  };

  conn = mongoose.connection;
  return mongoose.connect(DATABASEURL, options).then((db) => {
    console.log(`connectToDatabase> MongoDB - isConnected: ${isConnected}`);
  });
};

2) Try reducing heartbeatFrequencyMS to 2000 and see if that changes the behavior: mongoose.connect(uri, { heartbeatFrequencyMS: 2000 }). Off the top of my head, the only default timeout I can think of that's exactly 10 sec is heartbeatFrequencyMS.

3) How approximate is the 10 seconds? Is it always 10 seconds plus a little bit, or is it sometimes 8 seconds and sometimes 12?

Sure, I will amend my code to handle the race condition.

I will also try and monitor the response by adding the option of heartbeatFrequencyMS: 2000. Thanks.

The delay is always 10 secs plus a little bit, never fall below 10 secs. and not more then 11 secs from my testing so far.

after setting heartbeatFrequencyMS: 2000. Preliminary testing shows that occasional query lag still happened after idling for a while. Just that instead of delay of 10 secs plus a bit, it now delay for 2 secs plus a bit. Sometimes, a bit less than 2 secs.

I wonder what is the implication of setting this.

heartbeatFrequencyMS controls how often the MongoDB driver sends a heartbeat to the server to check its status. Thanks for testing this, your feedback is very helpful, I can now look into the MongoDB driver and see why this might be happening.

not sure if you still need sample codes to repro this on your side. Let me know, if there is a need I will try to do up a sample project based on serverless.

@newbaluku a sample project that can consistently repro this issue would be extremely helpful. Otherwise we're just digging in the source code :)

@newbaluku my best guess as to the cause of this issue is that, when you define a model using mongoose.model(), Mongoose takes bufferCommands as it was at the time you called mongoose.model(). So in Mongoose v5.9.24, the below code wouldn't disable buffering on Model because bufferCommands isn't set when mongoose.model() is called.

const Model = mongoose.model('Test', Schema({ name: String }));

mongoose.connect(uri, { bufferCommands: false });

We'll add a fix for this to 5.9.25 :+1:

Is this fixed in 5.9.25?

because I just tried this version and I can still repro the 10 secs delay. I didn't specify heartbeatFrequencyMS in options.

@newbaluku we haven't been able to repro this on our side, the bufferCommands fix was our best guess.

In order to help us debug this, we added a 'buffer' event to Mongoose connections that gets emitted when a connection buffers a command. Can you please listen to that event using the below code:

mongoose.connection.on('buffer', data => {
  console.log('Buffered', mongoose.connection.readyState, data);
});

And let us know if that event is being fired when you get a 10 second delay? That would help us confirm that this is an issue with Mongoose buffering.

@vkarpov15 Thanks for following up.

I have added listener to the 'buffer' event above, but didn't see any log output when the delay occurs in all my tests.

To repro the issue, I have created a sample project:
https://github.com/newbaluku/repro-10s-delay-test.git

this code:

  • uses a warmer (run @ 10 mins interval) to kept the Lambda warmed.
  • query DB @ 35 mins interval to detect any slow response.

I'm able to repro the occasional >10 secs response using the above code connecting to the following DB versions:
MongoDB v4.2.8 Atlas cluster (M0, M2)
MongoDB v3.6.18 Atlas cluster (M10)
mongoose v5.9.25
Node.js v12.x

On two separate tests, I detected:

  • 5 instances of slow response after running continuously for 12 hrs.
  • 12 instances of slow response after running continuously for 30 hrs.

Thanks for your help.

So since the buffer event isn't getting triggered, this is probably an issue in the mongodb driver. Can you please clarify whether you're setting a read preference in your URI?

I managed to repro this with your serverless setup, thanks :+1:

So it turns out that when this happens, the Mongoose connection lost connectivity for some reason, readyState = 0 and the underlying MongoDB driver topology is in the below state:

TopologyDescription {
  type: 'ReplicaSetNoPrimary',
  setName: 'Cluster0-shard-0',
  maxSetVersion: 8,
  maxElectionId: 7fffffff0000000000000044,
  servers: Map {
    'cluster0-shard-00-00-XYZ.mongodb.net:27017' => ServerDescription {
      address: 'cluster0-shard-00-00-XYZ.mongodb.net:27017',
      error: MongoNetworkError: connection timed out
          at connectionFailureError (/var/task/node_modules/mongodb/lib/core/connection/connect.js:406:14)
          at TLSSocket.<anonymous> (/var/task/node_modules/mongodb/lib/core/connection/connect.js:294:16)
          at Object.onceWrapper (events.js:421:28)
          at TLSSocket.emit (events.js:315:20)
          at TLSSocket.Socket._onTimeout (net.js:481:8)
          at listOnTimeout (internal/timers.js:549:17)
          at processTimers (internal/timers.js:492:7),
      roundTripTime: -1,
      lastUpdateTime: 6957834,
      lastWriteDate: null,
      opTime: null,
      type: 'Unknown',
      minWireVersion: 0,
      maxWireVersion: 0,
      hosts: [],
      passives: [],
      arbiters: [],
      tags: []
    },
    'cluster0-shard-00-01-XYZ.mongodb.net:27017' => ServerDescription {
      address: 'cluster0-shard-00-01-XYZ.mongodb.net:27017',
      error: MongoNetworkError: connection timed out
          at connectionFailureError (/var/task/node_modules/mongodb/lib/core/connection/connect.js:406:14)
          at TLSSocket.<anonymous> (/var/task/node_modules/mongodb/lib/core/connection/connect.js:294:16)
          at Object.onceWrapper (events.js:421:28)
          at TLSSocket.emit (events.js:315:20)
          at TLSSocket.Socket._onTimeout (net.js:481:8)
          at listOnTimeout (internal/timers.js:549:17)
          at processTimers (internal/timers.js:492:7),
      roundTripTime: -1,
      lastUpdateTime: 6957834,
      lastWriteDate: null,
      opTime: null,
      type: 'Unknown',
      minWireVersion: 0,
      maxWireVersion: 0,
      hosts: [],
      passives: [],
      arbiters: [],
      tags: []
    },
    'cluster0-shard-00-02-XXX.mongodb.net:27017' => ServerDescription {
      address: 'cluster0-shard-00-02-XYZ.mongodb.net:27017',
      error: MongoNetworkError: connection timed out
          at connectionFailureError (/var/task/node_modules/mongodb/lib/core/connection/connect.js:406:14)
          at TLSSocket.<anonymous> (/var/task/node_modules/mongodb/lib/core/connection/connect.js:294:16)
          at Object.onceWrapper (events.js:421:28)
          at TLSSocket.emit (events.js:315:20)
          at TLSSocket.Socket._onTimeout (net.js:481:8)
          at listOnTimeout (internal/timers.js:549:17)
          at processTimers (internal/timers.js:492:7),
      roundTripTime: -1,
      lastUpdateTime: 6957835,
      lastWriteDate: null,
      opTime: null,
      type: 'Unknown',
      minWireVersion: 0,
      maxWireVersion: 0,
      hosts: [],
      passives: [],
      arbiters: [],
      tags: []
    }
  },
  stale: false,
  compatible: true,
  compatibilityError: null,
  logicalSessionTimeoutMinutes: null,
  heartbeatFrequencyMS: 10000,
  localThresholdMS: 15,
  commonWireVersion: 8
}

It looks like the socket to Atlas died for some reason in between calls, maybe due inactivity. Not sure why the 10 second lag is kicking in either. But regardless, looks like I can repro this and add more debug info to figure out what's happening, so hopefully I'll find the root cause soon.

So since the buffer event isn't getting triggered, this is probably an issue in the mongodb driver. Can you please clarify whether you're setting a read preference in your URI?

@vkarpov15 I didn't specify any read preference. The only options in my URI are: retryWrites=true&w=majority

I managed to repro this with your serverless setup, thanks 馃憤

So it turns out that when this happens, the Mongoose connection lost connectivity for some reason, readyState = 0 and the underlying MongoDB driver topology is in the below state:

TopologyDescription {
  type: 'ReplicaSetNoPrimary',
  setName: 'Cluster0-shard-0',
  maxSetVersion: 8,
  maxElectionId: 7fffffff0000000000000044,
  servers: Map {
    'cluster0-shard-00-00-XYZ.mongodb.net:27017' => ServerDescription {
      address: 'cluster0-shard-00-00-XYZ.mongodb.net:27017',
      error: MongoNetworkError: connection timed out
          at connectionFailureError (/var/task/node_modules/mongodb/lib/core/connection/connect.js:406:14)
          at TLSSocket.<anonymous> (/var/task/node_modules/mongodb/lib/core/connection/connect.js:294:16)
          at Object.onceWrapper (events.js:421:28)
          at TLSSocket.emit (events.js:315:20)
          at TLSSocket.Socket._onTimeout (net.js:481:8)
          at listOnTimeout (internal/timers.js:549:17)
          at processTimers (internal/timers.js:492:7),
      roundTripTime: -1,
      lastUpdateTime: 6957834,
      lastWriteDate: null,
      opTime: null,
      type: 'Unknown',
      minWireVersion: 0,
      maxWireVersion: 0,
      hosts: [],
      passives: [],
      arbiters: [],
      tags: []
    },
    'cluster0-shard-00-01-XYZ.mongodb.net:27017' => ServerDescription {
      address: 'cluster0-shard-00-01-XYZ.mongodb.net:27017',
      error: MongoNetworkError: connection timed out
          at connectionFailureError (/var/task/node_modules/mongodb/lib/core/connection/connect.js:406:14)
          at TLSSocket.<anonymous> (/var/task/node_modules/mongodb/lib/core/connection/connect.js:294:16)
          at Object.onceWrapper (events.js:421:28)
          at TLSSocket.emit (events.js:315:20)
          at TLSSocket.Socket._onTimeout (net.js:481:8)
          at listOnTimeout (internal/timers.js:549:17)
          at processTimers (internal/timers.js:492:7),
      roundTripTime: -1,
      lastUpdateTime: 6957834,
      lastWriteDate: null,
      opTime: null,
      type: 'Unknown',
      minWireVersion: 0,
      maxWireVersion: 0,
      hosts: [],
      passives: [],
      arbiters: [],
      tags: []
    },
    'cluster0-shard-00-02-XXX.mongodb.net:27017' => ServerDescription {
      address: 'cluster0-shard-00-02-XYZ.mongodb.net:27017',
      error: MongoNetworkError: connection timed out
          at connectionFailureError (/var/task/node_modules/mongodb/lib/core/connection/connect.js:406:14)
          at TLSSocket.<anonymous> (/var/task/node_modules/mongodb/lib/core/connection/connect.js:294:16)
          at Object.onceWrapper (events.js:421:28)
          at TLSSocket.emit (events.js:315:20)
          at TLSSocket.Socket._onTimeout (net.js:481:8)
          at listOnTimeout (internal/timers.js:549:17)
          at processTimers (internal/timers.js:492:7),
      roundTripTime: -1,
      lastUpdateTime: 6957835,
      lastWriteDate: null,
      opTime: null,
      type: 'Unknown',
      minWireVersion: 0,
      maxWireVersion: 0,
      hosts: [],
      passives: [],
      arbiters: [],
      tags: []
    }
  },
  stale: false,
  compatible: true,
  compatibilityError: null,
  logicalSessionTimeoutMinutes: null,
  heartbeatFrequencyMS: 10000,
  localThresholdMS: 15,
  commonWireVersion: 8
}

It looks like the socket to Atlas died for some reason in between calls, maybe due inactivity. Not sure why the 10 second lag is kicking in either. But regardless, looks like I can repro this and add more debug info to figure out what's happening, so hopefully I'll find the root cause soon.

@vkarpov15 I see. Thank you very much for the help.

Hi @newbaluku , thank you very much for your patience and thorough bug report. I believe this issue will be solved in https://github.com/mongodb/node-mongodb-native/pull/2551. If you want to test the fix in your code you can copy the changes in the wake method in the mongodb folder within the node_modules uploaded to your lambda function.

This this issue solved? I am currently using [email protected] with AWS Lambda and still experiencing the the 10 seconds query issue.

@timeswind not yet, it should be fixed with the next mongodb driver release. We've been running @newbaluku 's repro for weeks and haven't seen the issue with a patched version, so we're fairly certain that once this patch is released this issue will be fixed. Sorry for the delay!

This issue should be fixed by #9521

Was this page helpful?
0 / 5 - 0 ratings