Session: add user-agent detection to prevent session creation for bots

Created on 4 Nov 2014  ·  4 Comments  ·  Source: expressjs/session

I was encountering an issue with both crawler bots (which I would still like to be able to crawl my site for data) and AWS's HealthChecker bots getting session keys created for them, which added unnecessary keys to my storage (redis).

I found that piggybacking off the ua-parser2 library gave me most of what I needed, except for HealthChecker bots (https://github.com/commenthol/ua-parser2/issues/1). I added the following code to the session(req, res, next) function, just below the self-aware check:

var UA = require('ua-parser2')(); /* included at top of document */

// don't generate sessions for bots
var isBot = false;
var browserDetails = UA.parse(req.headers['user-agent']);
if (browserDetails.ua.hasOwnProperty('type')) {
    if (browserDetails.ua.type === 'bot') {
        isBot = true;
    }
}
// the parser does not flag HealthChecker as a bot, so match the string directly
if (browserDetails.string.indexOf('HealthChecker') >= 0) {
    isBot = true;
}
if (isBot) return next();

Apologies if there's a super simple alternative for stopping bots from generating sessions, but my googling turned up nothing :(

Happy to submit a pull request, but I figured you guys might prefer to look into an alternative user-agent parser/detector.

question

All 4 comments

Hi! So there are of course multiple ways to approach this problem, but I'll admit up front that we wouldn't be adding anything to this module that does user agent sniffing for various reasons.

So let's start with the best way (not sniffing user agents): set the saveUninitialized option to false and then, in your code, only set things on req.session when an action has occurred and you actually _want_ a session to exist to hold onto that data. This means that just making a request to your site won't automatically create a session; one is only created once your code actually decides to put something into it.
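
As a rough sketch of that first approach, the handlers below only write to req.session when there is something worth keeping. The route paths, the userId field, the handler names and the secret are all placeholders rather than anything from your app, and the express and express-session requires sit inside createApp() so the handlers can be read and tried on their own:

```javascript
// Hypothetical handlers: only logIn writes to the session.
function home (req, res) {
  // Nothing is stored on req.session here, so with saveUninitialized: false
  // a crawler or health check hitting this route does not get a session saved.
  res.send('hello')
}

function logIn (req, res) {
  // Writing to req.session modifies it, so it is saved to the store
  // and the session cookie is set on this response.
  req.session.userId = 'placeholder-user-id'
  res.send('logged in')
}

function createApp () {
  var express = require('express')
  var expressSession = require('express-session')

  var app = express()

  app.use(expressSession({
    secret: 'replace-with-your-own-secret', // placeholder
    resave: false,
    saveUninitialized: false // do not store sessions that were never modified
  }))

  app.get('/', home)
  app.post('/login', logIn)

  return app
}
```

With your own store added to the options, createApp().listen(3000) would start it, and plain requests to / should no longer add keys to the store.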

Otherwise, if you really do want to do user-agent sniffing, you can simply use the "middleware wrapping" pattern: you create the session middleware as usual, but instead of passing it to app.use() directly, you app.use() your own mini middleware that decides whether to execute it. Extending your example above, you would end up with the following:

var express = require('express')
var expressSession = require('express-session')
// ua-parser2 is called as a factory here, as in your snippet
var uaParser2 = require('ua-parser2')()

var app = express()

var sessionMiddleware = expressSession({
  // your configuration
})

app.use(function useSession(req, res, next) {
  // this would have normally been app.use(sessionMiddleware)
  if (!isBot(req)) {
    return sessionMiddleware(req, res, next)
  }
  // bots skip the session middleware entirely
  next()
})

function isBot(req) {
  var userAgent = req.headers['user-agent']

  if (!userAgent) {
    // assume not a bot without a user agent
    return false
  }

  var browserDetails = uaParser2.parse(userAgent)

  return browserDetails.ua.type === 'bot'
    || browserDetails.string.indexOf('HealthChecker') !== -1
}
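
If pulling in a full user-agent parser feels like too much for this, the same wrapping idea can be written as a small reusable helper with no dependencies. This is only a sketch: skipWhen and looksLikeBot are made-up names, and the regular expression is a guess at common crawler tokens plus the HealthChecker string from your example, so adjust it to the traffic you actually see.

```javascript
// Hypothetical helper: wraps a middleware so it is skipped
// whenever predicate(req) returns true.
function skipWhen (predicate, middleware) {
  return function wrapped (req, res, next) {
    if (predicate(req)) {
      return next()
    }
    return middleware(req, res, next)
  }
}

// Assumed pattern, not an exhaustive list of bots.
var BOT_PATTERN = /bot|crawler|spider|HealthChecker/i

function looksLikeBot (req) {
  var userAgent = req.headers['user-agent']
  // Same choice as above: a missing user agent is treated as not a bot.
  return Boolean(userAgent) && BOT_PATTERN.test(userAgent)
}
```

You would then write app.use(skipWhen(looksLikeBot, sessionMiddleware)) in place of the useSession function.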

Hey!

Thanks so much for your response! That's a much better way to do it. Appreciate you taking the time to help me out with that.

Cheers

It is absolutely no problem :)

I just came across this thread and I want to thank you both for the insightful information!
