As @ianabc reported in https://github.com/jupyterhub/jupyterhub/issues/2685#issuecomment-732310172, the total_users metric reported in /hub/metrics does not match the total number of users shown in /hub/admin.
I'm using JupyterHub 1.2.1. My guess is that total_users is incremented twice per user by mistake.
# HELP total_users total number of users
# TYPE total_users gauge
total_users 960.0


I think I can explain why this is happening: init_users in app.py looks at the database during startup and initializes TOTAL_USERS from that count. Next, _something_ calls UserListAPIHandler.get(), which is somehow able to increment TOTAL_USERS (in some cases - see below), double counting users. For me, and I suspect for @consideRatio, the _something_ is jupyterhub-idle-culler. If I disable that, my counts are correct, but I think the actual problem is with init_users and UserDict.__getitem__.
I was surprised that UserListAPIHandler.get() was able to change state, so I dug a bit deeper. That method has a list comprehension which calls user_model, and user_model does this:
user = self.users[user.id]
The underlying UserDict.__getitem__ calls self.add if the key is an integer, and UserDict.add increments the count via:
def add(self, orm_user):
    """Add a user to the UserDict"""
    if orm_user.id not in self:
        self[orm_user.id] = self.from_orm(orm_user)
        TOTAL_USERS.inc()
    return self[orm_user.id]
At startup, init_users sets TOTAL_USERS but it doesn't populate the users attribute. Later on, when something calls add, the if orm_user.id not in self condition passes for users that were already included in the startup count, so those users get counted twice.
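The sequence described above can be reproduced with a small toy model. This is only a sketch under stated assumptions: Gauge, OrmUser, and ToyUserDict are hypothetical stand-ins written for illustration, not JupyterHub or prometheus_client classes, and the plain dict named db stands in for the database.

```python
class Gauge:
    """Minimal stand-in for a metrics gauge (hypothetical helper)."""

    def __init__(self):
        self.value = 0

    def set(self, value):
        self.value = value

    def inc(self):
        self.value += 1


TOTAL_USERS = Gauge()


class OrmUser:
    """Hypothetical stand-in for a database row."""

    def __init__(self, id, name):
        self.id = id
        self.name = name


class ToyUserDict(dict):
    """Simplified model of a wrapper that loads users lazily on lookup."""

    def __init__(self, db):
        super().__init__()
        self.db = db  # maps id -> OrmUser, stands in for the database

    def add(self, orm_user):
        # Same shape as the add method quoted above: count on first insert.
        if orm_user.id not in self:
            self[orm_user.id] = orm_user
            TOTAL_USERS.inc()
        return self[orm_user.id]

    def __getitem__(self, key):
        # A lookup of an uncached id loads the user through add().
        if key not in self:
            return self.add(self.db[key])
        return super().__getitem__(key)


db = {i: OrmUser(i, f"user-{i}") for i in range(1, 4)}
users = ToyUserDict(db)

# Startup: the gauge is set from the database count, the wrapper stays empty.
TOTAL_USERS.set(len(db))
count_after_startup = TOTAL_USERS.value

# Later: something (for example a user listing) looks up every user by id.
listed = [users[i] for i in db]
count_after_listing = TOTAL_USERS.value

# A second listing finds every user cached and changes nothing.
listed_again = [users[i] for i in db]
count_after_second_listing = TOTAL_USERS.value

print(count_after_startup, count_after_listing, count_after_second_listing)
```

In this model the gauge doubles exactly once, on the first full listing after startup, which matches the symptom of the metric being roughly twice the admin count.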
It looks like the add method was designed to handle the case where the user is already in self.users, so I made this branch, which alters init_users to populate self.users. This means the if orm_user.id not in self condition fails on subsequent calls, and it seems to fix the problem. I'd be (extremely!) happy to open a pull request if that helps, but I wanted to check I was on the right path before going too far. Also, is the self.add in UserDict.__getitem__ OK? It doesn't seem like that method should be allowed to update the users list.
Thanks for tracking it down! Indeed, the UserDict wrapper used to assume that it would hold all existing users, so that's where the metric was implemented. This is no longer the case because loading every user can be too expensive, but that also makes it a little more complicated to track user counts. For performance reasons, we don't want to fill UserDict with every existing user. Instead, I think moving the increment to where users are actually created is the right thing. #3289 should do this.
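One way to picture "moving the increment to where users are actually created" is the sketch below. It is an illustration of the idea only and not necessarily what #3289 does: Gauge, ToyDatabase, create_user, and LazyUsers are hypothetical names invented for this example.

```python
class Gauge:
    """Minimal stand-in for a metrics gauge (hypothetical helper)."""

    def __init__(self):
        self.value = 0

    def set(self, value):
        self.value = value

    def inc(self):
        self.value += 1


TOTAL_USERS = Gauge()


class ToyDatabase:
    """Hypothetical stand-in for the user table."""

    def __init__(self):
        self.rows = {}
        self._next_id = 1

    def create_user(self, name):
        # The only place the gauge is incremented: a user row is created.
        user_id = self._next_id
        self._next_id += 1
        self.rows[user_id] = {"id": user_id, "name": name}
        TOTAL_USERS.inc()
        return user_id


class LazyUsers(dict):
    """Cache that loads users on first access and never touches the gauge."""

    def __init__(self, db):
        super().__init__()
        self.db = db

    def __getitem__(self, key):
        if key not in self:
            # Loading into the cache is not a creation, so no metric change.
            self[key] = self.db.rows[key]
        return super().__getitem__(key)


db = ToyDatabase()
for name in ("alice", "bob", "carol"):
    db.create_user(name)

# Simulated restart: gauge reset from the database count, cache starts empty.
TOTAL_USERS.set(len(db.rows))
users = LazyUsers(db)

# Listing users repeatedly fills the cache but leaves the gauge alone.
for _ in range(2):
    listed = [users[user_id] for user_id in db.rows]
count_after_listing = TOTAL_USERS.value

# Only a real creation moves the gauge.
db.create_user("dave")
count_after_create = TOTAL_USERS.value

print(count_after_listing, count_after_create)
```

With the increment tied to creation, the cache can stay partially filled for performance without the gauge depending on which users happen to have been loaded.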
> is the self.add in UserDict.__getitem__ OK?
Yes, this is what enables app.users[username] to work, loading a User object from the database into the user wrapper collection by name, id, etc.