Rocket.chat: Add Performance Monitoring Layer - Instrument Everything

Created on 23 Jan 2017 · 12 Comments · Source: RocketChat/Rocket.Chat

With Kadira fading into computing history, and the rapidly increasing number of hosting providers offering Rocket.Chat as a service -- there is no better time to start making Rocket.Chat meaningfully monitor-able.

The ability to "look into" a running Rocket.Chat instance and obtain both realtime and rolling historical summaries of application-meaningful metrics is vital for:

  • server (cluster) operation and maintenance
  • production performance tuning
  • debugging and diagnostics
  • Rocket.Chat SaaS operators
  • Rocket.Chat orchestrations and hosting
  • other purposes ....

Our experience with Kadira and Meteor level monitoring has been that, in production emergency situations, it is close to useless. The lack of relevant application level data, and the sheer volume of low level data to sift through (despite the availability of ultra-flashy tools to process the data), makes troubleshooting and debugging close to impossible, requiring an ultra level of guru-hood to diagnose, if at all.

@TheReal1604 has put together this _inspirational_ Rocket.Chat cluster monitoring project, front-ended using Grafana, running today on his Raspberry Pi :

[image: therealdashboard]

Full details of his project can be found here: https://github.com/TheReal1604/rpi-monitoring-node

He is frequently available on https://demo.rocket.chat as @TheReal

Currently, his dashboard and workflow collect and display:

  • OS level metrics

We should consider starting to add instrumentation to application level objects and events, identifying the meaningful metrics, and feeding similar monitoring solutions / dashboards with:

  • node js level metrics
  • Rocket.Chat specific application level metrics
  • relevant container and/or hosting environment metrics

All 12 comments

@Sing-Li Just to clear some things up ;-)...

I am working in my repo on an automated install script for a monitoring Pi (based on Grafana, Telegraf and InfluxDB). At the moment my repo contains a script that automatically installs the components and adds just a basic dashboard to monitor your internet connection, but I think this could easily be expanded with the dashboard seen above, which I created for my Rocket.Chat cluster at work.

The basic concept is that you have an RPi with InfluxDB (backend) and Grafana on it. Telegraf can then be deployed on your Rocket.Chat nodes, where it reports to InfluxDB on your RPi.

For bigger environments you could easily scale this to a dedicated server that handles the monitoring and InfluxDB for you.

EDIT: Bradley is awesome and added some additions to the v1 API to get some basic information about your Rocket.Chat instance, like total messages, rooms, users and so on. It could be nice to see this as a graph.

https://github.com/RocketChat/Rocket.Chat/pull/5625

@thereal1604 the timing here is perfect! I've been looking at this very thing. Will you be around tomorrow on the demo server? I'd love to see more of what you have, and maybe collaborate to make it easier to gather stats.

If any help with this one is needed, please reach out. Anything related to Grafana, Prometheus, CollectD, StatsD, Telegraf, Zabbix or general monitoring is a strong area for me.

Would be good to be able to pull some metrics from inside Rocket.Chat with a simple API call like users connected, uptime, etc...

Inspired to build a new dashboard now.

@jszaszvari @TheReal1604 @geekgonecrazy I just started a private group on the demo server so we can all collaborate and get something good going, maybe with some documentation for the Rocket.Chat docs. You guys provide the expertise and I can try to get features added to Rocket.Chat to enable more monitoring. :+1:

@jszaszvari Prometheus should be integrated soon, and you'll be able to scrape the metrics and feed them into your workflow. We can sure use your expertise in defining and implementing meaningful metrics :+1:

Prometheus should be functional.

Please start rolling the new metrics implementation PRs and perhaps a new Grafana dashboard project?

Hello, have you taken a look at https://my-netdata.io/ ? It's a very nice solution for server monitoring.

my-netdata is compatible with Prometheus metrics, so you can use it.

It's not designed for a single monitoring system; it's designed to be ingested by many different ones.

Hi,

are there any existing scripts to store RC statistics (e.g. total messages, users etc.) from MongoDB in InfluxDB for monitoring?

Ciao!

You can query this API endpoint

curl -H "Content-type:application/json" -H "X-Auth-Token: <your token>" -H "X-User-Id: <your id>" https://rocketchatserveraddress/api/v1/statistics

The best way to get it into influx is to use Telegraf with the httpjson input (details here - https://github.com/influxdata/telegraf/tree/master/plugins/inputs/httpjson) and use the influxdb output
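If you would rather script this than run Telegraf, here is a minimal Python sketch of the same idea: it calls the statistics endpoint with the headers from the curl command above and turns the numeric top-level fields into one InfluxDB line-protocol line. The function names, the `rocketchat` measurement name and the field names in the usage note are assumptions for illustration. The exact shape of the JSON response may also differ between Rocket.Chat versions, so check what your server returns.

```python
import json
import time
import urllib.request


def fetch_statistics(base_url, auth_token, user_id):
    """GET <base_url>/api/v1/statistics using the same headers as the curl call."""
    request = urllib.request.Request(
        f"{base_url}/api/v1/statistics",
        headers={
            "Content-type": "application/json",
            "X-Auth-Token": auth_token,
            "X-User-Id": user_id,
        },
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def escape_key(key):
    """Escape the characters that are special in line-protocol field keys."""
    return key.replace(",", r"\,").replace("=", r"\=").replace(" ", r"\ ")


def to_line_protocol(stats, measurement="rocketchat", timestamp_ns=None):
    """Build one line-protocol line from the numeric top-level fields of stats.

    Strings, booleans and nested objects are skipped in this sketch.
    """
    fields = []
    for key, value in sorted(stats.items()):
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool):
            continue
        if isinstance(value, int):
            fields.append(f"{escape_key(key)}={value}i")  # trailing i = integer
        elif isinstance(value, float):
            fields.append(f"{escape_key(key)}={value}")
    if not fields:
        raise ValueError("no numeric fields found in statistics")
    if timestamp_ns is None:
        timestamp_ns = time.time_ns()
    return f"{measurement} {','.join(fields)} {timestamp_ns}"
```

For example, `to_line_protocol({"totalUsers": 12, "totalMessages": 3400}, timestamp_ns=1)` gives `rocketchat totalMessages=3400i,totalUsers=12i 1` (the field names here are illustrative). The resulting line can then be sent to InfluxDB by whatever write mechanism you already use.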

@JSzaszvari how do you go about telling telegraf to login and then use the auth token received?
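One possible approach, sketched in Python rather than in Telegraf itself, is to log in once through the REST API and reuse the returned values as the `X-Auth-Token` and `X-User-Id` headers. This assumes the server exposes `POST /api/v1/login` and answers with `status`, `data.authToken` and `data.userId`. Please verify that against your Rocket.Chat version. The helper names below are made up for illustration.

```python
import json
import urllib.request


def parse_login_response(body):
    """Extract (auth_token, user_id) from a decoded login response.

    Assumes the shape {"status": "success", "data": {"authToken": ..., "userId": ...}}.
    """
    if body.get("status") != "success":
        raise RuntimeError(f"login failed: {body}")
    data = body["data"]
    return data["authToken"], data["userId"]


def login(base_url, username, password):
    """POST the credentials to <base_url>/api/v1/login and return (token, user id)."""
    payload = json.dumps({"user": username, "password": password}).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/api/v1/login",
        data=payload,
        headers={"Content-type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return parse_login_response(json.load(response))


def auth_headers(auth_token, user_id):
    """Headers to attach to later calls such as /api/v1/statistics."""
    return {"X-Auth-Token": auth_token, "X-User-Id": user_id}
```

A typical flow would be `token, uid = login("https://rocketchatserveraddress", "<user>", "<password>")`, followed by requests that carry `auth_headers(token, uid)`. A dedicated monitoring account is probably a better fit than a personal login, since the credentials end up in a script or config file.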

The layer exists already.
