Google-cloud-node: logging: best practices for batching log entries

Created on 21 Jun 2017 · 8 Comments · Source: googleapis/google-cloud-node

Hello, I am working on implementing a logging solution, and have a few questions that I could use help with. Here is some context:

We would like to log every request that our Node.js server processes to Stackdriver Logging.
Each request is in JSON format and contains many fields.
Each of our servers handles a high number of requests per second (RPS).
Latency and performance are crucial to the service.
The documentation states, "While you may write a single entry at a time, batching multiple entries together is preferred to avoid reaching the queries per second limit."
It appears it should be a common pattern to have a wrapper around the logging API and only write entries when a certain number of them is accrued.

I have searched through the docs, and I haven't found a solid answer on these questions:

Will gcloud-winston or gcloud-bunyan handle batching log entries?
Is there an example of batching log entries in a high throughput scenario?
What is the recommended amount of entries to batch before writing?

Thank you for your help.

question logging logging-bunyan logging-winston


JoshFerge

All 8 comments

Hey @JoshFerge, thanks for asking. @ofrobots are you able to help with any of these questions?

stephenplusplus on 21 Jun 2017

Hi @JoshFerge.

The first thing to note is that the google-cloud/logging library already batches requests internally. Specifically, here's the relevant section of the config:

        "WriteLogEntries": {
          "timeout_millis": 30000,
          "retry_codes_name": "non_idempotent",
          "retry_params_name": "default",
          "bundling": {
            "element_count_threshold": 1000,
            "request_byte_threshold": 1048576,
            "delay_threshold_millis": 50
          }
        },

What this means is that, by default, the logging library will batch up to 1000 log entries, or up to 1MB of serialized log entry data, or up to a duration of 50 milliseconds – whichever happens first. This will already give you some bundling of your calls to log.write and, as a result, mitigate some of the performance and quota concerns.
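To make the "whichever happens first" rule concrete, here is a minimal sketch of a bundler driven by the same three thresholds. It is not the library's internal implementation: the `Bundler` class, its option names, and the `send` callback are hypothetical, and the serialized size is approximated with the JSON encoding of each entry.

```javascript
// Illustrative only: a tiny bundler that mimics the three thresholds in the
// config above. This is NOT how the logging library is implemented internally.
class Bundler {
  constructor(send, options = {}) {
    this.send = send; // hypothetical callback that receives an array of entries
    this.maxCount = options.elementCountThreshold ?? 1000;
    this.maxBytes = options.requestByteThreshold ?? 1048576;
    this.maxDelayMs = options.delayThresholdMillis ?? 50;
    this.entries = [];
    this.bytes = 0;
    this.timer = null;
  }

  add(entry) {
    this.entries.push(entry);
    // Approximate the serialized size with the entry's JSON encoding.
    this.bytes += Buffer.byteLength(JSON.stringify(entry));

    if (this.entries.length >= this.maxCount || this.bytes >= this.maxBytes) {
      this.flush(); // count or size threshold reached
    } else if (this.timer === null) {
      // Start the delay clock when the first entry of a new batch arrives.
      this.timer = setTimeout(() => this.flush(), this.maxDelayMs);
      this.timer.unref(); // do not keep the process alive just for this timer
    }
  }

  flush() {
    if (this.timer !== null) {
      clearTimeout(this.timer);
      this.timer = null;
    }
    if (this.entries.length === 0) return;
    const batch = this.entries;
    this.entries = [];
    this.bytes = 0;
    this.send(batch);
  }
}

// Usage with a lowered count threshold so the effect is easy to see.
const demoBatches = [];
const demo = new Bundler((batch) => demoBatches.push(batch), {
  elementCountThreshold: 3,
});
['a', 'b', 'c', 'd'].forEach((message) => demo.add({ message }));
demo.flush(); // force out the partial batch instead of waiting for the timer
console.log(demoBatches.map((batch) => batch.length)); // [ 3, 1 ]
```

The first three entries go out together because the count threshold is hit; the fourth would normally wait up to 50 ms for the delay threshold, and is flushed by hand here.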

On top of this, if you are using google-cloud/logging-bunyan as the library to interface with logging, you'll get an additional layer of buffering as the bunyan logging stream is a proper WritableStream, which will do some internal buffering as well.

What is the recommended amount of entries to batch before writing?

I think the default batching configuration should be a reasonable starting point for most applications.

Will gcloud-winston or gcloud-bunyan handle batching log entries?

The answer is yes, as per above. If the defaults are not adequate for the needs of your application, you could probably batch more in your own code to tweak the performance further.

Note, however, that there is a tradeoff here: by buffering more, you can probably improve performance, but you also increase the risk that more log entries will be lost if your application crashes before the buffer has been flushed to the network. The memory consumption of your app will increase as well.
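As a rough sketch of what extra buffering in your own code could look like, and of one way to limit the loss window, the snippet below accumulates entries and flushes them on graceful shutdown. Everything here is hypothetical: `createBufferedLogger`, `flushOnShutdown`, the 500-entry default, and the `writeBatch` callback, which stands in for whatever call actually ships the entries (for example, a function that wraps log.write).

```javascript
// Hypothetical application-level buffer. `writeBatch` is an assumed callback
// that ships an array of entries and may return a promise.
function createBufferedLogger(writeBatch, maxEntries = 500) {
  let buffer = [];

  function flush() {
    if (buffer.length === 0) return Promise.resolve();
    const batch = buffer;
    buffer = []; // swap the buffer before writing so new entries are not lost
    return Promise.resolve(writeBatch(batch));
  }

  function log(entry) {
    buffer.push(entry);
    // A larger maxEntries means fewer writes, but more entries at risk and
    // more memory held between flushes.
    return buffer.length >= maxEntries ? flush() : Promise.resolve();
  }

  return { log, flush, pending: () => buffer.length };
}

// Flush whatever is still buffered when the process is asked to stop.
// This helps with graceful shutdowns only; a hard crash or SIGKILL still
// loses the buffered entries.
function flushOnShutdown(logger, signals = ['SIGTERM', 'SIGINT']) {
  for (const signal of signals) {
    process.once(signal, () => {
      logger.flush().finally(() => process.exit(0));
    });
  }
}
```

A periodic flush on a timer would bound how long entries can sit in the buffer during quiet periods; it is left out here to keep the sketch short.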

Is there an example of batching log entries in a high throughput scenario?

You could probably follow the example used by this third-party module: bunyan-stackdriver. Their motivation was quota rather than performance.

Hope this helps. Let me know if you have additional questions.

ofrobots on 22 Jun 2017


closing the issue as there is no bug to fix, but feel free to continue discussing

stephenplusplus on 22 Jun 2017

@ofrobots thanks for the awesome and thorough response! If my understanding is correct then, the statement in the documentation under write (https://googlecloudplatform.github.io/google-cloud-node/#/docs/logging/1.0.0/logging/log?method=write)

"While you may write a single entry at a time, batching multiple entries together is preferred to avoid reaching the queries per second limit."

is erroneous?

JoshFerge on 22 Jun 2017

I would agree that it is misleading. Quotas are at a project-level. If you have many applications running in the same project, all with very high QPS, it may still be possible to deplete your quota.

Batching in the application will help in those cases.
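A back-of-the-envelope calculation shows why. The numbers below are made up purely for illustration (they are not real quota values or measurements), and the helper names are hypothetical.

```javascript
// Write calls per second that one process issues if it sends
// `entriesPerCall` entries in each call.
function writeCallsPerSecond(entriesPerSecond, entriesPerCall) {
  return Math.ceil(entriesPerSecond / entriesPerCall);
}

// Total across every instance of every application in the project.
function projectCallsPerSecond(apps, entriesPerCall) {
  return apps.reduce(
    (total, app) =>
      total +
      app.instances * writeCallsPerSecond(app.entriesPerSecond, entriesPerCall),
    0
  );
}

// Hypothetical project: two applications sharing one project-level quota.
const apps = [
  { instances: 10, entriesPerSecond: 2000 },
  { instances: 4, entriesPerSecond: 500 },
];

console.log(projectCallsPerSecond(apps, 1));    // 22000 (one entry per call)
console.log(projectCallsPerSecond(apps, 1000)); // 24 (up to 1000 entries per call)
```

The per-process rate can look harmless while the project-wide total is what counts against the quota.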

@stephenplusplus is it worth rewording this?

ofrobots on 23 Jun 2017


Yes, sounds good. PR welcome to remove the line altogether 👍

stephenplusplus on 23 Jun 2017

Fixed in #2409. Thanks @JoshFerge!

stephenplusplus on 24 Jun 2017

This issue was moved to googleapis/nodejs-logging-bunyan#6