Crystal: Feature request: Log class uses macros to optionally remove instrumentation at compile time

Created on 30 May 2020 · 31 Comments · Source: crystal-lang/crystal

I understand from the Crystal Gitter chat that this has been discussed from time to time. I couldn't find a GitHub issue, so I decided to create this one to capture this important idea.

AFAIK the desired run-time log level is currently set via an environment variable, e.g. CRYSTAL_LOG_LEVEL=DEBUG, which means instrumentation like this will produce output: Log.debug { "..." }

And AFAIK if the desired run-time log level effectively disables the output of Log.debug { "..." }, then the code path is already optimized as much as possible, but a function is still called and an if() statement causes an early return without any output or string manipulation. So if there are e.g. 1 million Log.debug { "..." } statements not outputting, there will still be 1 million functions called, with 1 million if statements, and 1 million returns... as an overhead.
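To illustrate the point, here is a simplified sketch (not the actual stdlib source; the FakeLog type and Severity enum are made up for illustration) of the kind of run-time check such a disabled call boils down to: the block is never evaluated, but the severity comparison and early return always execute.

# Simplified sketch of the run-time check behind a disabled debug call.
enum Severity
  Debug
  Info
  Error
end

class FakeLog
  property level : Severity = Severity::Info

  def debug
    return if level > Severity::Debug # early return: the block is never called
    puts yield                        # only reached when debug output is enabled
  end
end

log = FakeLog.new
log.debug { "expensive detail #{1 + 1}" } # prints nothing, but the check still runs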

While this is not bad, it means that the more Log.debug { "..." } statements are added, the slower a Crystal program will run even if their output is disabled, due to all the unnecessary associated function calls.

And this is particularly bad for multi-threaded programs, which generally rely more on log instrumentation because debugging such programs with a debugger is sometimes not practical.

This feature request is to optimize the functionality already there, so that certain log levels can be eliminated at compile time, similar to how the Crystal assert macro [2] by @sam0x17 works.

The idea is that if CRYSTAL_LOG_LEVEL=DEBUG is not given at compile time, then the Log.debug { "..." } statement will actually disappear from the compiled code, meaning that whether or not CRYSTAL_LOG_LEVEL=DEBUG is specified at run time, there will be no log function to call and no overhead.

In this way, if CRYSTAL_LOG_LEVEL is not given at compile time, there will be no change to current functionality and the particular Log. instrumentation will always be 'compiled in'. However, if CRYSTAL_LOG_LEVEL is given at compile time, then the corresponding level of Log. instrumentation will be 'compiled out', thus making the output binary slightly smaller and faster.

Thus Crystal developers can -- similar to the idea with the assert macro [2] -- create a 'release' version of their binary, as well as a slightly bigger, slower, and maybe safer 'debug' version of it.

AFAIK the Crystal macro facility could make this all work, but I'm not sure about the actual mechanism to tell the compiler what to compile in and out. The assert macro [2] uses the --release compile-time flag, but it may be more useful and generic to use CRYSTAL_LOG_LEVEL instead. The exact mechanism could be discussed in this issue.
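As a rough sketch of one possible mechanism (not a proposed stdlib API: the compile_time_debug macro name is made up here, and a real implementation would compare severities rather than a single string), a user-level macro could read CRYSTAL_LOG_LEVEL at compile time via the env macro method and otherwise expand to nothing:

require "log"

# Hypothetical helper macro: the Log.debug call only exists in the binary when
# CRYSTAL_LOG_LEVEL=DEBUG was present in the environment at *compile* time.
macro compile_time_debug(&block)
  {% if env("CRYSTAL_LOG_LEVEL") == "DEBUG" %}
    Log.debug { {{ block.body }} }
  {% end %}
end

# Built with `CRYSTAL_LOG_LEVEL=DEBUG crystal build app.cr` this logs as usual;
# built without it, the whole statement disappears from the compiled code.
compile_time_debug { "heavy debug detail" }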

It would also potentially be useful to consolidate release and debug asserts into the Log class, since asserts ultimately log and one could think of asserts as a type of log instrumentation; it would be great to have a single mechanism to decide whether both debug log and debug assert lines are compiled out at compile time.

[1] https://crystal-lang.org/api/0.34.0/Log.html
[2] https://github.com/sam0x17/assert.cr

feature topic:log

All 31 comments

It probably does not need to be a macro. Based on compile flags, the method body in Log can be made a no-op. We need to check whether that will then be optimized out and removed directly.
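A minimal user-level sketch of that idea (the no_debug_logs flag name is hypothetical and this is not the stdlib implementation): with the flag set via `crystal build -Dno_debug_logs`, the method body is empty and a --release build should be able to optimize the call away.

require "log"

# Wrapper whose body becomes a no-op under a hypothetical compile-time flag.
def debug_log(&block)
  {% if flag?(:no_debug_logs) %}
    # no-op: the block is never evaluated, so the call can be optimized out
  {% else %}
    Log.debug { yield }
  {% end %}
end

debug_log { "expensive state dump: #{Time.utc}" }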

It is unclear whether this will apply to Trace or Debug level entries, but it is definitely something on the radar.

When building a program it will be possible to completely ignore the debug calls as part of the compilation, so there is no penalty even for checking whether those logs should be emitted. The debug methods can be replaced by no-ops at compile time.
https://github.com/crystal-lang/crystal/issues/5874#issuecomment-587090645

Also, while this is true, there's no call because a call with a block is inlined. If all the method is doing is an if and then logging, the overhead is just an if. With branch prediction it might not even be noticeable.

It might be nice to do some benchmarks to learn the real times.

there's no call because a call with a block is inlined

That could also be bad for performance, because the inlined if() logic will bloat the code, making the same code take up more space and causing more cache line usage and fetches, in addition to the overhead of the if() business logic itself. That's why, for example, profiling code in the Linux kernel uses a single byte which is either a single-byte nop (super fast with minimal code bloat) if not used, or changed at run time to a single-byte int3 if activated... but not if() business logic, which is going to be much bigger and slower... :-)

It might be nice to do some benchmarks to learn the real times.

If the log instrumentation is compiled out at compile time, then you don't have to benchmark code which isn't there to know how long it takes to run :-)

This proposal essentially removes log levels for production builds. Could additional methods be provided so there is still debug/info logging, but developer-only extra logging uses xdebug, xinfo, etc.?

Real usage example of developer logging above info level that should never make it into a production build:

if decryption_failed
  Log.xerror &.emit("decryption failed", key: key, nonce: nonce)
  raise DecryptionFailed.new
end

I have this sprinkled throughout my code, but manually commented out and removed. It's very easy to forget and leave this in a production build, leaking keys/passwords or other sensitive information.

Log messages that are only relevant in development mode should just have an appropriate log level (such as debug or trace). I don't see the point in duplicating every other log level to have a development version when there are specific log levels for development mode.

I am very sympathetic to the ideas presented by @didactic-drunk because I have used them in development of very high performance production code in the past.

Let's say, for example, new functionality is created with lots of tests, but problems still dribble through infrequently from production deployment. There are situations where you don't want to, or cannot, deploy the full debug build for performance reasons, but nevertheless want to be able to increase the verbosity of debug output in the release/production version of the code in order to investigate hard-to-understand or hard-to-replicate bugs which only occur in production.

I think a valid alternative to the idea presented by @didactic-drunk is to have a more elaborate verbosity scheme whereby e.g. debug verbosity can be enabled for particular functions, files, and/or modules individually. In this way, a debug build can run faster without fully enabling debug instrumentation for all functions, files, and/or modules. However, this would be an advanced feature -- which I've seen implemented in C/C++ projects -- and I'm not sure how it could be efficiently implemented in Crystal. Maybe another GitHub issue should be created for such a feature request?
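For reference, per-source verbosity at run time is already possible with the 0.34 Log API used elsewhere in this thread; it does not provide the compile-time granularity discussed here, but it shows the kind of per-module filtering meant above:

require "log"

backend = Log::IOBackend.new

# Run-time per-source verbosity: "db" sources emit debug entries,
# everything else stays at info level.
Log.builder.bind "*", :info, backend
Log.builder.bind "db.*", :debug, backend

Log.for("db.query").debug { "connection pool stats ..." } # emitted
Log.for("http").debug { "request headers ..." }           # filtered out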

But I think the 3 ideas -- (a) compiling instrumentation in/out, (b) instrumentation that cannot be compiled out, and (c) function/file/module-specific verbosity levels -- are not mutually exclusive and can be mixed and matched for ultimate flexibility. This GitHub issue only deals with (a) and (b).

Do we really need to stuff stdlib for this? Wrappers that behave like you want seem like a simple thing a shard could provide.

Using it with a macro and checking it at compile time makes it impossible to configure this at runtime, where it makes sense to do that (for example reading it from a configuration file).

Also, nobody benchmarked this to see if it really matters. Here's a benchmark:

require "benchmark"
require "log"

Benchmark.ips do |x|
  x.report("nothing logged") do
    Log.debug { "hello world!" }
  end
  x.report("nothing") do
  end
end

Output:

nothing logged 247.19M (  4.05ns) (± 4.99%)  0.0B/op   2.78× slower
       nothing 686.23M (  1.46ns) (± 6.32%)  0.0B/op        fastest

Of course it's slower than doing nothing, but it's only about 4 nanoseconds. I can't imagine how that could slow anything down. Well, at least with #9400

Wrappers that behave like you want seem like a simple thing a shard could provide.

Yeah, I agree, although as a newbie I haven't tried out how shards work yet. Can you recommend any tutorial for that?

Coming from the Perl world, one of its best features is CPAN and the tens of thousands of Perl modules available which built up over the years. So many modules that often you can choose between several modules that do the same thing.

Do we really need to stuff stdlib for this?

Hmmm... I'm -- maybe naively -- thinking that the extra functionality would be a few source lines of macros over the existing code. @jhass, do you think it would really be so much code as to warrant the word 'stuff'?

And are you also suggesting that a PR for the existing log module with this proposed functionality would be rejected because of 'stuffing' in favor of a shard wrapper / alternative instead?

So if there are e.g. 1 million Log.debug { "..." } statements not outputting, there will still be 1 million functions called, with 1 million if statements, and 1 million returns... as an overhead.

With the benchmark above, you can see that the overhead for those 1 million calls is 0.004 seconds.

With the benchmark above, you can see that the overhead for those 1 million calls is 0.004 seconds.

hehe yep, that may be nothing if you are using Crystal as an alternative to e.g. Perl, but that's an eternity if you are using Crystal as an alternative to C/C++ :-) And normally it wouldn't just be a single Log.debug but many, due to more complicated business logic... Let's say you have a network application processing n million packets per second... you might have dozens or even hundreds of Log.debug statements per packet, depending upon the complexity...

that's an eternity if you are using Crystal as an alternative to C/C++

Crystal is already slower than C and C++ for many reasons (GC being one). If you want that speed, I recommend using those languages. 4 nanoseconds for doing nothing will work fine for 99.99999999999% of the applications out there. For that small remaining percentage, if you want that performance, I recommend writing your own logging framework, or just adding a {% if flag?(:debug) %} puts "..." {% end %}.
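For completeness, a minimal sketch of that last inline approach; the debug flag here is a user-defined compile-time flag enabled with the compiler's -D option (not related to --debug symbol info), so the statement only exists in builds made with it:

# Present only when built with `crystal build -Ddebug app.cr`;
# a plain `crystal build app.cr` compiles the whole statement out.
{% if flag?(:debug) %}
  puts "packet dump: ..."
{% end %}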

We should design for most use cases, not for that tiny use case which doesn't seem to even exist here. This is all hypothetical. YAGNI.

@asterite I gave a non-performance real use example. Not YAGNI, not hypothetical. I'm dealing with this problem now.

@straight-shoota Either didn't understand or didn't address my concerns (Flat tone. Actually flat. We've misunderstood each other before and I don't see a response).

@simonhf I'm moving this to a shard. No code yet. Maybe in a week. Very busy now.

Maybe the difference in opinions is typical (possibly web) app needs vs. small-message processing. I'm attempting to handle lots of small 256-8k messages. The log output dwarfs the message size. Much of it is only relevant in development, and some of it is never logged in production due to its sensitive nature. I can't/won't divide this by log level. Having multiple debug levels in production is still useful. I just need to remove the messages that are sensitive (for security reasons) or that are undiagnosable without the source code and only bog down sysadmins with noise. Again, not based on log level.

If a log shouldn't reach a production server, I think you need to check the environment (dev, staging, production) before logging it. Then this has nothing to do with the debug level, or with being able to hide stuff at compile time.

You can do something like:

APP_ENV = {{ env("APP_ENV") || "dev" }}

def xerror(*args, **nargs, &block)
  {% if APP_ENV != "production %}
    Log.error(*args, **nargs) { |dsl| yield dsl }
  {% end %}
end

# ...

if decryption_failed
  xerror &.emit("decryption failed", key: key, nonce: nonce)
  raise DecryptionFailed.new
end

The current API already lets you do what you want.

@didactic-drunk As you can see, you can already do what you want. In fact, it seems you were already capable of doing it.

My point is, we shouldn't create a PR or suggestion for everything that someone finds useful. The current API is already composable: you just wrap it and gate it behind a macro if you need it. There's no need for the Log framework to do that, because what defines when something should be stripped out of the final build is not something the log framework can know, much less at compile time.

@simonhf

Yeah, I agree, although as a newbie I haven't tried out how shards work yet. Can you recommend any tutorial for that?

The tutorial is crystal init lib foo, cd foo, vim $stuff, git add ., git commit, git remote add origin https://github.com/user/foo.git, git push origin master :)

Hmmm... I'm -- maybe naively -- thinking that the extra functionality would be a few source lines of macros over the existing code

The initial implementation is one thing; there are many, many more considerations. For starters, committing to maintaining the API. There are also usability considerations we need to make. The more API surface we provide, the more confusing it is to use the functionality. It will cause more "Is this relevant to me?", "Should I use this or that?" etc. kinds of questions in users.

You can do something like

@asterite Thanks for providing this example, which I tried to get to work, but even after adding the missing double quote at "production", I only get errors :-( Would it be possible to kindly amend it into a working example?

Plus, as a newbie I'm not sure I get it 100%. I see the use of the constant and the macro. Is the macro inside the def really necessary, since wouldn't constant folding do the same thing if APP_ENV could be made to end up true or false? And lastly, isn't xerror() defined and called whether the environment is dev or something else? Or is xerror() automatically compiled away if it is "empty" of code?

This is for Crystal 0.34.0:

require "log"

{% begin %}
  APP_ENV = {{ env("APP_ENV") || "dev" }}
{% end %}

def xerror(*args, **nargs, &block)
  {% if APP_ENV != "production" %}
    Log.error(*args, **nargs) { yield }
  {% end %}
end

backend = Log::IOBackend.new
Log.builder.bind "*", :error, backend

# ...
decryption_failed = true
key = "foo"
nonce = "bar"

if decryption_failed
  xerror { "decryption failed: key: #{key.inspect}, nonce: #{nonce.inspect}" }
  raise "oh no"
end

Try running:

bin/crystal foo.cr

and

APP_ENV=production bin/crystal foo.cr

I see the use of the constant and the macro. Is the macro inside the def really necessary

It's just an example. From that point you can take it forward and code it as you wish.

@asterite hmmm... what could be going wrong for me?

$ crystal -v | head -1
Crystal 0.34.0 [4401e90f0] (2020-04-06)
$ crystal wrap-log.cr     
Unhandled exception: oh no (Exception)
  from /usr/share/crystal/src/log/log.cr:36:3 in '__crystal_main'
  from /usr/share/crystal/src/crystal/main.cr:105:5 in 'main_user_code'
  from /usr/share/crystal/src/crystal/main.cr:91:7 in 'main'
  from /usr/share/crystal/src/crystal/main.cr:114:3 in 'main'
  from __libc_start_main
  from _start
  from ???
$ APP_ENV=production crystal wrap-log.cr 
Unhandled exception: oh no (Exception)
  from wrap-log.cr:20:3 in '__crystal_main'
  from /usr/share/crystal/src/crystal/main.cr:105:5 in 'main_user_code'
  from /usr/share/crystal/src/crystal/main.cr:91:7 in 'main'
  from /usr/share/crystal/src/crystal/main.cr:114:3 in 'main'
  from __libc_start_main
  from _start
  from ???

Nothing, the example unconditionally raises an exception and that's what you're seeing.

Shouldn't I be seeing the "decryption failed" error too?

I just updated the example. In 0.34.0 the logger incorrectly didn't log anything unless configured. I had the example working for 0.35.0, but explaining how to run that was going to be harder than porting it over to 0.34.0.

Yep, and I learned an alternative way through my newbie experiences too, by adding the following lines to the example:

Log.setup_from_env
STDOUT.flush_on_newline = true

Then the following works:

$ APP_ENV=production crystal build wrap-log.cr && CRYSTAL_LOG_LEVEL=DEBUG ./wrap-log 
Unhandled exception: oh no (Exception)
  from wrap-log.cr:23:3 in '__crystal_main'
  from /usr/share/crystal/src/crystal/main.cr:105:5 in 'main_user_code'
  from /usr/share/crystal/src/crystal/main.cr:91:7 in 'main'
  from /usr/share/crystal/src/crystal/main.cr:114:3 in 'main'
  from __libc_start_main
  from _start
  from ???
$ APP_ENV=dev crystal build wrap-log.cr && CRYSTAL_LOG_LEVEL=DEBUG ./wrap-log        
E, [2020-06-01T19:31:06.494732000Z #92120]   ERROR -- wrap-log:: decryption failed: key: "foo", nonce: "bar"
Unhandled exception: oh no (Exception)
  from /usr/share/crystal/src/log/log.cr:36:3 in '__crystal_main'
  from /usr/share/crystal/src/crystal/main.cr:105:5 in 'main_user_code'
  from /usr/share/crystal/src/crystal/main.cr:91:7 in 'main'
  from /usr/share/crystal/src/crystal/main.cr:114:3 in 'main'
  from __libc_start_main
  from _start
  from ???

So if I try this macro technique while doing some timing, how can I explain the longer execution time compared to the empty loop and the loop with a regular Log.debug? I was expecting/hoping the macro version to have the same/similar time to the empty loop. Why does it end up longer than both the empty loop and the regular Log.debug?

$ crystal eval --time 'require "log"; x = 0; y = 1000000; while x < y; x += 1; end'
Execute: 00:00:00.012807206

$ crystal eval --time 'require "log"; x = 0; y = 1000000; while x < y; x += 1; Log.debug { "dummy" }; end'
Execute: 00:00:00.019682497

$ crystal eval --time 'require "log"; Log.setup_from_env; {% begin %}; APP_ENV = {{ env("APP_ENV") || "dev" }}; {% end %}; def xdebug(*args, **nargs, &block); {% if APP_ENV != "production" %}; Log.debug(*args, **nargs) { yield }; {% end %}; end; x = 0; y = 1000000; while x < y; x += 1; xdebug { "dummy" }; end'
Execute: 00:00:00.024651870

You include Log.setup_from_env in one case and not in the others...

Thanks. And I left out the all-important APP_ENV= :-) Here are the updated timings, and as you can see the macros appear to be working, with the APP_ENV=dev timing ending up similar to the regular Log.debug without macros, and the APP_ENV=production timing ending up similar to the empty loop:

$ crystal eval --time 'require "log"; Log.setup_from_env; x = 0; y = 10000000; while x < y; x += 1; end'
Execute: 00:00:00.036791254

$ crystal eval --time 'require "log"; Log.setup_from_env; x = 0; y = 10000000; while x < y; x += 1; Log.debug { "dummy" }; end'
Execute: 00:00:00.133681288

$ APP_ENV=dev crystal eval --time 'require "log"; Log.setup_from_env; {% begin %}; APP_ENV = {{ env("APP_ENV") || "dev" }}; {% end %}; def xdebug(*args, **nargs, &block); {% if APP_ENV != "production" %}; Log.debug(*args, **nargs) { yield }; {% end %}; end; x = 0; y = 10000000; while x < y; x += 1; xdebug { "dummy" }; end'
Execute: 00:00:00.135630305

$ APP_ENV=production crystal eval --time 'require "log"; Log.setup_from_env; {% begin %}; APP_ENV = {{ env("APP_ENV") || "dev" }}; {% end %}; def xdebug(*args, **nargs, &block); {% if APP_ENV != "production" %}; Log.debug(*args, **nargs) { yield }; {% end %}; end; x = 0; y = 10000000; while x < y; x += 1; xdebug { "dummy" }; end'
Execute: 00:00:00.036284051

Also you are not setting APP_ENV=production.

Try this:

require "benchmark"
require "log"
Log.setup_from_env

{% begin %}
  APP_ENV = {{ env("APP_ENV") || "dev" }}
{% end %}

def xdebug(*args, **nargs, &block)
  {% if APP_ENV != "production" %}
    Log.debug(*args, **nargs) { yield }
  {% end %}
end

Benchmark.ips do |z|
  z.report("loop") do
    x = 0
    y = 1000000
    while x < y
      x += 1
    end
  end
  z.report("log") do
    x = 0
    y = 1000000
    while x < y
      x += 1
      Log.debug { "dummy" }
    end
  end
  z.report("log macro") do
    x = 0
    y = 1000000
    while x < y
      x += 1
      xdebug { "dummy" }
    end
  end
end

Run with APP_ENV=production crystal foo.cr --release.

These are my results:

     loop 804.87M (  1.24ns) (± 4.52%)  0.0B/op        1.02× slower
      log 417.00  (  2.40ms) (± 4.31%)  0.0B/op  1962213.56× slower
log macro 818.24M (  1.22ns) (± 5.03%)  0.0B/op             fastest

I guess that's the disadvantage of the benchmark module... you need to run it multiple times for each unique macro combination...
