I just added the stackdriver gem to my Rails app running on Docker in Google Container Engine (GKE).
I have done no additional installations or configuration, just require the Stackdiver gem. Attempting to log a message produces the following error:
irb(main):001:0> Rails.logger.info "blah"
blah
=> true
irb(main):002:0> E1006 19:08:57.668311726 65 ssl_transport_security.c:427] Corruption detected.
E1006 19:08:57.668415894 65 ssl_transport_security.c:403] error:1e000065:Cipher functions:OPENSSL_internal:BAD_DECRYPT
E1006 19:08:57.668424390 65 ssl_transport_security.c:403] error:1000008b:SSL routines:OPENSSL_internal:DECRYPTION_FAILED_OR_BAD_RECORD_MAC
E1006 19:08:57.668433841 65 secure_endpoint.c:177] Decryption error: TSI_DATA_CORRUPTED
E1006 19:08:57.668540147 65 ev_epollsig_linux.c:1069] pollset_kick: {"created":"@1507316937.668516426","description":"Kick Failure","file":"src/core/lib/iomgr/ev_epollsig_linux.c","file_line":244,"referenced_errors":[{"created":"@1507316937.668515815","description":"OS Error","errno":3,"file":"src/core/lib/iomgr/ev_epollsig_linux.c","file_line":982,"os_error":"No such process","syscall":"pthread_kill"}]}
E1006 19:08:57.668552062 65 completion_queue.c:653] Kick failed: {"created":"@1507316937.668516426","description":"Kick Failure","file":"src/core/lib/iomgr/ev_epollsig_linux.c","file_line":244,"referenced_errors":[{"created":"@1507316937.668515815","description":"OS Error","errno":3,"file":"src/core/lib/iomgr/ev_epollsig_linux.c","file_line":982,"os_error":"No such process","syscall":"pthread_kill"}]}
I've just upgraded to the latest Stackdriver gem (was at 0.8.0 before and attempting to log would segfault the Ruby process, so this is an improvement).
stackdriver (0.10.0)
google-cloud-debugger (~> 0.28.0)
google-cloud-error_reporting (~> 0.27.0)
google-cloud-logging (~> 1.2)
google-cloud-trace (~> 0.27.0)
I'm investigating as we speak and realize that the issue is quite possibly in an underlying library but I'm pressed for time and need to get logging working so I'm posting here in hopes someone has knowledge or ideas they can shared.
Thanks, Neil.
Any news about this – we are hitting the same issue, but only for production env (for dev/staging everything works like a charm)?
Is there any code for the Stackdiver gem that executes only for Rails.env.production??
The same problem under Nginx compiled with BoringSSL-stable:
$ egrep -o 'OPENSSL_internal:.* while waiting for request' /var/log/nginx/error.log | sort | uniq -c | sort -nr
46 OPENSSL_internal:BAD_DECRYPT error:1000008b:SSL routines:OPENSSL_internal:DECRYPTION_FAILED_OR_BAD_RECORD_MAC) while waiting for request
13 OPENSSL_internal:DATA_NOT_MULTIPLE_OF_BLOCK_LENGTH error:1000008b:SSL routines:OPENSSL_internal:DECRYPTION_FAILED_OR_BAD_RECORD_MAC) while waiting for request
4 OPENSSL_internal:BAD_PACKET_LENGTH error:1000008b:SSL routines:OPENSSL_internal:DECRYPTION_FAILED_OR_BAD_RECORD_MAC) while waiting for request
We get the same, sometimes as part of a segfault. It happens reproducibly when the debugger is enabled, and goes away when not.
Segfault info:
2017-12-08 10:04:38.517530104 +0100 CET [web-1] [BUG] Segmentation fault at 0x00007fb243bff9d0
2017-12-08 10:04:38.517533393 +0100 CET [web-1] ruby 2.4.2p198 (2017-09-14 revision 59899) [x86_64-linux]
2017-12-08 10:04:38.517534811 +0100 CET [web-1] -- Machine register context ------------------------------------------------
2017-12-08 10:04:38.517534168 +0100 CET [web-1]
2017-12-08 10:04:38.517581204 +0100 CET [web-1] R8: 0x0000000000000000 R9: 0x0000000004000001 R10: 0x00007ffc9accb9f0
2017-12-08 10:04:38.517580203 +0100 CET [web-1] RDX: 0x0000000000000001 RDI: 0x00007fb243bff700 RSI: 0x0000000000000000
2017-12-08 10:04:38.517582380 +0100 CET [web-1] R14: 0x00007fb2463124f0 R15: 0x00007fb2540433f0 EFL: 0x0000000000010206
2017-12-08 10:04:38.517581788 +0100 CET [web-1] R11: 0x00007fb2585005b0 R12: 0x000000000000000c R13: 0x00007fb257daee80
2017-12-08 10:04:38.517535389 +0100 CET [web-1] RIP: 0x00007fb2585005ca RBP: 0x00007ffc9accbc60 RSP: 0x00007ffc9accbbd0
2017-12-08 10:04:38.517577542 +0100 CET [web-1] RAX: 0x00007fb243c1e400 RBX: 0x00007fb243bff700 RCX: 0x0000000000000000
2017-12-08 10:04:38.517582873 +0100 CET [web-1]
2017-12-08 10:04:38.518305513 +0100 CET [web-1] E1208 09:04:38.517954670 38 ssl_transport_security.c:435] Corruption detected.
2017-12-08 10:04:38.517583452 +0100 CET [web-1] -- C level backtrace information -------------------------------------------
2017-12-08 10:04:38.518307703 +0100 CET [web-1] E1208 09:04:38.518027779 38 ssl_transport_security.c:411] error:100003fc:SSL routines:OPENSSL_internal:SSLV3_ALERT_BAD_RECORD_MAC
2017-12-08 10:04:38.518308488 +0100 CET [web-1] E1208 09:04:38.518063099 38 secure_endpoint.c:185] Decryption error: TSI_DATA_CORRUPTED
2017-12-08 10:04:38.519270286 +0100 CET [web-1] [BUG] Segmentation fault at 0x00007fb243bff9d0
2017-12-08 10:04:38.519273529 +0100 CET [web-1] ruby 2.4.2p198 (2017-09-14 revision 59899) [x86_64-linux]
2017-12-08 10:04:38.519274274 +0100 CET [web-1]
2017-12-08 10:04:38.519357673 +0100 CET [web-1] RDX: 0x0000000000000001 RDI: 0x00007fb243bff700 RSI: 0x0000000000000000
2017-12-08 10:04:38.519275432 +0100 CET [web-1] RIP: 0x00007fb2585005ca RBP: 0x00007ffc9accbbf0 RSP: 0x00007ffc9accbb60
2017-12-08 10:04:38.519356030 +0100 CET [web-1] RAX: 0x00007fb243c1e400 RBX: 0x00007fb243bff700 RCX: 0x0000000000000000
2017-12-08 10:04:38.519358788 +0100 CET [web-1] R11: 0x00007fb2585005b0 R12: 0x000000000000000c R13: 0x00007fb257daee80
2017-12-08 10:04:38.519358259 +0100 CET [web-1] R8: 0x0000000000000000 R9: 0x0000000004000001 R10: 0x00007ffc9accb980
2017-12-08 10:04:38.519396794 +0100 CET [web-1] R14: 0x00007fb2463124f0 R15: 0x00007fb2540433f0 EFL: 0x0000000000010206
2017-12-08 10:04:38.519274863 +0100 CET [web-1] -- Machine register context ------------------------------------------------
2017-12-08 10:04:38.519399915 +0100 CET [web-1]
2017-12-08 10:04:38.519401011 +0100 CET [web-1] -- C level backtrace information -------------------------------------------
2017-12-08 10:04:38.548878605 +0100 CET [web-1] /app/vendor/ruby-2.4.2/bin/ruby(rb_bug_context+0xd0) [0x133b46b770] error.c:506
2017-12-08 10:04:38.549424998 +0100 CET [web-1] /app/vendor/ruby-2.4.2/bin/ruby(sigsegv+0x3e) [0x133b364abe] signal.c:907
2017-12-08 10:04:38.548874933 +0100 CET [web-1] /app/vendor/ruby-2.4.2/bin/ruby(rb_vm_bugreport+0x528) [0x133b4764d8] vm_dump.c:684
2017-12-08 10:04:38.549427013 +0100 CET [web-1] /lib/x86_64-linux-gnu/libpthread.so.0 [0x7fb258507330]
2017-12-08 10:04:38.549427678 +0100 CET [web-1] /lib/x86_64-linux-gnu/libpthread.so.0(pthread_join+0x1a) [0x7fb2585005ca]
2017-12-08 10:04:38.549468201 +0100 CET [web-1] /app/vendor/bundle/ruby/2.4.0/gems/grpc-1.7.2-x86_64-linux/src/ruby/lib/grpc/2.4/grpc_c.so(grpc_executor_set_threading+0xf7) [0x7fb24e011d17]
2017-12-08 10:04:38.549468895 +0100 CET [web-1] /app/vendor/bundle/ruby/2.4.0/gems/grpc-1.7.2-x86_64-linux/src/ruby/lib/grpc/2.4/grpc_c.so(grpc_shutdown+0xd0) [0x7fb24e00a520]
2017-12-08 10:04:38.549470088 +0100 CET [web-1] /lib/x86_64-linux-gnu/libc.so.6 [0x7fb257a2b1a9]
2017-12-08 10:04:38.549469501 +0100 CET [web-1] /app/vendor/bundle/ruby/2.4.0/gems/grpc-1.7.2-x86_64-linux/src/ruby/lib/grpc/2.4/grpc_c.so [0x7fb24e00858e]
2017-12-08 10:04:38.549498170 +0100 CET [web-1] /lib/x86_64-linux-gnu/libc.so.6 [0x7fb257a2b1f5]
2017-12-08 10:04:38.549499441 +0100 CET [web-1] /app/vendor/ruby-2.4.2/bin/ruby(rb_f_fork+0x82) [0x133b30e352] process.c:3712
2017-12-08 10:04:38.549498794 +0100 CET [web-1] /app/vendor/ruby-2.4.2/bin/ruby(ruby_stop+0x10) [0x133b271820] eval.c:256
2017-12-08 10:04:38.549534168 +0100 CET [web-1] /app/vendor/ruby-2.4.2/bin/ruby(vm_exec+0x87) [0x133b3e1947] vm.c:1774
2017-12-08 10:04:38.549577609 +0100 CET [web-1] /app/vendor/ruby-2.4.2/bin/ruby(rb_yield_1+0x50c) [0x133b3eb93c] vm.c:973
2017-12-08 10:04:38.549522025 +0100 CET [web-1] /app/vendor/ruby-2.4.2/bin/ruby(vm_exec_core+0x2ec0) [0x133b3de2b0] insns.def:967
2017-12-08 10:04:38.549521283 +0100 CET [web-1] /app/vendor/ruby-2.4.2/bin/ruby(vm_call_cfunc+0xf1) [0x133b3d5861] vm_insnhelper.c:1768
2017-12-08 10:04:38.549578519 +0100 CET [web-1] /app/vendor/ruby-2.4.2/bin/ruby(int_dotimes+0x3e) [0x133b2ce60e] numeric.c:4999
2017-12-08 10:04:38.549674503 +0100 CET [web-1] /app/vendor/ruby-2.4.2/bin/ruby(vm_call_method+0xe3) [0x133b3e4243] vm_insnhelper.c:2308
2017-12-08 10:04:38.549579055 +0100 CET [web-1] /app/vendor/ruby-2.4.2/bin/ruby(vm_call_cfunc+0xf1) [0x133b3d5861] vm_insnhelper.c:1768
2017-12-08 10:04:38.549675459 +0100 CET [web-1] /app/vendor/ruby-2.4.2/bin/ruby(vm_exec_core+0x2ec0) [0x133b3de2b0] insns.def:967
2017-12-08 10:04:38.549675985 +0100 CET [web-1] /app/vendor/ruby-2.4.2/bin/ruby(vm_exec+0x87) [0x133b3e1947] vm.c:1774
2017-12-08 10:04:38.549676800 +0100 CET [web-1] /app/vendor/ruby-2.4.2/bin/ruby(rb_load_internal0+0xad) [0x133b2b72dd] load.c:617
2017-12-08 10:04:38.549677365 +0100 CET [web-1] /app/vendor/ruby-2.4.2/bin/ruby(rb_f_load+0x88) [0x133b2b79a8] load.c:646
2017-12-08 10:04:38.549678609 +0100 CET [web-1] /app/vendor/ruby-2.4.2/bin/ruby(vm_call_method+0xe3) [0x133b3e4243] vm_insnhelper.c:2308
2017-12-08 10:04:38.549677940 +0100 CET [web-1] /app/vendor/ruby-2.4.2/bin/ruby(vm_call_cfunc+0xf1) [0x133b3d5861] vm_insnhelper.c:1768
2017-12-08 10:04:38.549679104 +0100 CET [web-1] /app/vendor/ruby-2.4.2/bin/ruby(vm_exec_core+0x2241) [0x133b3dd631] insns.def:1066
2017-12-08 10:04:38.549705790 +0100 CET [web-1] /app/vendor/ruby-2.4.2/bin/ruby(rb_load_internal0+0xad) [0x133b2b72dd] load.c:617
2017-12-08 10:04:38.549704987 +0100 CET [web-1] /app/vendor/ruby-2.4.2/bin/ruby(vm_exec+0x87) [0x133b3e1947] vm.c:1774
2017-12-08 10:04:38.550082269 +0100 CET [web-1] /app/vendor/ruby-2.4.2/bin/ruby(vm_exec_core+0x2241) [0x133b3dd631] insns.def:1066
2017-12-08 10:04:38.550080383 +0100 CET [web-1] /app/vendor/ruby-2.4.2/bin/ruby(vm_call_method+0xe3) [0x133b3e4243] vm_insnhelper.c:2308
2017-12-08 10:04:38.549716908 +0100 CET [web-1] /app/vendor/ruby-2.4.2/bin/ruby(rb_f_load+0x88) [0x133b2b79a8] load.c:646
2017-12-08 10:04:38.549740552 +0100 CET [web-1] /app/vendor/ruby-2.4.2/bin/ruby(vm_call_cfunc+0xf1) [0x133b3d5861] vm_insnhelper.c:1768
2017-12-08 10:04:38.550083427 +0100 CET [web-1] /app/vendor/ruby-2.4.2/bin/ruby(rb_load_internal0+0xad) [0x133b2b72dd] load.c:617
2017-12-08 10:04:38.550082868 +0100 CET [web-1] /app/vendor/ruby-2.4.2/bin/ruby(vm_exec+0x87) [0x133b3e1947] vm.c:1774
2017-12-08 10:04:38.550084082 +0100 CET [web-1] /app/vendor/ruby-2.4.2/bin/ruby(rb_f_load+0x88) [0x133b2b79a8] load.c:646
2017-12-08 10:04:38.550084668 +0100 CET [web-1] /app/vendor/ruby-2.4.2/bin/ruby(vm_call_cfunc+0xf1) [0x133b3d5861] vm_insnhelper.c:1768
2017-12-08 10:04:38.550125342 +0100 CET [web-1] /app/vendor/ruby-2.4.2/bin/ruby(vm_exec+0x87) [0x133b3e1947] vm.c:1774
2017-12-08 10:04:38.550138618 +0100 CET [web-1] /app/vendor/ruby-2.4.2/bin/ruby(ruby_exec_internal+0xad) [0x133b26e19d] eval.c:244
2017-12-08 10:04:38.550162231 +0100 CET [web-1] /app/vendor/ruby-2.4.2/bin/ruby(ruby_run_node+0x2d) [0x133b2718ad] eval.c:308
2017-12-08 10:04:38.550430119 +0100 CET [web-1] /app/vendor/ruby-2.4.2/bin/ruby(main+0x4b) [0x133b26df5b] addr2line.c:177
Version info is Ruby 2.4.2
google-cloud-core (1.1.0)
google-cloud-env (~> 1.0)
google-cloud-debugger (0.29.1)
binding_of_caller (~> 0.7)
google-cloud-core (~> 1.1)
google-cloud-logging (~> 1.0)
google-gax (~> 0.10.1)
stackdriver-core (~> 1.2)
google-cloud-env (1.0.1)
faraday (~> 0.11)
google-cloud-error_reporting (0.28.1)
google-cloud-core (~> 1.1)
google-gax (~> 0.10.1)
stackdriver-core (~> 1.2)
google-cloud-logging (1.3.2)
google-cloud-core (~> 1.1)
google-gax (~> 0.10.1)
stackdriver-core (~> 1.2)
google-cloud-trace (0.28.1)
google-cloud-core (~> 1.1)
google-gax (~> 0.10.1)
stackdriver-core (~> 1.2)
google-gax (0.10.2)
google-protobuf (~> 3.2)
googleapis-common-protos (>= 1.3.5, < 2.0)
googleauth (~> 0.6.2)
grpc (>= 1.7.2, < 2.0)
rly (~> 0.2.3)
google-protobuf (3.5.0)
googleapis-common-protos (1.3.7)
google-protobuf (~> 3.0)
googleapis-common-protos-types (~> 1.0)
grpc (~> 1.0)
googleapis-common-protos-types (1.0.1)
google-protobuf (~> 3.0)
googleauth (0.6.2)
faraday (~> 0.12)
jwt (>= 1.4, < 3.0)
logging (~> 2.0)
memoist (~> 0.12)
multi_json (~> 1.11)
os (~> 0.9)
signet (~> 0.7)
grpc (1.7.2)
google-protobuf (~> 3.1)
googleapis-common-protos-types (~> 1.0.0)
googleauth (>= 0.5.1, < 0.7)
@apolcyn This looks to be coming from GRPC. Can you confirm?
I'm not sure where the root of this is necessarily from, but this is a grpc stack trace - it does look like the error messages and segv are from within grpc.
@dazuma This issue is falling outside of SLO, can we confirm if this is indeed occurring in grpc? Is there an upstream issue?
Did someone figure out this problem? We have the same :-(
@BenBach What version of the grpc gem are you using?
@blowmage 1.8.3
@BenBach Can you post your backtrace?
I want to post a little follow up for everyone...
This was a major issue. It crashed the app. Yeah... the logging library, which - in my opinion -
should do everything in it's power to never disrupt your application and whatever business it's doing.
On top of that, it's Google's own libraries for premium Google features. So I was paying them to crash my app and/or paying them for a service that didn't work.
I thought this would be something that Google could fix. So I paid for the premium help - gold level. The highest I could go! I sent them a more detailed account of what I had posted here.
The first support person to respond told me that Google's library on Google's platform that either crashed my app or prevented me from using the Google service Google was charging me for was not an urgent issue and/or not really their problem and downgraded my request:
Please note that individual application setup is out of scope for Google Cloud Support, we provide support only for Google Cloud Products, so all my support here is best effort.
Priority P1 is reserved for production outage, such as when your end users are unable to use your services, therefore I am changing the priority to P2.
So I guess using Google's libraries to access premium Google Cloud services that you're paying for amounts to "individual application setup", even when the problem is very clearly in Google's code.
In addition to that, your application crashing - which makes it completely unavailable for all end users - is not a production outage where end users are unable to use our services. Huh.
After my issue was downgraded to "maybe sometime next week or later this month or whatever" status, and following a few frustrating back and forth emails with my "gold" support, their resolution was to let me know that someone had already reported the issue on GitHub and I should follow up there... and directed me here:
Hello Neil Souza,
Thanks for your patience.
Regarding to your concerns, there's a known issue with the Ruby Libraries and the Google Cloud Ruby client. By now this issue has been reported on GitHub [1]. I would point you to this link to follow it to receive regular updates from our engineering team.
Please let me know if I can do more for you.
Regards,
XXXXX
[1] https://github.com/GoogleCloudPlatform/google-cloud-ruby/issues/1772
Weird how another Neil Souza had reported the same issue on the same day with the same stack traces. I guess it does go to show that Google support knows how to Google... eventually. This "solution" came a month after I reported the issue and opened the ticket.
Basically, I spent, I don't know - hundreds, maybe a thousand dollars - for someone to waste my time for a month or so and unwittingly point me back at my report of the issue with the resolution that it's... known!
There is no small amount or irony in paying Google ~$1K to Google something for you.
I just wanted to let everyone know... don't expect any help. Don't expect anything to get fixed. And you probably don't want to pay for support. You're on your own in Google-cloud-land. It's an awesome platform, and they are pretty great with computers, but it seems Google is still struggling as much as ever to interface with human beings ;)
To any Googles that end up reading this:
I'm just frustrated that I shelled out for the top-tier support package and no one could be bothered to look into it, but I knew what I was getting into. I worked with you guys on OpenSocial quite a bit in '07-08; I wrote Zynga's FB/OpenSocial adaptation layer and their Mafia Wars game, which was one of the largest OpenSocial apps (MySpace! Man, Google and MySpace was a awkward working relationship). I wrote and ran one of the first large apps on App Engine in like '09. We got way back. I knew that without obscene spend I probably needed backdoor connections to the dev teams to get anything addressed. I thought maybe things had changed since then, but while the experience was insulting, frustrating and honestly a total rip-off, it was not that surprising. Kubernets is f'ing sweet though, thanks for that!
FYI, it's not just in GKE. I ran into this in Minikube, v1.9.0
Update: This could be the same issue as https://github.com/GoogleCloudPlatform/google-cloud-ruby/issues/1860 — I'm sporadically seeing this error when reproducing that issue, which happens when the Stackdriver Debugger is enabled and you are running on a forking server (e.g. Puma in clustered mode, or some configurations of Passenger). The workaround is to disable the debugger, with a configuration like this:
config.google_cloud.use_debugger = false
Deep apologies for the lack of motion on this. I know this has been a frustrating issue for many people, including us internally. I'm currently investigating a shim for the debugger to prevent it from triggering the problem, which I think should make it go away for most people.
We released google-cloud-debugger 0.32.0 and stackdriver 0.15.0 with a fix that delays the debugger startup until after workers are forked. If you have been observing this problem, please check against the new gems and see if it persists.
@dazuma Do you think we can close this issue now, and reopen if the problem is seen again on the new release?
Yes, let's go ahead and close this. As far as we can tell, this is a symptom of https://github.com/grpc/grpc/issues/7951. The debugger/stackdriver releases should work around the issue for the main use case that was triggering it (stackdriver on preforking webservers).
Most helpful comment
We released google-cloud-debugger 0.32.0 and stackdriver 0.15.0 with a fix that delays the debugger startup until after workers are forked. If you have been observing this problem, please check against the new gems and see if it persists.