collectd built from commit 66fd3303 crashes with SIGSEGV

Created on 25 Oct 2017  ·  10 Comments  ·  Source: collectd/collectd

  • Version of collectd:
    66fd3303352c526255ab4bde256361244e26b259

  • Operating system / distribution:
    16.04.1-Ubuntu, Openstack deployed by kolla-ansible

Expected behavior

Collectd container is up and running

Actual behavior

The collectd container restarts continuously because collectd crashes.

When I switch collectd back to the older commit 1b10ab706f8b70ce2f086e59a54cc09d671ad989, it is stable again.

Steps to reproduce

  • build collectd docker container
  • deploy collectd with kolla-ansible
  • see dmesg
Bug Pending contributor action

All 10 comments

Thanks for reporting this @jiriproX! 1b10ab7..66fd330 is a range of 203 commits. Do you think you could use git bisect to find the culprit? A stack trace would also be super helpful! Our wiki has instructions.

Best regards,
--octo

... Also, the configuration may be helpful.

FQDNLookup false

LoadPlugin logfile
<Plugin logfile>
        LogLevel info
        File "/var/log/kolla/collectd/collectd.log"
        Timestamp true
        PrintSeverity true
</Plugin>

LoadPlugin cpu
LoadPlugin interface
LoadPlugin load
LoadPlugin memory
LoadPlugin hugepages

LoadPlugin csv
<Plugin csv>
   DataDir "/var/log/kolla/collectd/csv"
   StoreRates false
</Plugin>

LoadPlugin virt
<Plugin virt>
    Connection "qemu:///system"
    RefreshInterval 60
    HostnameFormat uuid
</Plugin>

<LoadPlugin ovs_events>
    Interval 1
</LoadPlugin>

<Plugin "ovs_events">
    Port 6640
    Socket "/var/run/openvswitch/db.sock"
#    Interfaces "br0" "veth0"
    SendNotification false
    DispatchValues true
</Plugin>

<LoadPlugin ovs_stats>
    Interval 1
</LoadPlugin>

<Plugin ovs_stats>
    Port "6640"
    Address "127.0.0.1"
    Socket "/var/run/openvswitch/db.sock"
#    Bridges "br0" "br_ext"
</Plugin>

<LoadPlugin python>
    Globals true
</LoadPlugin>
<Plugin python>
  #ModulePath "/collectd-gnocchi-plugin"
  #LogTraces true
  #Interactive false
  Import "collectd_gnocchi"
  <Module collectd_gnocchi>
     ### Keystone authentication
     # User_Id admin
     # Project_Id admin
     # Tenant_Id admin
     # User_Domain_Name default
     # Project_Domain_Name default
     Auth_Mode keystone
     Auth_Url "http://10.11.26.254:35357/v3"
     Username gnocchi
     Project_Name service
     Tenant_Name service
     Password G9ihscikcn7BTj3OcqQQf6VMUsQwvyfz5vZVHbIt
     User_Domain_Id default
     Project_Domain_Id default

     # Region_Name regionOne
     # Interface public
     # Endpoint http://localhost:8041 # if you want to override Keystone value


     ## Default resource type created by the plugin in Gnocchi
     ## to store hosts
     ResourceType collectd


     ## Minimum number of values to batch
     BatchSize 1
  </Module>
</Plugin>
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/usr/sbin/collectd -f -C /etc/collectd/collectd.conf'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  __strncpy_sse2_unaligned () at ../sysdeps/x86_64/multiarch/strcpy-sse2-unaligned.S:296
296     ../sysdeps/x86_64/multiarch/strcpy-sse2-unaligned.S: No such file or directory.
[Current thread is 1 (Thread 0x7fffd1ffb700 (LWP 31))]
(gdb) backtrace full
#0  __strncpy_sse2_unaligned () at ../sysdeps/x86_64/multiarch/strcpy-sse2-unaligned.S:296
No locals.
#1  0x000000000041b451 in strncpy (__len=__len@entry=128, __src=<optimized out>, __dest=0x7fffc4000f70 "", __dest@entry=0x7fffc4000f90 "")
    at /usr/include/x86_64-linux-gnu/bits/string3.h:126
No locals.
#2  sstrncpy (dest=dest@entry=0x7fffc4000f70 "", src=<optimized out>, n=n@entry=128) at src/daemon/common.c:81
No locals.
#3  0x00000000004111a0 in plugin_value_list_clone (vl_orig=vl_orig@entry=0x7fffd1ffa7d0) at src/daemon/plugin.c:702
        vl = 0x7fffc4000f50
#4  0x00000000004112b3 in plugin_write_enqueue (vl=0x7fffd1ffa7d0) at src/daemon/plugin.c:754
        q = 0x7fffc4000f20
#5  plugin_dispatch_values (vl=0x7fffd1ffa7d0) at src/daemon/plugin.c:2093
        statistics_lock = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0,
              __next = 0x0}}, __size = '\000' <repeats 39 times>, __align = 0}

If I remove all plugins from collectd.conf except logfile plugin and python plugin it stops crashing. If I add any additional plugin there it starts crashing again.

This seems to be the reason for the crash:
[debug] hostname_g = (null);

Thanks for that finding!

cc @SeanCampbell
Related to #2467

It seems hostname_g is left uninitialised when FQDNLookup is set to false and no explicit Hostname is configured.

Caused by the change in https://github.com/collectd/collectd/commit/69a2285dea4568c0010f116d22415f301b74579a#diff-6bbb0da0e748a6d60b63a37e8c8da457R97.

Possible fix:

--- a/src/daemon/collectd.c
+++ b/src/daemon/collectd.c
@@ -100,8 +100,10 @@ static int init_hostname(void) {
   }

   str = global_option_get("FQDNLookup");
-  if (IS_FALSE(str))
+  if (IS_FALSE(str)) {
+    hostname_set(hostname);
     return 0;
+  }

   struct addrinfo ai_hints = {.ai_flags = AI_CANONNAME};

Awesome work, thanks @rpv-tomsk!
