Graylog2-server: Appliance upgrade: reconfigure fails at "add node to server list" with ECONNREFUSED on port 4001

Created on 4 Aug 2017 · 51 Comments · Source: Graylog2/graylog2-server

Upgrading a previously upgraded appliance from 2.2.3 to 2.3.0.

The reconfigure step fails while connecting to 127.0.0.1 port 4001:

   - execute /opt/graylog/embedded/bin/graylog-ctl start graylog-server
  * ruby_block[add node to server list] action run

    ================================================================================
    Error executing action `run` on resource 'ruby_block[add node to server list]'
    ================================================================================

    Errno::ECONNREFUSED
    -------------------
    Connection refused - connect(2) for "127.0.0.1" port 4001

Expected Behavior



Completion

Current Behavior



Fails with long error message. Will attach stacktrace as requested.
chef-stacktrace.zip

Possible Solution


Steps to Reproduce (for bugs)


  1. Start with an appliance previously upgraded to 2.2.3
  2. wget https://packages.graylog2.org/releases/graylog-omnibus/ubuntu/graylog_latest.deb
  3. sudo graylog-ctl stop
  4. sudo dpkg -G -i graylog_latest.deb (no errors noted up to and including this step)
  5. sudo graylog-ctl reconfigure (fails during this step)

Running reconfigure again results in the same error message. After a reboot, the appliance serves only the nginx welcome page.

Context


Your Environment


VM hosted on Hyper-V. It has been running for two years without issues and has been upgraded in the past without issue as well.

  • Graylog Version:
  • Elasticsearch Version:
  • MongoDB Version:
  • Operating System:
  • Browser version:
bug


Last message from reconfigure is:

Running handlers:
[2017-08-04T13:50:00-06:00] ERROR: Running exception handlers
Running handlers complete
[2017-08-04T13:50:00-06:00] ERROR: Exception handlers complete
Chef Client failed. 6 resources updated in 01 minutes 17 seconds
[2017-08-04T13:50:00-06:00] FATAL: Stacktrace dumped to /opt/graylog/embedded/cookbooks/cache/chef-stacktrace.out
[2017-08-04T13:50:00-06:00] FATAL: Please provide the contents of the stacktrace.out file if you file a bug report
[2017-08-04T13:50:00-06:00] ERROR: ruby_block[add node to server list] (graylog::graylog-server line 84) had an error: Errno::ECONNREFUSED: Connection refused - connect(2) for "127.0.0.1" port 4001
[2017-08-04T13:50:00-06:00] FATAL: Chef::Exceptions::ChildConvergeError: Chef run process exited unsuccessfully (exit code 1)

The server is displaying the nginx welcome page to users.

Services appear to be running:

ubuntu@graylog2:~$ sudo graylog-ctl status
run: elasticsearch: (pid 40760) 2s; run: log: (pid 964) 1252s
down: etcd: 0s, normally up, want up; run: log: (pid 966) 1252s
run: graylog-server: (pid 965) 1252s; run: log: (pid 963) 1252s
run: mongodb: (pid 962) 1252s; run: log: (pid 961) 1252s
run: nginx: (pid 40777) 2s; run: log: (pid 997) 1252s

show-config returns:

ubuntu@graylog2:~$ sudo graylog-ctl show-config
Starting Chef Client, version 12.6.0
Compiling Cookbooks...
{
  "graylog": {
    "bootstrap": {
      "bootstrapped": false
    },
    "etcd": {
      "enabled": true
    },
    "nginx": {
      "enabled": true
    },
    "mongodb": {
      "enabled": true
    },
    "elasticsearch": {
      "enabled": true
    },
    "graylog-server": null
  }
}

Converging 0 resources

Running handlers:
Running handlers complete
Chef Client finished, 0/0 resources updated in 01 seconds

I just noticed this:
down: etcd: 0s, normally up, want up; run: log: (pid 966) 1252s

Running sudo graylog-ctl start didn't manage to get it running (probably due to a broken configuration):

ubuntu@graylog2:~$ sudo graylog-ctl start
ok: run: elasticsearch: (pid 57662) 0s
timeout: down: etcd: 0s, normally up, want up
ok: run: graylog-server: (pid 965) 1826s
ok: run: mongodb: (pid 962) 1826s
ok: run: nginx: (pid 57878) 1s

@robdig Port 4001/tcp is etcd, which according to your last comment is down.

Try starting etcd (or restarting the VM) and then upgrade again.

Other than that, please check the logs of etcd for errors.
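
For anyone triaging the same failure, one quick way to spot which service is down is to filter the status output. This is only a minimal sketch, assuming the line format shown in the graylog-ctl status output quoted above; down_services is a hypothetical helper name, not part of graylog-ctl.

```shell
# Hypothetical helper: print the name of every service reported as "down".
# Reads `graylog-ctl status` text on stdin; the expected line format
# ("down: etcd: 0s, normally up, ...") is taken from the output in this thread.
down_services() {
  awk -F': ' '$1 == "down" { print $2 }'
}
```

It would be used as sudo graylog-ctl status | down_services. For the status output quoted above, the only line starting with "down:" is the etcd one.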

Same problem here. Can't get etcd to start. Log is here:

2017-08-07_12:33:37.48528 2017-08-07 08:33:37.485181 I | etcdmain: etcd Version: 3.2.4
2017-08-07_12:33:37.48561 2017-08-07 08:33:37.485222 I | etcdmain: Git SHA: c31bec0
2017-08-07_12:33:37.48620 2017-08-07 08:33:37.485225 I | etcdmain: Go Version: go1.8.3
2017-08-07_12:33:37.48731 2017-08-07 08:33:37.485228 I | etcdmain: Go OS/Arch: linux/amd64
2017-08-07_12:33:37.48756 2017-08-07 08:33:37.485231 I | etcdmain: setting maximum number of CPUs to 4, total number of available CPUs is 4
2017-08-07_12:33:37.48822 2017-08-07 08:33:37.485262 N | etcdmain: the server is already initialized as member before, starting as etcd member...
2017-08-07_12:33:37.48865 2017-08-07 08:33:37.485456 I | embed: listening for peers on http://localhost:2380
2017-08-07_12:33:37.48950 2017-08-07 08:33:37.485494 I | embed: listening for client requests on 0.0.0.0:2379
2017-08-07_12:33:37.49003 2017-08-07 08:33:37.485519 I | embed: listening for client requests on 0.0.0.0:4001
2017-08-07_12:33:37.52042 2017-08-07 08:33:37.520381 I | etcdserver: recovered store from snapshot at index 22292229
2017-08-07_12:33:37.53947 2017-08-07 08:33:37.539432 C | etcdserver: recovering backend from snapshot error: database snapshot file path error: snap: snapshot file doesn't exist
2017-08-07_12:33:37.54174 panic: recovering backend from snapshot error: database snapshot file path error: snap: snapshot file doesn't exist
2017-08-07_12:33:37.54217       panic: runtime error: invalid memory address or nil pointer dereference
2017-08-07_12:33:37.54330 [signal SIGSEGV: segmentation violation code=0x1 addr=0x28 pc=0xb5129c]
2017-08-07_12:33:37.54381
2017-08-07_12:33:37.54490 goroutine 1 [running]:
2017-08-07_12:33:37.54542 github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/etcdserver.NewServer.func1(0xc4201ba678, 0xc4201ba470)
2017-08-07_12:33:37.54617       /home/gyuho/go/src/github.com/coreos/etcd/release/etcd/gopath/src/github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/etcdserver/server.go:279 +0x3c
2017-08-07_12:33:37.54726 panic(0xd628e0, 0xc420155440)
2017-08-07_12:33:37.55066       /usr/local/go/src/runtime/panic.go:489 +0x2cf
2017-08-07_12:33:37.55114 github.com/coreos/etcd/cmd/vendor/github.com/coreos/pkg/capnslog.(*PackageLogger).Panicf(0xc4201b0760, 0xf31eff, 0x2a, 0xc4201ba4e0, 0x1, 0x1)
2017-08-07_12:33:37.55220       /home/gyuho/go/src/github.com/coreos/etcd/release/etcd/gopath/src/github.com/coreos/etcd/cmd/vendor/github.com/coreos/pkg/capnslog/pkg_logger.go:75 +0x15c
2017-08-07_12:33:37.55276 github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/etcdserver.NewServer(0xc420222000, 0x0, 0x1402540, 0xc42021f7c0)
2017-08-07_12:33:37.55386       /home/gyuho/go/src/github.com/coreos/etcd/release/etcd/gopath/src/github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/etcdserver/server.go:374 +0x2e39
2017-08-07_12:33:37.55479 github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/embed.StartEtcd(0xc420081c00, 0x0, 0x0, 0x0)
2017-08-07_12:33:37.55571       /home/gyuho/go/src/github.com/coreos/etcd/release/etcd/gopath/src/github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/embed/etcd.go:147 +0x7c0
2017-08-07_12:33:37.55627 github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/etcdmain.startEtcd(0xc420081c00, 0x6, 0xf0e97d, 0x6, 0x1)
2017-08-07_12:33:37.55724       /home/gyuho/go/src/github.com/coreos/etcd/release/etcd/gopath/src/github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/etcdmain/etcd.go:186 +0x58
2017-08-07_12:33:37.55829 github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/etcdmain.startEtcdOrProxyV2()
2017-08-07_12:33:37.55850       /home/gyuho/go/src/github.com/coreos/etcd/release/etcd/gopath/src/github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/etcdmain/etcd.go:103 +0x15ba
2017-08-07_12:33:37.55951 github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/etcdmain.Main()
2017-08-07_12:33:37.56057       /home/gyuho/go/src/github.com/coreos/etcd/release/etcd/gopath/src/github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/etcdmain/main.go:39 +0x61
2017-08-07_12:33:37.56177 main.main()
2017-08-07_12:33:37.56202       /home/gyuho/go/src/github.com/coreos/etcd/release/etcd/gopath/src/github.com/coreos/etcd/cmd/etcd/main.go:28 +0x20
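
To check whether another appliance is hitting this same panic rather than some other etcd failure, the log can be searched for the message above. A minimal sketch: has_snapshot_panic is a hypothetical helper, and the log file path is deliberately left as an argument because this thread does not say where the etcd log lives.

```shell
# Hypothetical helper: succeeds (exit 0) if the given etcd log file contains
# the snapshot-recovery error quoted in this thread.
has_snapshot_panic() {
  grep -q 'recovering backend from snapshot error' "$1"
}
```

For example: has_snapshot_panic /path/to/etcd/log && echo "same snapshot panic".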

@toddlindner could you please send me an archive with the files in /var/opt/graylog/data/etcd?

@mariussturm It's fairly large and I'm not sure if there is anything sensitive. Can I send a smaller slice?

@toddlindner the question is: is this really a bug in the omnibus package? We bumped the etcd version from 2.3.7 to 3.2.4 (maybe that jump was too big in some cases). Or did your setup run out of disk space at some point, so that your database got corrupted in the past? To figure this out I would take the database files and start a version of etcd in between those two, e.g. 3.0.x, to see if that works any better.

You could do this yourself, download the linux 64bit binary from here: https://github.com/coreos/etcd/releases/tag/v3.0.17

Drop it to /opt/graylog/embedded/sbin/etcd and see if that instance starts with the files you have on disk.

@mariussturm Yes, etcd is starting now with version 3.0.17. But, port 80 is still serving nginx welcome pages.

Ok, interesting. You would need another reconfigure run to get all services up, I guess.

@toddlindner for further debugging, could you give me the output of find /var/opt/graylog/data/etcd? Nothing sensitive should show up.

Confirmed that replacing the included etcd binary with version 3.0.17 after sudo dpkg -G -i graylog_latest.deb and before sudo graylog-ctl reconfigure works for me.

http://docs.graylog.org/en/2.3/pages/configuration/graylog_ctl.html#upgrade-graylog-omnibus

Thanks!

@mariussturm
root@graylog:~# find /var/opt/graylog/data/etcd
/var/opt/graylog/data/etcd
/var/opt/graylog/data/etcd/member
/var/opt/graylog/data/etcd/member/snap
/var/opt/graylog/data/etcd/member/snap/0000000000000012-0000000001544e16.snap
/var/opt/graylog/data/etcd/member/snap/000000000000000e-0000000001542705.snap
/var/opt/graylog/data/etcd/member/snap/000000000000000e-000000000153d8e3.snap
/var/opt/graylog/data/etcd/member/snap/000000000000000e-000000000153b1d2.snap
/var/opt/graylog/data/etcd/member/snap/db
/var/opt/graylog/data/etcd/member/snap/000000000000000e-000000000153fff4.snap
/var/opt/graylog/data/etcd/member/wal
/var/opt/graylog/data/etcd/member/wal/0000000000000027-0000000001466ad4.wal
/var/opt/graylog/data/etcd/member/wal/0000000000000024-00000000012d59d3.wal
/var/opt/graylog/data/etcd/member/wal/0000000000000025-000000000135b4d3.wal
/var/opt/graylog/data/etcd/member/wal/0000000000000026-00000000013e0fd4.wal
/var/opt/graylog/data/etcd/member/wal/0.tmp
/var/opt/graylog/data/etcd/member/wal/0000000000000028-00000000014ec5d3.wal

After some more investigation I think the old snapshot files are the actual problem. Etcd changed the way snapshots are stored. Version 3.2 can't open these files any more and simply crashes while trying to open them. In the omnibus package 2.3.0-2 I have added a fix that cleans those files during the reconfigure run. That should fix the problem without going back to an older version of etcd.
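
To see whether an appliance still carries such snapshot files before upgrading, something like the following could be used. It is a sketch only: list_snap_files is a hypothetical helper, the default path is the one from the find output earlier in this thread, and it only lists files. It does not reproduce whatever cleanup the 2.3.0-2 package performs.

```shell
# Hypothetical helper: list the *.snap files under an etcd data directory.
# The default directory is the one shown in this thread; pass another path
# as the first argument if your layout differs. Nothing is deleted.
list_snap_files() {
  local data_dir="${1:-/var/opt/graylog/data/etcd}"
  find "$data_dir/member/snap" -maxdepth 1 -type f -name '*.snap' 2>/dev/null | sort
}
```

Because the data directory is not readable by the regular user, this probably needs to run from a root shell.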

I have the same issue.
Upgrade from Omnibus package graylog_2.2.3-2_amd64.deb to graylog_2.3.0-1_amd64.deb.
I didn't understand what the steps are to solve the issue.
Thank you for your support.

After upgrading to graylog_2.3.0-2_amd64.deb the issue was solved. Maybe it's advisable to delete graylog_2.3.0-1_amd64.deb from the omnibus repository.

@janpapas done!

Thank you!

I see you updated the docs for the omnibus upgrade, which is great. There's the new step:

$ sudo dpkg -G -i graylog_latest.deb
$ sudo graylog-ctl backup-etcd
$ sudo graylog-ctl reconfigure

However, after running the dpkg step, the Graylog installer tells you to run the reconfigure step. You may want to mention running the etcd step here:

```
ubuntu@graylog:~$ sudo dpkg -G -i graylog_latest.deb
(Reading database ... 83368 files and directories currently installed.)
Preparing to unpack graylog_latest.deb ...
You're about to install Graylog!
Unpacking graylog (2.3.0-2) over (2.3.0-1) ...
Graylog has been uninstalled!
Setting up graylog (2.3.0-2) ...
By installing this package, you accept the terms of the Oracle Binary Code License Agreement for the Java SE Platform Products and JavaFX, which can be found at http://www.oracle.com/technetwork/java/javase/terms/license/index.html

Thank you for installing Graylog!
The next step in the install process is to run:

sudo graylog-ctl reconfigure
```

After upgrading to 2.3.0-2, the reconfigure process will not complete successfully. Reverting etcd to 3.0.17 did not help. I ended up just reverting my snapshot.

Generated at 2017-08-09 08:37:38 -0500
Errno::ECONNREFUSED: ruby_block[add node to server list] (graylog::graylog-server line 84) had an error: Errno::ECONNREFUSED: Connection refused - connect(2) for "127.0.0.1" port 4001
/opt/graylog/embedded/lib/ruby/2.1.0/net/http.rb:879:in `initialize'
/opt/graylog/embedded/lib/ruby/2.1.0/net/http.rb:879:in `open'
/opt/graylog/embedded/lib/ruby/2.1.0/net/http.rb:879:in `block in connect'
/opt/graylog/embedded/lib/ruby/2.1.0/timeout.rb:75:in `timeout'
/opt/graylog/embedded/lib/ruby/2.1.0/net/http.rb:878:in `connect'
/opt/graylog/embedded/lib/ruby/2.1.0/net/http.rb:863:in `do_start'
/opt/graylog/embedded/lib/ruby/2.1.0/net/http.rb:852:in `start'
/opt/graylog/embedded/lib/ruby/2.1.0/net/http.rb:1375:in `request'
/opt/graylog/embedded/lib/ruby/gems/2.1.0/gems/etcd-0.3.0/lib/etcd/client.rb:111:in `api_execute'
/opt/graylog/embedded/lib/ruby/gems/2.1.0/gems/etcd-0.3.0/lib/etcd/keys.rb:39:in `set'
/opt/graylog/embedded/cookbooks/graylog/libraries/registry.rb:17:in `set_master'
/opt/graylog/embedded/cookbooks/graylog/recipes/graylog-server.rb:86:in `block (2 levels) in from_file'
/opt/graylog/embedded/lib/ruby/gems/2.1.0/gems/chef-12.6.0/lib/chef/provider/ruby_block.rb:35:in `call'
/opt/graylog/embedded/lib/ruby/gems/2.1.0/gems/chef-12.6.0/lib/chef/provider/ruby_block.rb:35:in `block in action_run'
/opt/graylog/embedded/lib/ruby/gems/2.1.0/gems/chef-12.6.0/lib/chef/mixin/why_run.rb:52:in `call'
/opt/graylog/embedded/lib/ruby/gems/2.1.0/gems/chef-12.6.0/lib/chef/mixin/why_run.rb:52:in `add_action'
/opt/graylog/embedded/lib/ruby/gems/2.1.0/gems/chef-12.6.0/lib/chef/provider.rb:175:in `converge_by'
/opt/graylog/embedded/lib/ruby/gems/2.1.0/gems/chef-12.6.0/lib/chef/provider/ruby_block.rb:34:in `action_run'
/opt/graylog/embedded/lib/ruby/gems/2.1.0/gems/chef-12.6.0/lib/chef/provider.rb:144:in `run_action'
/opt/graylog/embedded/lib/ruby/gems/2.1.0/gems/chef-12.6.0/lib/chef/resource.rb:596:in `run_action'
/opt/graylog/embedded/lib/ruby/gems/2.1.0/gems/chef-12.6.0/lib/chef/runner.rb:74:in `run_action'
/opt/graylog/embedded/lib/ruby/gems/2.1.0/gems/chef-12.6.0/lib/chef/runner.rb:106:in `block (2 levels) in converge'
/opt/graylog/embedded/lib/ruby/gems/2.1.0/gems/chef-12.6.0/lib/chef/runner.rb:106:in `each'
/opt/graylog/embedded/lib/ruby/gems/2.1.0/gems/chef-12.6.0/lib/chef/runner.rb:106:in `block in converge'
/opt/graylog/embedded/lib/ruby/gems/2.1.0/gems/chef-12.6.0/lib/chef/resource_collection/resource_list.rb:83:in `block in execute_each_resource'
/opt/graylog/embedded/lib/ruby/gems/2.1.0/gems/chef-12.6.0/lib/chef/resource_collection/stepable_iterator.rb:116:in `call'
/opt/graylog/embedded/lib/ruby/gems/2.1.0/gems/chef-12.6.0/lib/chef/resource_collection/stepable_iterator.rb:116:in `call_iterator_block'
/opt/graylog/embedded/lib/ruby/gems/2.1.0/gems/chef-12.6.0/lib/chef/resource_collection/stepable_iterator.rb:85:in `step'
/opt/graylog/embedded/lib/ruby/gems/2.1.0/gems/chef-12.6.0/lib/chef/resource_collection/stepable_iterator.rb:104:in `iterate'
/opt/graylog/embedded/lib/ruby/gems/2.1.0/gems/chef-12.6.0/lib/chef/resource_collection/stepable_iterator.rb:55:in `each_with_index'
/opt/graylog/embedded/lib/ruby/gems/2.1.0/gems/chef-12.6.0/lib/chef/resource_collection/resource_list.rb:81:in `execute_each_resource'
/opt/graylog/embedded/lib/ruby/gems/2.1.0/gems/chef-12.6.0/lib/chef/runner.rb:105:in `converge'
/opt/graylog/embedded/lib/ruby/gems/2.1.0/gems/chef-12.6.0/lib/chef/client.rb:658:in `block in converge'
/opt/graylog/embedded/lib/ruby/gems/2.1.0/gems/chef-12.6.0/lib/chef/client.rb:653:in `catch'
/opt/graylog/embedded/lib/ruby/gems/2.1.0/gems/chef-12.6.0/lib/chef/client.rb:653:in `converge'
/opt/graylog/embedded/lib/ruby/gems/2.1.0/gems/chef-12.6.0/lib/chef/client.rb:692:in `converge_and_save'
/opt/graylog/embedded/lib/ruby/gems/2.1.0/gems/chef-12.6.0/lib/chef/client.rb:271:in `run'
/opt/graylog/embedded/lib/ruby/gems/2.1.0/gems/chef-12.6.0/lib/chef/application.rb:261:in `block in fork_chef_client'
/opt/graylog/embedded/lib/ruby/gems/2.1.0/gems/chef-12.6.0/lib/chef/application.rb:249:in `fork'
/opt/graylog/embedded/lib/ruby/gems/2.1.0/gems/chef-12.6.0/lib/chef/application.rb:249:in `fork_chef_client'
/opt/graylog/embedded/lib/ruby/gems/2.1.0/gems/chef-12.6.0/lib/chef/application.rb:215:in `block in run_chef_client'
/opt/graylog/embedded/lib/ruby/gems/2.1.0/gems/chef-12.6.0/lib/chef/local_mode.rb:44:in `with_server_connectivity'
/opt/graylog/embedded/lib/ruby/gems/2.1.0/gems/chef-12.6.0/lib/chef/application.rb:203:in `run_chef_client'
/opt/graylog/embedded/lib/ruby/gems/2.1.0/gems/chef-12.6.0/lib/chef/application/solo.rb:286:in `block in interval_run_chef_client'
/opt/graylog/embedded/lib/ruby/gems/2.1.0/gems/chef-12.6.0/lib/chef/application/solo.rb:275:in `loop'
/opt/graylog/embedded/lib/ruby/gems/2.1.0/gems/chef-12.6.0/lib/chef/application/solo.rb:275:in `interval_run_chef_client'
/opt/graylog/embedded/lib/ruby/gems/2.1.0/gems/chef-12.6.0/lib/chef/application/solo.rb:253:in `run_application'
/opt/graylog/embedded/lib/ruby/gems/2.1.0/gems/chef-12.6.0/lib/chef/application.rb:58:in `run'
/opt/graylog/embedded/lib/ruby/gems/2.1.0/gems/chef-12.6.0/bin/chef-solo:25:in `<top (required)>'
/opt/graylog/embedded/bin/chef-solo:23:in `load'
/opt/graylog/embedded/bin/chef-solo:23:in `<main>'

I'm also having issues with 2.3.0-2 (upgrading from 2.2.3-2). Etcd won't get back up during/after reconfigure.
Attached: log from installation, log from 'sudo graylog-ctl reconfigure', Chef stacktrace, and etcd log.
The machine is running in a VMware cluster.
I've tried to manually insert the older version of etcd as mentioned above, but that didn't change the result.

@kjonas65 Are you sure you're using version 2.3.0-2 of the Graylog omnibus package?

Your system seems to use etcd 3.2.4, but the Graylog omnibus package bundles etcd 3.0.17.

That's odd. The "installation"-log linked above, says it's 2.3.0-2. It is from graylog_latest.deb, which I downloaded earlier today.
I also tried inserting etcd manually between running dpkg and reconfigure, but that didn't help.
Is there a chance that etcd gets replaced during "graylog-ctl reconfigure"?

Same issue here also. Running Appliance, followed the directions to upgrade just this morning, wget, dpkg, backed up, reconfigured...

Port 80 is displaying the NGINX page, and I receive:
"Error: Running exception handlers, Running handlers complete"
"Chef client failed. 9 resources updated in 01 minutes 20 seconds"
"Stacktrace dumped to ...Stacktrace.out"
"Chef::Exceptions::ChildConvergeError....process exited unsuccessfully (exit code 1)"

What can I provide to assist? Thank you!

I'm getting the same result as @kjonas65 after running dpkg. I also reverted my vm snapshot and retried with the suggestion to drop etcd 3.0.17 to /opt/graylog/embedded/sbin/etcd but got the same result.

@CaptainBobby if 2.3.0-2 is not working, you can try to delete the data stored in etcd. If this is a single host installation nothing will change, if you run a cluster you need to set the cluster-master again and run the according reconfigure-as-* command on the other nodes:

sudo rm -r /var/opt/graylog/data/etcd/*
sudo graylog-ctl reconfigure

@joschi are you sure the graylog omnibus package 2.3.0-2 is bundling etcd 3.0.17? I found that even after installing 2.3.0-2 I had to manually replace etcd, as 3.2.4 was installed.

@mariussturm that worked for me! Thanks!

@toddlindner You're right. The omnibus package 2.3.0-2 comes with etcd 3.2.4.

https://github.com/Graylog2/omnibus-graylog2/blob/2.3.0-2/config/software/etcd.rb#L2
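
Since there was some confusion about which etcd version a given package actually ships, asking the installed binary directly settles it. A minimal sketch, assuming the binary lives at /opt/graylog/embedded/sbin/etcd (the path mentioned earlier in this thread) and that etcd --version prints an "etcd Version:" line like the one at the top of the etcd log above; etcd_version is a hypothetical helper.

```shell
# Hypothetical helper: reduce `etcd --version` output (read from stdin)
# to just the version number, e.g. "3.2.4".
etcd_version() {
  sed -n 's/^etcd Version: *//p'
}
```

It would be used as /opt/graylog/embedded/sbin/etcd --version | etcd_version, once right after dpkg and again after reconfigure if you suspect the binary gets replaced in between.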

@joschi Are you planning on building an omnibus package with etcd 3.0.17? That should prevent future reports of this problem.

@toddlindner I'm not aware of any plans, but @mariussturm could probably give a definitive answer.

@toddlindner the rollback to 3.0.17 was just a suggestion during debugging of the issue. If the etcd database is in a non-corrupted state, omnibus package 2.3.0-2 will install and upgrade etcd without issues. So the answer to your question is no, we are not planning to downgrade etcd in future versions of the omnibus package.

@mariussturm - I am having the same issue as many on this forum, more particularly with your reply to @CaptainBobby. I ran the rm -r for etcd and received "No such file or directory". I think my next step will be to look at what the others in the forum are mentioning about replacing etcd 3.2.4. I will keep you posted with my findings.

@thestealth007 I had to run the following command:

sudo rm -r /var/opt/graylog/data/etcd/member

The logged-in user is unable to access the directory, even when using sudo, so you have to name the "member" directory explicitly.

@CaptainBobby Thank you, that has taken all the errors out of running the "reconfigure". I had the appliance configured to use HTTPS, 80 seems to be putting out the "welcome to NGINX" and https doesn't seem to be responding. A quick NMAP scan of the host shows it as open. I am still digging through it though.

@CaptainBobby's suggestion also worked for me.

@CaptainBobby - After a reboot, I seem to be back up and running, here are the steps performed that worked for me and some others.

sudo rm -r /var/opt/graylog/data/etcd/member
sudo graylog-ctl reconfigure
sudo reboot
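
A more cautious variant of the first step is to move the directory aside instead of deleting it, so the old data can be inspected or restored later. This is a sketch under stated assumptions, not something tested in this thread: set_aside_etcd_member is a hypothetical helper, the default path is the one used above, and moving rather than removing is a substitution for the rm -r that people here actually ran. As for the earlier "No such file or directory": the most likely reason is that the shell expands /var/opt/graylog/data/etcd/* as the unprivileged user before sudo runs, and that user cannot read the directory, which is why naming "member" explicitly works.

```shell
# Hypothetical helper: move the etcd "member" directory to a timestamped
# sibling of the data directory instead of deleting it.
# Prints the backup path on success; returns non-zero if there is nothing to move.
set_aside_etcd_member() {
  local data_dir="${1:-/var/opt/graylog/data/etcd}"
  local backup
  backup="${data_dir%/}.member.bak.$(date +%Y%m%d%H%M%S)"
  if [ ! -e "$data_dir/member" ]; then
    echo "nothing to move: $data_dir/member does not exist" >&2
    return 1
  fi
  mv -- "$data_dir/member" "$backup" && printf '%s\n' "$backup"
}
```

Run it from a root shell with Graylog stopped (sudo graylog-ctl stop), then continue with sudo graylog-ctl reconfigure as in the steps above.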

TCP/80 is still putting out the "welcome to nginx" page; however, 443 is working and I am able to log in normally.

Hi

I guess that this needs to be reopened. I got the same issue today. This seems to be a regression.

I got the same issue installing the latest version, 2.3.1-2. Using @thestealth007's last steps worked for me.

Same issue upgrading 2.2 to 2.3.1-3

@robdig What commands did you execute to upgrade Graylog and what was their output?

Same issue upgrading from 2.2.3 to 2.3.1 using the omnibus package.
Followed the doc to upgrade:

$ wget https://packages.graylog2.org/releases/graylog-omnibus/ubuntu/graylog_latest.deb
$ sudo graylog-ctl stop
$ sudo dpkg -G -i graylog_latest.deb
$ sudo graylog-ctl backup-etcd
$ sudo graylog-ctl reconfigure
$ sudo reboot

@JulioQc What was the version you started with? Did you upgrade the OVA/omnibus package over multiple versions of Graylog?

Went from 2.1 to 2.2.3, then 2.3.1 (current latest deb).
Ran 2.2.3 for a few days without issues. Tried 2.3.1 last week, gave up because of this issue, and tried again today with the same result.
As suggested, I had to clear the contents of the etcd folder to get it working.

@mariussturm Maybe it would be useful to dump the contents of etcd before upgrading it to the latest version, clearing the working directories, and restoring the dump after upgrading etcd.

https://coreos.com/etcd/docs/latest/v2/admin_guide.html#disaster-recovery

@joschi makes sense, will take a look!

Maybe this could be included in the graylog-ctl script and described in the update procedure.

The problem is that you have to start etcd in the foreground once to initialize the restored database. See: http://docs.graylog.org/en/2.3/pages/configuration/graylog_ctl.html#restore-cluster-configuration
That's not so easy to do with graylog-ctl.

Have those steps been tested after the upgrade to 2.3? I'd be worried the etcd service won't even start after the restoration process.
If it works, it seems like a fairly reasonable process to include in the update procedure to 2.3.

I think we should document it as optional in case etcd doesn't start after an upgrade. Doing it every time is maybe not what the user wants. But yes, it was tested and worked here.

Yes, I second that suggestion of leaving it optional.
We gave up on the 2.3 update due to numerous problems, but if we decide to proceed at some point, I'll let you know if it worked. I'll assume it will since you tested it ;)

Adding a note to this issue, in case somebody else is presented with the "welcome to NGINX" message after an upgrade.

In my case [Omnibus packages, no HTTPS configured yet], the fix was to modify the /opt/graylog/conf/nginx/nginx.conf file and re-add the API location block inside the server section:

      location /api/ {
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header Host $http_host;
        proxy_pass http://localhost:9000/api/;
      }

After doing that, I issued a graylog-ctl restart and the service came up as expected.

If you're running HTTPS, the API block belongs in the server block that defines your HTTPS config.
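
Before editing anything, it may be worth checking whether the block is really missing. A minimal sketch, assuming the config path from the comment above; has_api_location is a hypothetical helper and only looks for a line opening a location /api/ block, it does not validate the rest of the file.

```shell
# Hypothetical helper: succeeds (exit 0) if the nginx config contains a line
# that opens a `location /api/ {` block. The default path is the one quoted above.
has_api_location() {
  local conf="${1:-/opt/graylog/conf/nginx/nginx.conf}"
  grep -Eq '^[[:space:]]*location[[:space:]]+/api/[[:space:]]*\{' "$conf"
}
```

For example: has_api_location || echo "API location block is missing".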
