Scylla: Failed to restart/start

Created on 27 Jan 2018 · 23 comments · Source: scylladb/scylla

Installation details
Scylla version (or git commit hash): 2.0.2-0.20171201.07b039f
Cluster size: 1
OS (RHEL/CentOS/Ubuntu/AWS AMI): CentOS

Hardware details (for performance issues)
Platform (physical/VM/cloud instance type/docker): physical
Hardware: sockets=1 cores=4 hyperthreading=8 memory=32GB
Disks (SSD/HDD, count): HDD, 1

I installed ScyllaDB on a fresh CentOS server. Everything worked fine, and the server was up and running. Then I restarted the service and it failed. I uninstalled ScyllaDB several times and removed all configs, hoping a fresh install might fix the issue, but it didn't. Below is the failure output (I didn't touch the configs):

[bebu@namooria ~]$ sudo service scylla-server status -l
Redirecting to /bin/systemctl status -l scylla-server.service

  • scylla-server.service - Scylla Server
    Loaded: loaded (/usr/lib/systemd/system/scylla-server.service; enabled; vendor preset: disabled)
    Active: failed (Result: exit-code) since Fri 2018-01-26 20:04:58 CST; 8s ago
    Process: 134195 ExecStopPost=/usr/lib/scylla/scylla_stop (code=exited, status=0/SUCCESS)
    Process: 134098 ExecStart=/usr/bin/scylla $SCYLLA_ARGS $SEASTAR_IO $DEV_MODE $CPUSET (code=exited, status=1/FAILURE)
    Process: 134094 ExecStartPre=/usr/lib/scylla/scylla_prepare (code=exited, status=0/SUCCESS)
    Main PID: 134098 (code=exited, status=1/FAILURE)

Jan 26 20:04:57 namooria scylla_prepare[134094]: already tuned: /sys/devices/pci0000:00/0000:00:01.0/0000:02:00.0/host0/target0:2:0/0:2:0:0/block/sda/queue/scheduler
Jan 26 20:04:57 namooria scylla_prepare[134094]: already tuned: /sys/devices/pci0000:00/0000:00:01.0/0000:02:00.0/host0/target0:2:0/0:2:0:0/block/sda/queue/nomerges
Jan 26 20:04:57 namooria scylla_prepare[134094]: tuning /sys/devices/virtual/block/dm-4
Jan 26 20:04:57 namooria scylla[134098]: Scylla version 2.0.2-0.20171201.07b039f starting ...
Jan 26 20:04:57 namooria scylla[134098]: [shard 0] init - Could not read configuration file /etc/scylla/scylla.yaml: YAML::TypedBadConversion<seastar::basic_sstring<char, unsigned int, 15u> > (yaml-cpp: error at line 0, column 0: bad conversion)
Jan 26 20:04:57 namooria scylla[134098]: [shard 0] seastar - Exiting on unhandled exception: YAML::TypedBadConversion<seastar::basic_sstring<char, unsigned int, 15u> > (yaml-cpp: error at line 0, column 0: bad conversion)
Jan 26 20:04:58 namooria systemd[1]: scylla-server.service: main process exited, code=exited, status=1/FAILURE
Jan 26 20:04:58 namooria systemd[1]: Failed to start Scylla Server.
Jan 26 20:04:58 namooria systemd[1]: Unit scylla-server.service entered failed state.
Jan 26 20:04:58 namooria systemd[1]: scylla-server.service failed.

[bebu@namooria ~]$ journalctl -xe
No journal files were found.
-- No entries --

Labels: bug, showstopper


Your config file has a problem:

 Could not read configuration file /etc/scylla/scylla.yaml: YAML::TypedBadConversion<seastar::basic_sstring<char, unsigned int, 15u> > (yaml-cpp: error at line 0, column 0: bad conversion)

Can you check and upload your /etc/scylla/scylla.yaml here?

Attachment: scylla.txt
It's a fresh install; I didn't change the config. The config file is attached (I changed its file extension to attach it here).

This issue shows an error similar to the one I am getting:
https://github.com/scylladb/scylla/issues/3157

I repeated the same procedure (as described in my first post) on two other machines with similar specs running CentOS 7.2 and hit the same issue, with the same error. I followed the installation steps described here: http://www.scylladb.com/download/centos_rpm/

On one of the machines, I get the following error as well:
Answer yes to automatically start Scylla when the node boots; answer no to skip this step.
[YES/no]yes
Created symlink from /etc/systemd/system/multi-user.target.wants/scylla-server.service to /usr/lib/systemd/system/scylla-server.service.
root is not in the sudoers file. This incident will be reported.

I am logged in as root via "su", though.

+1, same problem here after updating to scylla-server-2.0.2-0.20171201.07b039f.el7.centos.x86_64 on CentOS 7.

A fresh install also gives this error message.

UPDATE:

I did some more testing. First I downgraded back to scylla-2.0.0-0.20170929.e265c91.el7.centos, but this didn't help (the error about the YAML file remained).

After some investigation of my yum.log, I found that the yaml-cpp package had been updated to version 0.5.3-7 in the EPEL repository.

Unfortunately, no older version is available in the EPEL repo anymore, but I found one at http://cbs.centos.org/kojifiles/packages/yaml-cpp/0.5.1/6.el7/x86_64/yaml-cpp-0.5.1-6.el7.x86_64.rpm

When I install this version (yaml-cpp-0.5.1-6.el7.x86_64), the error is gone and I can start scylla-server again.

When I run yum update yaml-cpp, the package is upgraded again to yaml-cpp-0.5.3-7.el7.x86_64, and scylla-server once more fails to start with the same error.

Hope this helps! (For now, I keep the yaml-cpp package at the older version and have put it in the yum exclude list.)

Good detective work. Do you have a non-default locale set up? I want to know whether it's a combination of a non-default locale and yaml-cpp-0.5.3, or just yaml-cpp-0.5.3.

Nothing special, I think:

# locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=

Well, I can't think of anything more default than that. So the failure is completely due to the change in yaml-cpp.

Thanks @JvGinkel! Yes, this was the problem. I downgraded to yaml-cpp-0.5.1-6.el7.x86_64, and Scylla installed and ran without any error. However, running an update will upgrade yaml-cpp again, which may cause issues on a production server.

Is there a way to restrict Scylla to a specific version of yaml-cpp, or to stop CentOS from updating this package?

This might not be related to this issue, but if I change "data_file_directories" to "- /folde/data1" (folde is on ext3), Scylla fails to load and fills /var/lib/systemd/coredump.

Can't I use an ext3-mounted directory for "data_file_directories"?
Why does every simple failure fill the coredump directory? The YAML error was filling it too.

> Is there a way to restrict Scylla to only use specific version/code of yaml-cpp? OR restrict Centos to not update this package?

You can add the yaml-cpp package to the exclude setting in /etc/yum.conf:

exclude=yaml-cpp
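To check whether a given yum.conf really keeps yaml-cpp pinned, the exclude list can be read programmatically. This is a minimal Python sketch, not part of yum itself: the helper names excluded_packages and is_excluded are hypothetical, and it assumes the option lives in the [main] section, that entries are separated by whitespace or commas, and that globs only need to match the bare package name (yum itself can also match other package forms such as name.arch).

```python
import configparser
import fnmatch
import re


def excluded_packages(yum_conf_text: str) -> list[str]:
    """Return the patterns listed in the exclude= option of the [main] section."""
    # Assumption: yum.conf is INI-style and the list is separated by
    # whitespace and/or commas. interpolation=None keeps '%' and '$' literal.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(yum_conf_text)
    raw = parser.get("main", "exclude", fallback="")
    return [pattern for pattern in re.split(r"[\s,]+", raw) if pattern]


def is_excluded(yum_conf_text: str, package_name: str) -> bool:
    """True if any exclude pattern (shell glob) matches the bare package name."""
    # Simplification: yum can also match name.arch and other package forms.
    return any(fnmatch.fnmatchcase(package_name, pattern)
               for pattern in excluded_packages(yum_conf_text))


conf = """
[main]
cachedir=/var/cache/yum/$basearch/$releasever
keepcache=0
exclude=yaml-cpp
"""

print(is_excluded(conf, "yaml-cpp"))       # True
print(is_excluded(conf, "scylla-server"))  # False
```

Because exclude hides the package from every repository, it also blocks the fixed EPEL build once it arrives, so the entry has to be removed again to pick up that fix.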

Filed a bug against Fedora EPEL.

It looks like the problem is that yaml-cpp moved from boost::shared_ptr to std::shared_ptr as part of its C++11-isation (commit 24fa1b33805c9a91df0f32c46c28e314dd7ad96f there). Unfortunately, these pointers are part of data structures that are accessed from templated and inlined code. If the inlined code and the library code do not agree about what a data structure looks like, things break down. This means the yaml-cpp library has no real ABI, and any upgrade can break Scylla. The library should be linked statically.

--
Gleb.
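The mechanism Gleb describes can be illustrated with a toy model. Code inlined into Scylla from the yaml-cpp headers has its field offsets fixed when Scylla is compiled, while the shared library uses the layout from the headers it was built with. The Python ctypes sketch below uses two hypothetical handle types (deliberately not the real boost::shared_ptr and std::shared_ptr, which may well have the same size and differ only in their internal representation) so that the disagreement shows up as a shifted field offset: a value written through one layout is read back from the wrong place through the other.

```python
import ctypes


# Hypothetical smart-pointer layouts; the "old" one is one word larger on
# purpose so that the mismatch becomes visible as a different field offset.
class OldHandle(ctypes.Structure):
    _fields_ = [("obj", ctypes.c_void_p),
                ("ctrl", ctypes.c_void_p),
                ("extra", ctypes.c_void_p)]


class NewHandle(ctypes.Structure):
    _fields_ = [("obj", ctypes.c_void_p),
                ("ctrl", ctypes.c_void_p)]


# The same logical node, as seen by code compiled against each header version.
class NodeAsHeaderSeesIt(ctypes.Structure):
    _fields_ = [("data", OldHandle), ("kind", ctypes.c_int)]


class NodeAsLibrarySeesIt(ctypes.Structure):
    _fields_ = [("data", NewHandle), ("kind", ctypes.c_int)]


def write_new_read_old(value: int) -> int:
    """Library writes `kind` with the new layout; inlined code reads the old one."""
    # Zero-filled buffer large enough for either view of the structure.
    buf = bytearray(ctypes.sizeof(NodeAsHeaderSeesIt))
    NodeAsLibrarySeesIt.from_buffer(buf).kind = value
    return NodeAsHeaderSeesIt.from_buffer(buf).kind


print(NodeAsHeaderSeesIt.kind.offset, NodeAsLibrarySeesIt.kind.offset)
print(write_new_read_old(7))  # 0: the reader looks at the wrong offset
```

Linking yaml-cpp statically, as suggested above, avoids this class of problem because the inlined code and the library code are then compiled from the same headers.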

EPEL is considering downgrading back to 0.5.1 (which would require us to rebuild our unreleased packages).

A fixed yaml-cpp package is on its way to EPEL. Please help test it (and expedite its arrival in the stable repository) by following the instructions in https://groups.google.com/d/msg/scylladb-users/lqHMtL1wP8s/wIuerFz3AQAJ.

A new yaml-cpp error exists in test:
Scylla version 666.development-0.20180201.4463e9071

2018-02-01 01:04:56,829 process          L0333 INFO | Running '/bin/yum -y update'
2018-02-01 01:05:13,755 process          L0420 DEBUG| [stdout] ---> Package yaml-cpp.x86_64 0:0.5.1-6.el7 will be updated
2018-02-01 01:05:13,755 process          L0420 DEBUG| [stdout] ---> Package yaml-cpp.x86_64 1:0.5.1-1.el7.2 will be an update
2018-02-01 01:05:14,405 process          L0420 DEBUG| [stdout]  yaml-cpp                  x86_64 1:0.5.1-1.el7.2                epel     176 k
2018-02-01 01:07:09,031 process          L0420 DEBUG| [stdout]   Updating   : 1:yaml-cpp-0.5.1-1.el7.2.x86_64                          141/481 
2018-02-01 01:08:18,319 process          L0420 DEBUG| [stdout]   Cleanup    : yaml-cpp-0.5.1-6.el7.x86_64                              360/481 
2018-02-01 01:08:56,693 process          L0420 DEBUG| [stdout]   Verifying  : 1:yaml-cpp-0.5.1-1.el7.2.x86_64                          183/481 
2018-02-01 01:08:56,699 process          L0420 DEBUG| [stdout]   Verifying  : yaml-cpp-0.5.1-6.el7.x86_64                              277/481 
2018-02-01 01:08:56,728 process          L0420 DEBUG| [stdout]   yaml-cpp.x86_64 1:0.5.1-1.el7.2  
...
2018-02-01 01:18:18,894 process          L0420 DEBUG| [stdout] Feb 01 01:18:18 localhost.localdomain scylla[1860]:  [shard 0] init - Could not read configuration file /etc/scylla/scylla.yaml: YAML::TypedBadConversion<seastar::basic_sstring<char, unsigned int, 15u> > (yaml-cpp: error at line 254369, column 24577: bad conversion)
2018-02-01 01:18:18,894 process          L0420 DEBUG| [stdout] Feb 01 01:18:18 localhost.localdomain scylla[1860]:  [shard 0] seastar - Exiting on unhandled exception: YAML::TypedBadConversion<seastar::basic_sstring<char, unsigned int, 15u> > (yaml-cpp: error at line 254369, column 24577: bad conversion)
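The yum log above shows yaml-cpp 1:0.5.1-1.el7.2 being installed as an update to 0.5.1-6.el7, even though its release number (1) is lower. The 1: prefix is an RPM epoch, and RPM compares the epoch first, then the version, then the release, so a higher epoch lets a repository ship an older upstream version as an upgrade. The Python sketch below is a simplified model of that ordering; the function names are hypothetical, and it ignores details of the real rpmvercmp such as the handling of ~ and ^.

```python
import re


def _segments(s: str) -> list[str]:
    # Runs of digits or letters; separators such as '.' and '_' are dropped.
    return re.findall(r"\d+|[A-Za-z]+", s)


def _cmp_part(a: str, b: str) -> int:
    """Simplified rpmvercmp on one field: -1, 0 or 1 (no '~' or '^' handling)."""
    sa, sb = _segments(a), _segments(b)
    for x, y in zip(sa, sb):
        if x.isdigit() and y.isdigit():
            if int(x) != int(y):
                return -1 if int(x) < int(y) else 1
        elif x.isdigit() != y.isdigit():
            return 1 if x.isdigit() else -1  # numeric segment sorts newer
        elif x != y:
            return -1 if x < y else 1
    # All shared segments equal: the field with segments left over is newer.
    return (len(sa) > len(sb)) - (len(sa) < len(sb))


def parse_evr(evr: str) -> tuple[int, str, str]:
    """Split '[epoch:]version-release'; a missing epoch counts as 0."""
    epoch, sep, rest = evr.partition(":")
    if not sep:
        epoch, rest = "0", evr
    version, sep, release = rest.rpartition("-")
    if not sep:
        version, release = rest, ""
    return int(epoch), version, release


def compare_evr(a: str, b: str) -> int:
    """Compare epoch first, then version, then release."""
    ea, va, ra = parse_evr(a)
    eb, vb, rb = parse_evr(b)
    if ea != eb:
        return -1 if ea < eb else 1
    return _cmp_part(va, vb) or _cmp_part(ra, rb)


print(compare_evr("1:0.5.1-1.el7.2", "0:0.5.1-6.el7"))  # 1: the epoch decides
print(compare_evr("0.5.1-1.el7.2", "0.5.1-6.el7"))      # -1: without the epoch
```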

On Thu, Feb 01, 2018 at 02:45:02AM +0000, Amos Kong wrote:

> A new yaml-cpp error exists in test:
> Scylla version 666.development-0.20180201.4463e9071
Is this package compiled with yaml-cpp 0:0.5.3-7.el7?

--
Gleb.

On Thu, Feb 1, 2018 at 3:54 PM, Gleb Natapov notifications@github.com
wrote:

> Is this package compiled with yaml-cpp 0:0.5.3-7.el7?

The package was built by
http://jenkins.cloudius-systems.com:8080/job/scylla-master-centos-rpm/623

I just checked the build log: it used yaml-cpp-0.5.3-7.el7.x86_64. So do we need to fix our environment?

http://jenkins.cloudius-systems.com:8080/job/scylla-master-centos-rpm/623/artifact/scylla/build/rpms/installed_pkgs.log
yaml-cpp-0.5.3-7.el7.x86_64 1515443641 556771
093fc25ac2577ac94e480308d9b8b81c installed


On Thu, Feb 01, 2018 at 08:07:03AM +0000, Amos Kong wrote:

> The package was built by
> http://jenkins.cloudius-systems.com:8080/job/scylla-master-centos-rpm/623
>
> I just checked the build log, it used yaml-cpp-0.5.3-7.el7.x86_64. so we
> need to fix our env?

The problem in the first place was that yaml-cpp is not binary
compatible between various versions, so the test should use the same
version the package was compiled with.

--
Gleb.
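Gleb's rule that the test should use the same yaml-cpp version the package was compiled with can be checked mechanically. The sketch below compares the yaml-cpp entry in a build's package list against the runtime package string. It assumes that each relevant line of installed_pkgs.log starts with an NVRA (name-version-release.arch) string, as in the yaml-cpp-0.5.3-7.el7.x86_64 line quoted earlier, and that the runtime string comes from something like rpm -q yaml-cpp. The helper names are hypothetical, and epochs are not compared because NVRA strings do not include them.

```python
from typing import Optional


def split_nvra(nvra: str) -> tuple[str, str, str, str]:
    """Split 'name-version-release.arch', e.g. yaml-cpp-0.5.3-7.el7.x86_64."""
    nvr, _, arch = nvra.rpartition(".")
    # The name may itself contain '-', so split version and release from the right.
    name, version, release = nvr.rsplit("-", 2)
    return name, version, release, arch


def build_time_nvra(pkg_log: str, name: str) -> Optional[str]:
    """Return the first NVRA for `name` found at the start of a log line."""
    for line in pkg_log.splitlines():
        fields = line.split()
        if not fields:
            continue
        try:
            if split_nvra(fields[0])[0] == name:
                return fields[0]
        except ValueError:
            continue  # first field is not an NVRA string (e.g. a checksum)
    return None


def matches_build(pkg_log: str, runtime_nvra: str) -> bool:
    """True if the runtime package has the build-time version and release."""
    name, version, release, _ = split_nvra(runtime_nvra)
    built = build_time_nvra(pkg_log, name)
    return built is not None and split_nvra(built)[1:3] == (version, release)


build_log = """yaml-cpp-0.5.3-7.el7.x86_64 1515443641 556771
093fc25ac2577ac94e480308d9b8b81c installed"""

print(matches_build(build_log, "yaml-cpp-0.5.1-1.el7.2.x86_64"))  # False
print(matches_build(build_log, "yaml-cpp-0.5.3-7.el7.x86_64"))    # True
```

A test harness could run such a check before starting scylla-server and fail early with a clear message instead of the opaque YAML::TypedBadConversion error.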

Fixed by dc2b17b3dadcfa840927e4b5519dde4938c346b6, 82f217d62afb7d5763a84939cc525fcbe1ea0633, and bec2b015e32344b1e4f5ead025ae53824d65d63b.

The latest build, 666.development-0.20180205.6919c7434 (which contains the fix), works well with yaml-cpp-0.5.1-1.el7.2.x86_64.
