Scylla: [2.3 AMI] Scylla does not start after AWS instance stop/start

Created on 20 Sep 2018 · 28 Comments · Source: scylladb/scylla

Steps to reproduce:

Provision a 3-node cluster through https://www.scylladb.com/download/#aws

I used this in the advanced details config:

--clustername MyCluster
--totalnodes 3

Added tags

user: moreno
Name: MyCluster

Stop the AWS instances, wait a minute, then start them again.

Expected behavior: Data on the ephemeral disks should be gone, but Scylla should come back up fine.
Actual behavior: Scylla does not start on any of the 3 nodes.

systemd log:

-- The leading process of the session is 1544.
Sep 20 15:20:02 ip-172-31-17-134.us-west-2.compute.internal systemd[1]: Started Session 1 of user centos.
-- Subject: Unit session-1.scope has finished start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
-- 
-- Unit session-1.scope has finished starting up.
-- 
-- The start-up result is done.
Sep 20 15:20:02 ip-172-31-17-134.us-west-2.compute.internal systemd[1]: Starting Session 1 of user centos.
-- Subject: Unit session-1.scope has begun start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
-- 
-- Unit session-1.scope has begun starting up.
Sep 20 15:20:02 ip-172-31-17-134.us-west-2.compute.internal sshd[1544]: pam_unix(sshd:session): session opened for user centos by (uid=0)
Sep 20 15:20:33 ip-172-31-17-134.us-west-2.compute.internal systemd[1]: Job dev-disk-by\x2duuid-9809eb30\x2dc990\x2d4c58\x2db6d7\x2d9f0ebb480e2a.device/start timed out.
Sep 20 15:20:33 ip-172-31-17-134.us-west-2.compute.internal systemd[1]: Timed out waiting for device dev-disk-by\x2duuid-9809eb30\x2dc990\x2d4c58\x2db6d7\x2d9f0ebb480e2a.device.
-- Subject: Unit dev-disk-by\x2duuid-9809eb30\x2dc990\x2d4c58\x2db6d7\x2d9f0ebb480e2a.device has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
-- 
-- Unit dev-disk-by\x2duuid-9809eb30\x2dc990\x2d4c58\x2db6d7\x2d9f0ebb480e2a.device has failed.
-- 
-- The result is timeout.
Sep 20 15:20:33 ip-172-31-17-134.us-west-2.compute.internal systemd[1]: Dependency failed for /var/lib/scylla.
-- Subject: Unit var-lib-scylla.mount has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
-- 
-- Unit var-lib-scylla.mount has failed.
-- 
-- The result is dependency.
Sep 20 15:20:33 ip-172-31-17-134.us-west-2.compute.internal systemd[1]: Dependency failed for Scylla Server.
-- Subject: Unit scylla-server.service has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
-- 
-- Unit scylla-server.service has failed.
-- 
-- The result is dependency.
Sep 20 15:20:33 ip-172-31-17-134.us-west-2.compute.internal systemd[1]: Dependency failed for Scylla JMX.
-- Subject: Unit scylla-jmx.service has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
-- 
-- Unit scylla-jmx.service has failed.
-- 
-- The result is dependency.
Sep 20 15:20:33 ip-172-31-17-134.us-west-2.compute.internal systemd[1]: Job scylla-jmx.service/start failed with result 'dependency'.
Sep 20 15:20:33 ip-172-31-17-134.us-west-2.compute.internal systemd[1]: Dependency failed for Run Scylla Housekeeping daily mode.
-- Subject: Unit scylla-housekeeping-daily.timer has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
-- 
-- Unit scylla-housekeeping-daily.timer has failed.
-- 
-- The result is dependency.
Sep 20 15:20:33 ip-172-31-17-134.us-west-2.compute.internal systemd[1]: Job scylla-housekeeping-daily.timer/start failed with result 'dependency'.
Sep 20 15:20:33 ip-172-31-17-134.us-west-2.compute.internal systemd[1]: Dependency failed for Run Scylla Housekeeping restart mode.
-- Subject: Unit scylla-housekeeping-restart.timer has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
-- 
-- Unit scylla-housekeeping-restart.timer has failed.
-- 
-- The result is dependency.
Sep 20 15:20:33 ip-172-31-17-134.us-west-2.compute.internal systemd[1]: Job scylla-housekeeping-restart.timer/start failed with result 'dependency'.
Sep 20 15:20:33 ip-172-31-17-134.us-west-2.compute.internal systemd[1]: Job scylla-server.service/start failed with result 'dependency'.
Sep 20 15:20:33 ip-172-31-17-134.us-west-2.compute.internal systemd[1]: Job var-lib-scylla.mount/start failed with result 'dependency'.
Sep 20 15:20:33 ip-172-31-17-134.us-west-2.compute.internal systemd[1]: Job dev-disk-by\x2duuid-9809eb30\x2dc990\x2d4c58\x2db6d7\x2d9f0ebb480e2a.device/start failed with result 'timeout'.
Sep 20 15:20:33 ip-172-31-17-134.us-west-2.compute.internal systemd[1]: Reached target Multi-User System.
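
The failure chain in the log above (a device that never appears, then the mount, then the services that depend on it) can be pulled out mechanically. The following is a minimal Python sketch, not part of any Scylla tooling: the function names `unescape_device_unit` and `summarize` are made up for illustration, and the sample lines are three messages copied from the log with the timestamp and host trimmed. It assumes the usual systemd unit-name escaping, where `-` stands for `/` and a literal `-` is written as `\x2d`.

```python
import re

# Three message lines copied from the log above (timestamp and host trimmed).
SAMPLE = [
    r"systemd[1]: Timed out waiting for device dev-disk-by\x2duuid-9809eb30\x2dc990\x2d4c58\x2db6d7\x2d9f0ebb480e2a.device.",
    "systemd[1]: Dependency failed for /var/lib/scylla.",
    "systemd[1]: Dependency failed for Scylla Server.",
]


def unescape_device_unit(unit):
    """Turn a systemd-escaped .device unit name back into a device path."""
    name = unit[: -len(".device")] if unit.endswith(".device") else unit
    # In escaped unit names '-' stands for '/'; a literal '-' appears as \x2d.
    path = "/" + name.replace("-", "/")
    return re.sub(r"\\x([0-9a-fA-F]{2})", lambda m: chr(int(m.group(1), 16)), path)


def summarize(lines):
    """Return (missing device paths, names of units that failed on a dependency)."""
    missing, failed = [], []
    for line in lines:
        m = re.search(r"Timed out waiting for device (\S+\.device)\.", line)
        if m:
            missing.append(unescape_device_unit(m.group(1)))
            continue
        m = re.search(r"Dependency failed for (.+)\.$", line)
        if m:
            failed.append(m.group(1))
    return missing, failed


missing, failed = summarize(SAMPLE)
print("missing devices:", missing)
print("failed on dependency:", failed)
```

Read this way, the log says that the device with UUID 9809eb30-c990-4c58-b6d7-9f0ebb480e2a never showed up, so the /var/lib/scylla mount and everything ordered after it were never started.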

aws-console.log

gnumoreno

All 28 comments

@roydahan don't we have a test for AWS instance stop/start?
@tzach I know some of our customers do stop/start on their dev environments to start fresh. I'm glad it happened to me so we can fix it.
@penberg since you promoted the AMIs, tagging you FYI.

gnumoreno on 20 Sep 2018

2.3 is correct. The disks disappeared (ephemeral?), so data was lost, and Scylla refuses to pretend everything is okay. It's up to you to restore the data or reformat the disk.

avikivity on 20 Sep 2018

@gnumoreno the raid was destroyed but Scylla's data dir is pointing to it. You cannot expect it to work or for Scylla to paper over it.

dorlaor on 20 Sep 2018

@avikivity @dorlaor @slivne @tzach
Moreno's point is that there is a big change in the 2.3 systemd scripts behaviour compared to previous releases:

Before 2.3: systemctl start scylla-server would recreate the raid and remount /var/lib/scylla directory and would eventually succeed even if the instance has been "stopped" before as Moreno described above.
2.3: it doesn't do it and as a result systemctl start scylla-server fails in the use case as described above.

This is a major change in the behaviour many people rely on. Could you clarify, please, if this is an intentional change?

vladzcloudius on 20 Sep 2018

👍3

It's a valid point but not a bug. I think the older behaviour was wrong

dorlaor on 20 Sep 2018

👍1

@dorlaor Could you clarify why the old behaviour was wrong, please?

I use the old behaviour myself a lot and find it very useful: I tune the instance once, shut it down and when I need it, I simply "start" it and it's ready for usage right away.
I'm pretty sure I'm not the only one who does that, including our customers.

On top of that, I really don't understand why a user has to run scylla_setup again after already having done so once. If I were a user, I'd immediately ask a legitimate question: "What is the technical difficulty in storing the result of the scylla_setup disk configuration on persistent storage, so that I don't have to run scylla_setup every time I "start" the instance?"

vladzcloudius on 20 Sep 2018

I believe the 2.3 behavior is wrong.
As a user who decided to use ephemeral drives, I have decided that I know how to handle disk and node failure (i.e. I will have to sustain a node bootstrap when such a problem happens).
If we decide the 2.3 workflow is the normal behavior, we should communicate the change in operation flow to our users.

eyalgutkind on 20 Sep 2018

To me it makes sense to explicitly create the drive. We shouldn't rely on AWS or on the user's implicit changes.

dorlaor on 20 Sep 2018

My view on this is that it is indeed better for Scylla not to auto-start after a full stop.

Ideally, we would create the RAID, just not start Scylla. But I don't see a sane way to differentiate between that and a clean reboot, so the current behavior is fine by me.

glommer on 20 Sep 2018

@glommer It's very simple really: run systemctl disable scylla-server and when you reboot scylla won't automatically start.

But as far as I understand, this issue is not about that at all, but rather about scylla-server not (re)creating a RAID partition and (re)mounting /var/lib/scylla when the user explicitly asks for it to be started with systemctl start scylla-server.
This is very different from the pre-2.3 behaviour, and I don't see any benefit in the new behaviour.

vladzcloudius on 20 Sep 2018

Auto-starting after a reboot is good behavior.
Auto-starting after a stop/start is bad behavior.

Those things are different, and there is no clear way to distinguish between them.
Not recreating the partition forces that distinction, since after a reboot the data will still be there - therefore everything can auto-start.
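
The distinction drawn here comes down to one observable fact: after a reboot the data volume is still mounted at /var/lib/scylla, while after a stop/start of an instance with ephemeral storage it is not. A minimal Python sketch of such a pre-start check follows. It only illustrates the idea and is not what the Scylla systemd units or scripts actually do; the function name `data_dir_is_mounted` is hypothetical, and the default path is the one from the log.

```python
import os


def data_dir_is_mounted(path="/var/lib/scylla"):
    """True only if `path` exists and is itself a mount point."""
    # os.path.ismount returns False for a missing path or a plain directory,
    # which is the situation after the ephemeral RAID has disappeared.
    return os.path.ismount(path)


if __name__ == "__main__":
    if data_dir_is_mounted():
        print("data volume present: safe to auto-start")
    else:
        print("data volume missing: refuse to start, operator must decide")
```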

glommer on 20 Sep 2018

Is this change in behavior documented in the release notes? If not, it is a regression, which needs to either be addressed by means of documentation or a code change.

penberg on 21 Sep 2018

And please don't close this issue until: (1) there's a reference to the commit that changes the behavior and (2) justification and documentation on the change.

penberg on 21 Sep 2018

The old behavior results in silent data loss if the filesystem fails to mount. The new behavior requires that people that want to wipe out their cluster do things differently (like rm -rf /var/lib/scylla/*, or run scylla_raid_setup).

How can anyone argue for the old behavior? It's broken. Distinguish between the convenience of not changing some habits, and losing data. Both are important but not equally important.

avikivity on 21 Sep 2018

I am, at least, not arguing for either of the two behaviors. All I am asking is some reference to the commit(s) that changed the behavior for future reference. Furthermore, for user-visible change like this, a minimum would have been to mention it in the changelog. Since that ship has already sailed, perhaps some note on Scylla's documentation pages is in order?

penberg on 22 Sep 2018

👍1

"Since that ship has already sailed, perhaps some note on Scylla's documentation pages is in order?"

Yes, we need to add this to the release blog. A commit reference, as @penberg suggested, will help.
For the future, we need to do a better job tracking and reporting user-facing changes like this.

tzach on 22 Sep 2018

The original behavior is of course wrong and as Avi said may lead to data loss.
It may be convenient for testing, but it may mislead customers, who could stop and start such an instance and think there was no impact.

roydahan on 23 Sep 2018

This was fixed (data loss prevented) in f475c65ae66f7b7df592392b647179e8ec917f9b.

avikivity on 23 Sep 2018

@avikivity @roydahan It would really help and shorten this discussion if, like @penberg has mentioned, somebody would mention a corresponding github issue.

And the issue in question is #3640.
Since we think this issue may end in data loss, I think it would not be a bad idea to backport this, at least to the enterprise branch.

vladzcloudius on 24 Sep 2018

3640 was an intermediate issue that was not present in any release.

avikivity on 25 Sep 2018

@avikivity Then could you, please, reference the relevant issue?

vladzcloudius on 25 Sep 2018

I don't know if there was a relevant issue (and if there was, I wouldn't remember its number).

avikivity on 25 Sep 2018

@penberg @tzach @glommer @slivne @avikivity I'm a little confused.
Data loss seems to be an issue serious enough to be fixed in all supported versions. Do we plan to fix this (backport)?

vladzcloudius on 25 Sep 2018

The fix doesn't cure data loss, it just makes it non-silent. If you stop a cloud instance with ephemeral storage, you lose data (which you might be able to recover with repair/rebuild).

avikivity on 25 Sep 2018

@avikivity "Prevent a silent data loss" still sounds important enough, doesn't it?

vladzcloudius on 25 Sep 2018

For the record, I just tried this myself.

The reason the procedure to bring the node back is so complex seems to be that we add the entry to fstab when the AMI is created.

Once we recreate the raid, trying to mount it will fail. I did the creation through scylla_create_devices.
I didn't pass an option to update the fstab, nor do I feel like I should have: if we are going to build the AMI in a way that the raid won't be back, our initial creation invocation should not update the fstab entry.
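
The stale fstab entry described here can be detected by comparing the UUIDs that fstab references with the ones that currently exist on the machine. Below is a minimal Python sketch of that comparison. The helper names are hypothetical, the example fstab text is invented for illustration (only the second UUID comes from the log earlier in this thread; the filesystem type and options are assumptions), and `/dev/disk/by-uuid` is the usual udev location for UUID symlinks.

```python
import os


def stale_fstab_uuids(fstab_text, present_uuids):
    """Return (uuid, mount point) pairs whose UUID is not in present_uuids."""
    stale = []
    for raw in fstab_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if len(fields) >= 2 and fields[0].startswith("UUID="):
            uuid = fields[0][len("UUID="):].strip('"')
            if uuid not in present_uuids:
                stale.append((uuid, fields[1]))
    return stale


def list_present_uuids(by_uuid_dir="/dev/disk/by-uuid"):
    """UUIDs udev currently exposes; an empty set if the directory is missing."""
    try:
        return set(os.listdir(by_uuid_dir))
    except FileNotFoundError:
        return set()


# Invented example: the root entry is made up, the second UUID is from the log.
EXAMPLE_FSTAB = """\
# /etc/fstab (illustrative)
UUID=1111aaaa-0000-0000-0000-000000000000 / xfs defaults 0 0
UUID=9809eb30-c990-4c58-b6d7-9f0ebb480e2a /var/lib/scylla xfs defaults 0 0
"""

print(stale_fstab_uuids(EXAMPLE_FSTAB, {"1111aaaa-0000-0000-0000-000000000000"}))
```

On a real node one would pass the contents of /etc/fstab and `list_present_uuids()` instead of the example values. A recreated RAID gets a new filesystem UUID, so the old entry would be reported as stale until fstab is updated.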

glommer on 30 Sep 2018

There is a very easy procedure to avoid running things manually.

You can delete the file /etc/scylla/ami_configured and restart the instance.
It should treat the instance as never configured and will configure it from scratch.
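
For anyone scripting this across several nodes, removing the marker file could look like the Python sketch below. The path is the one given in the comment above; the function name `reset_ami_marker` is made up, the effect on the next boot is as described by the commenter and is not verified here, and on a real instance this would need root privileges.

```python
from pathlib import Path


def reset_ami_marker(marker="/etc/scylla/ami_configured"):
    """Delete the marker file if it exists; return True if something was removed."""
    marker = Path(marker)
    if marker.exists():
        marker.unlink()
        return True
    return False
```

Calling `reset_ami_marker()` and then restarting the instance corresponds to the two steps described above. Nothing is removed unless the function is called explicitly.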

roydahan on 2 Oct 2018

Closing this issue based on Avi's input.