Securedrop: After unattended update, can't connect to SecureDrop app

Created on 14 Aug 2017  ·  27 Comments  ·  Source: freedomofpress/securedrop

Bug

Description

Overnight one of the unattended upgrades left the SecureDrop app server unable to be reached from Tor Browser.

Steps to Reproduce

SecureDrop 0.4.1

  1. SecureDrop had been running upgrades flawlessly.
  2. An unattended upgrade last night seems to have caused the break.

Expected Behavior

Can access SecureDrop from Tor Browser.

Actual Behavior

Unable to connect.

Comments

  • I've tried restarting the servers twice.
  • I can still ssh in to both mon & app from the Admin workstation.
  • Searching issues I came across #1078

From the OSSEC logs around the time it became unreachable:

Rule: 100012 fired (level 7) -> "Apparmor denied event"
Portion of the log(s):

Aug 14 XX:00:XX app kernel: [   36.599669] audit: type=1400 audit(XXX): apparmor="DENIED" operation="capable" pid=XXXX comm="apache2" capability=1  capname="dac_override"


Rule: 1002 fired (level 2) -> "Unknown problem somewhere in the system."
Portion of the log(s):

Aug 14 XX:01:XX app Tor[XXXX]: Onion service connection to [scrubbed] failed (connection refused)
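
For anyone triaging similar alerts, the capability denials can be summarized straight from the kernel log. This is only a minimal sketch: it assumes audit lines shaped like the one quoted above arrive on stdin, and the function name denied_capabilities is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: count AppArmor capability denials per command.
# Reads kernel/audit log lines on stdin; assumes the line format quoted above.
denied_capabilities() {
    grep 'apparmor="DENIED"' \
        | grep 'operation="capable"' \
        | sed -n 's/.*comm="\([^"]*\)".*capname="\([^"]*\)".*/\1 \2/p' \
        | sort \
        | uniq -c
}

# Example usage (the log path is an assumption; adjust for your system):
#   denied_capabilities < /var/log/kern.log
```

Each output line is a count followed by the command name and the denied capability, which makes a repeated dac_override denial for apache2 easy to spot.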

All 27 comments

Can confirm this also affects our setup.
Same situation, we are able to ssh in from the Admin Workstation, but not able to connect to either Source or Journalist Interface.

Checked a couple of the instances listed in the SecureDrop directory, and many of them are also down.

Can add that upgrades to tor around the same time might also be involved.

From the OSSEC log alerts we see changes to tor related files/packages:

Rule: 550 fired (level 7) -> Integrity checksum changed.
Portion of the log(s):
Integrity checksum changed for: /etc/apparmor.d/abstractions/tor
--END OF NOTIFICATION

Rule: 2902 fired (level 7) -> New dpkg (Debian Package) installed.
Portion of the log(s):
2017-08-14 XX:36:XX status installed tor:amd64 0.3.0.10-1~trusty+1
--END OF NOTIFICATION

Rule: 2902 fired (level 7) -> New dpkg (Debian Package) installed.
Portion of the log(s):
2017-08-14 XX:36:XX status installed tor-geoipdb:all 0.3.0.10-1~trusty+1
--END OF NOTIFICATION

Rule: 552 fired (level 7) -> Integrity checksum changed again (3rd time).
Portion of the log(s):
Integrity checksum changed for: /etc/apparmor.d/cache/usr.sbin.tor
--END OF NOTIFICATION

Rule: 552 fired (level 7) -> Integrity checksum changed again (3rd time).
Portion of the log(s):
Integrity checksum changed for: /etc/apparmor.d/cache/system_tor

Rule: 550 fired (level 7) -> Integrity checksum changed.
Portion of the log(s):
Integrity checksum changed for: /etc/apparmor.d/abstractions/tor

Same here. Logged in the app server now trying to figure out what's what. I'm on https://gitter.im/freedomofpress/securedrop should people want to chat about it.

/var/log/dpkg.log on the app server says:

2017-08-14 04:14:54 status half-configured tor:amd64 0.3.0.10-1~trusty+1
2017-08-14 04:14:54 status installed tor:amd64 0.3.0.10-1~trusty+1
2017-08-14 04:14:54 configure tor-geoipdb:all 0.3.0.10-1~trusty+1 
2017-08-14 04:14:54 status unpacked tor-geoipdb:all 0.3.0.10-1~trusty+1
2017-08-14 04:14:55 status half-configured tor-geoipdb:all 0.3.0.10-1~trusty+1
2017-08-14 04:14:55 status installed tor-geoipdb:all 0.3.0.10-1~trusty+1

Running

sudo bash
source /etc/apache2/envvars
strace -s 256 /usr/sbin/apache2 -X |& less

I got

... open("/var/lock/apache2/rewrite-map.3809", O_WRONLY|O_CREAT|O_EXCL|O_CLOEXEC) = permission denied
... Permission denied: AH00023: Couldn't create the rewrite-map mutex (file /var/lock/apache2/rewrite-map.3809)
... AH00016: Configuration failed
...

Although this is written to file descriptor 2 (stderr), it does not show up when running just /usr/sbin/apache2.
The permissions of /var/lock/apache2 are 755 and the owner is www-data, group root.

I can confirm that I also see a message similar to

Aug 14 XX:00:XX app kernel: [   36.599669] audit: type=1400 audit(XXX): apparmor="DENIED" operation="capable" pid=XXXX comm="apache2" capability=1  capname="dac_override"

even after a reboot. And if I run

sudo bash
source /etc/apache2/envvars
/usr/sbin/apache2 -X 

it silently fails (see above) but no other audit event shows in the output of dmesg.

I manually added

capability dac_override,

right after

capability kill,

to the file /etc/apparmor.d/usr.sbin.apache2 and rebooted, and the app is back online.
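
The manual edit above could be scripted roughly as follows. This is a sketch under assumptions, not a tested fix: the helper name add_dac_override is hypothetical, it assumes the profile has a "capability kill," line on its own line, and reloading with apparmor_parser instead of rebooting is my assumption rather than what was done above.

```shell
#!/bin/sh
# Hypothetical helper: insert "capability dac_override," right after the
# "capability kill," line of an AppArmor profile, unless it is already there.
add_dac_override() {
    profile="$1"
    if grep -q '^[[:space:]]*capability dac_override,' "$profile"; then
        return 0   # already granted, nothing to do
    fi
    tmp=$(mktemp) || return 1
    # Reuse the indentation of the "capability kill," line for the new rule.
    if sed 's/^\([[:space:]]*\)capability kill,[[:space:]]*$/&\
\1capability dac_override,/' "$profile" > "$tmp"; then
        # Write back in place so ownership and permissions are preserved.
        cat "$tmp" > "$profile"
    fi
    rm -f "$tmp"
}

# Example usage on the app server, as root (reload step is an assumption):
#   cp /etc/apparmor.d/usr.sbin.apache2 /root/usr.sbin.apache2.orig
#   add_dac_override /etc/apparmor.d/usr.sbin.apache2
#   apparmor_parser -r /etc/apparmor.d/usr.sbin.apache2
#   service apache2 restart
```

The backup copy is kept outside /etc/apparmor.d on purpose, so that a stray second copy of the profile is not picked up when profiles are loaded.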

Note that this only shows what the error is; it may not be the right fix, and I certainly do not fully understand why it happened. I'm not worried about my SecureDrop instance because it is not in production, but I don't recommend trying the same on a production setup ;-)

What eludes me is that the only upgrades were for the tor and tor-geoipdb packages. Only tor ships apparmor.d files, and they do not change anything for /var/lock/apache2. How can this influence apache2 in any way?

Another workaround is to chmod g+w /var/lock/apache2 so that root is allowed to create the /var/lock/apache2/rewrite-map.* file because apache2 tries to create it while still root. But it only persists until the next reboot: the directory is created with 755 permissions by default.

http://danwalsh.livejournal.com/69478.html is an interesting read

Can confirm The New York Times is also seeing this issue.

To expand: this is also affecting new installs, not just upgrades.

Just sent out the following announcement to all SecureDrop Admins onboarded on our support portal:

All SecureDrop instances are currently down, due to an upgrade conflict related to the Apache service. The issue was first reported to us a few hours ago, and our team is working on a fix. Once we have identified and tested a resolution, we will release a new version of SecureDrop, 0.4.2, and publish it for automatic installation by all SecureDrop servers.

In the meantime, the Source and Journalist Interfaces for your instance will remain inaccessible. No action is required on your part in order to recover the instance. We are notifying you that we're working on the problem so that you can inform the journalists in your environment about the downtime.

We hope to have the issue diagnosed and patched shortly. Once the new version is published, it can take your server 24-48 hours to apply it, depending on timezone and exact release time.

If you're interested in the technical details, we're tracking progress in this GitHub issue: https://github.com/freedomofpress/securedrop/issues/2105

We'll update you again once this issue is resolved.

And now onto testing.

Can confirm that WIRED has this issue too.

The dac_override exemption is rather broad, but definitely brings the web apps back up. The dac_override capability was removed as part of the 0.3.5 release in #1058, but we don't have firm documentation of _why_ it was removed, other than that during testing it did not appear to be necessary. Indeed, it hasn't been to date, but suddenly last night that situation changed.

Fortunately, granting capability dac_override does not render moot the whitelist-based filepath approach used in the AppArmor profile to confine the Apache process. To confirm, try the following:

  1. Comment out the r declaration for /var/www/securedrop/source.py
  2. Reload the profile
  3. Navigate to the Source Interface over Tor
  4. Observe HTTP 500.
  5. Observe AppArmor DENIED event, e.g. apparmor="DENIED" operation="open" profile="/usr/sbin/apache2" name="/var/www/securedrop/source.py" pid=11949 comm="apache2" requested_mask="r" denied_mask="r" fsuid=33 ouid=33
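
Steps 1 and 2 above can be sketched in shell. Treat this as illustrative only: the helper name comment_out_rule is hypothetical, it assumes the rule sits on its own line and begins with the path, and loading an edited copy of the profile from /tmp with apparmor_parser is an assumption about how one might avoid touching the installed file.

```shell
#!/bin/sh
# Hypothetical helper for step 1: print the profile with the rule for one
# path commented out. Assumes the rule is on its own line, starts with the
# path, and that the path contains no "|" and no regex metacharacters other
# than ".".
comment_out_rule() {
    profile="$1"
    path="$2"
    # Escape dots so they match literally in the sed pattern.
    pattern=$(printf '%s' "$path" | sed 's/\./\\./g')
    sed "s|^\([[:space:]]*\)\(${pattern}[[:space:]]\)|\1#\2|" "$profile"
}

# Example usage, as root on a test instance (not production):
#   comment_out_rule /etc/apparmor.d/usr.sbin.apache2 \
#       /var/www/securedrop/source.py > /tmp/usr.sbin.apache2
#   apparmor_parser -r /tmp/usr.sbin.apache2          # step 2: reload
#   ... visit the Source Interface over Tor, then:
#   dmesg | grep 'apparmor="DENIED"'                  # step 5
#   apparmor_parser -r /etc/apparmor.d/usr.sbin.apache2   # restore
```

Because the helper writes to stdout, the installed profile is left untouched and the last command restores the original confinement.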

Additionally, running aa-logprof immediately after the reported failures shows that AppArmor suggests granting the new capability. (See our dev docs for more info on the logprof workflow.) So it sounds like the patch proposed by @dachary is indeed the way to go here.

Have a bit more research to do as far as what changed in the apache (or other) packages that suddenly requires this change, and if no major blockers there, I'll prepare a PR for a hotfix release.

It turns out this breaking bug was introduced by the updated tor package, version 0.3.0.10-1~trusty+1, from the deb.torproject.org/torproject.org repo. Specifically, when I pulled down the two packages and compared them, I saw this AppArmor capability change:

diff /tmp/{tor_0.3.0.9-1~trusty,tor-0.3.0.10-1}/etc/apparmor.d/abstractions/tor                   
10c10
<   capability dac_override,
---
>   capability dac_read_search,

This affected us because our apache2 AppArmor ruleset pulls in the tor rules here:

/usr/sbin/apache2 {
  #include <abstractions/base>
  #include <abstractions/tor>

So it turns out we were using dac_override all along 👍. The quick fix should indeed be to add it back, and then re-evaluate whether we should rip it out, and how.
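
Since the breakage came in through an included abstraction, it may help to list which profiles pull in a given abstraction and would therefore be affected by upstream changes to it. A minimal sketch, assuming profiles use the "#include <abstractions/NAME>" form shown above; the function name profiles_including is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: list files under a directory that include a given
# AppArmor abstraction, using the "#include <abstractions/NAME>" form.
profiles_including() {
    dir="$1"
    abstraction="$2"
    grep -rlF "#include <abstractions/${abstraction}>" "$dir" | sort
}

# Example usage:
#   profiles_including /etc/apparmor.d tor
```

Any profile in the output inherits every rule in that abstraction, so a capability dropped from the abstraction by a package upgrade is silently dropped from the profile as well.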

Excellent debugging @msheiny, agreed that we should explicitly grant dac_override in the apache2 profile, since it's been there all along and just fell off the map last night. Will follow up on #2106 to discuss mitigations down the road.

Thanks for finding the source of the problems!

Here in Europe (CEST timezone) we are nearing night again.
You described the release of a hotfix upgrade like this:

We hope to have the issue diagnosed and patched shortly. Once the new version is published, it can take your server 24-48 hours to apply it, depending on timezone and exact release time.

If you have released an upgrade during our sleeping hours – is there any way to kickstart the upgrade process manually when I get up tomorrow morning?

@byeskille you can manually edit the profile as described at https://github.com/freedomofpress/securedrop/issues/2105#issuecomment-322186320 and reboot; it will come back, and it will upgrade when the package is available.

Working on getting a release out now, just want to be thorough with the QA. We'll announce the release via the SecureDrop blog and Twitter. As soon as the new release has been published, a simple cron-apt -i -s on the app server will pull in the new packages, so you don't have to wait for the nightly cron job to run.

@conorsch: NYT didn't receive the announcement. Please add us to the list? And while I'm here; why not announce the release via the same channel?

@runasand Thanks for reporting. The message went out via our support portal, will circle back and investigate why you didn't receive the notice once this release is out the door.

0.4.2 is live: https://securedrop.org/news/securedrop-042-released Thanks to @sighmon, @dachary, and @msheiny for making this release happen so quickly!

As mentioned in the blog post, running sudo cron-apt -i -s on the Application Server will install the fix immediately, so you don't have to wait for the unattended-upgrades window.

@conorsch: NYT didn't receive the announcement. Please add us to the list?

Can't find any problems on our end, I see the ticket in our support portal, opened about 8 hours ago. Check your inbox again!

And while I'm here; why not announce the release via the same channel?

We _do_ use the Redmine portal to announce (and pre-announce) planned releases. In the past we've used the blog (which has RSS) and Twitter to make these announcements, and that's still true today, but we duplicate the messages on the support portal as well, to provide push notifications to Admins as described in #1589.

I confirm the upgrade + reboot happened tonight, and the source/journalist interfaces are up and running with SecureDrop 0.4.2.

Same here: can confirm our instance upgraded to 0.4.2 during the night and is up and running again.

Bloomberg News is also running again. Thanks
PS: I didn't receive an email from the portal.

WIRED is back.

Closing, as this issue is resolved by #2108. The outage is resolved - we'll follow up with any other instances experiencing downtime via the support portal. Long term we will need to mirror the tor apt repo and verify tor packages do not introduce breakage to prevent mass outages of this type in the future (ticket #2106).
