Warehouse: Audit trail: implement auditable event logging for sensitive actions

Created on 16 May 2019 · 10 Comments · Source: pypa/warehouse

Warehouse is adding an advanced audit trail of user actions, beyond the existing journal. This will, for instance, allow publishers to track all actions taken by third-party services on their behalf.

[x] Add auditing for user actions in PyPI
[x] Add auditing for project actions in PyPI
[x] Implement a User view for User auditing, allowing publishers to track all actions taken by third-party services on their behalf
[x] Implement a Project view for Project auditing, so that project maintainers can audit actions similarly
[x] Implement an Admin view for PyPI.org administrators to audit actions similarly

So:

Each user will be able to view a log of sensitive actions relevant to their own user account.
Each user who maintains at least one project on PyPI will be able to view a log of sensitive actions (performed by ANY user) relevant to the projects on which they hold the Owner role.
And PyPI administrators will be able to view the full audit log.
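
The three visibility rules above can be sketched as a single predicate. This is only an illustration, not Warehouse's implementation: the `AuditEvent` and `Viewer` classes, their fields, and the event tags are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class AuditEvent:
    """Hypothetical event record; field names are assumptions."""
    actor: str                          # user who performed the action
    tag: str                            # hypothetical tag, e.g. "account:login"
    subject_user: Optional[str] = None  # account the event is about, if any
    project: Optional[str] = None       # project the event is about, if any


@dataclass(frozen=True)
class Viewer:
    """Hypothetical description of whoever is reading the log."""
    username: str
    is_admin: bool = False
    owned_projects: FrozenSet[str] = frozenset()


def can_view(viewer: Viewer, event: AuditEvent) -> bool:
    # Administrators see the full audit log.
    if viewer.is_admin:
        return True
    # Every user sees events about their own account.
    if event.subject_user == viewer.username:
        return True
    # Owners see events on their projects, whoever performed them.
    return event.project is not None and event.project in viewer.owned_projects


events = [
    AuditEvent(actor="alice", tag="account:login", subject_user="alice"),
    AuditEvent(actor="bob", tag="project:release:add", project="sampleproject"),
    AuditEvent(actor="carol", tag="account:login", subject_user="carol"),
]
alice = Viewer("alice", owned_projects=frozenset({"sampleproject"}))
admin = Viewer("admin", is_admin=True)

alice_log = [e for e in events if can_view(alice, e)]
admin_log = [e for e in events if can_view(admin, e)]
```

Here `alice_log` holds her own login plus bob's action on the project she owns, while `admin_log` holds every event.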

We'll be working on this in 2019. The Packaging Working Group, seeking donations and further grants to fund more work, got some new funding from the Open Technology Fund, and the audit log is part of the current grant-funded project.

Labels: CSS/SCSS, HTML, High priority, UX/UI, admin, documentation, feature request, javascript

brainwane

All 10 comments

Noting here that sensitive actions worth including in the event log would include _renaming_ a project, per #1919.

brainwane on 16 May 2019

Ok, @woodruffw gave me a clue, and asked me to look at all the metrics calls, like so:

$ git grep -C1 self._metrics
warehouse/accounts/services.py-        )
warehouse/accounts/services.py:        self._metrics = metrics
warehouse/accounts/services.py-
--
warehouse/accounts/services.py-
warehouse/accounts/services.py:        self._metrics.increment("warehouse.authentication.start", tags=tags)
warehouse/accounts/services.py-
--
warehouse/accounts/services.py-            logger.warning("Global failed login threshold reached.")
warehouse/accounts/services.py:            self._metrics.increment(
warehouse/accounts/services.py-                "warehouse.authentication.ratelimited",
--
warehouse/accounts/services.py-            if not self.ratelimiters["user"].test(user.id):
warehouse/accounts/services.py:                self._metrics.increment(
warehouse/accounts/services.py-                    "warehouse.authentication.ratelimited",
--
warehouse/accounts/services.py-
warehouse/accounts/services.py:                self._metrics.increment("warehouse.authentication.ok", tags=tags)
warehouse/accounts/services.py-
--
warehouse/accounts/services.py-            else:
warehouse/accounts/services.py:                self._metrics.increment(
warehouse/accounts/services.py-                    "warehouse.authentication.failure",
--
warehouse/accounts/services.py-        else:
warehouse/accounts/services.py:            self._metrics.increment(
warehouse/accounts/services.py-                "warehouse.authentication.failure", tags=tags + ["failure_reason:user"]
--
warehouse/accounts/services.py-        tags = tags if tags is not None else []
warehouse/accounts/services.py:        self._metrics.increment("warehouse.authentication.two_factor.start", tags=tags)
warehouse/accounts/services.py-
--
warehouse/accounts/services.py-            logger.warning("Global failed login threshold reached.")
warehouse/accounts/services.py:            self._metrics.increment(
warehouse/accounts/services.py-                "warehouse.authentication.two_factor.ratelimited",
--
warehouse/accounts/services.py-        if not self.ratelimiters["user"].test(user_id):
warehouse/accounts/services.py:            self._metrics.increment(
warehouse/accounts/services.py-                "warehouse.authentication.two_factor.ratelimited",
--
warehouse/accounts/services.py-        if totp_secret is None:
warehouse/accounts/services.py:            self._metrics.increment(
warehouse/accounts/services.py-                "warehouse.authentication.two_factor.failure",
--
warehouse/accounts/services.py-        if valid:
warehouse/accounts/services.py:            self._metrics.increment("warehouse.authentication.two_factor.ok", tags=tags)
warehouse/accounts/services.py-        else:
warehouse/accounts/services.py:            self._metrics.increment(
warehouse/accounts/services.py-                "warehouse.authentication.two_factor.failure",
--
warehouse/accounts/services.py-        self._api_base = api_base
warehouse/accounts/services.py:        self._metrics = metrics
warehouse/accounts/services.py-        self._help_url = help_url
--
warehouse/accounts/services.py-    def _metrics_increment(self, *args, **kwargs):
warehouse/accounts/services.py:        self._metrics.increment(*args, **kwargs)
warehouse/accounts/services.py-
--
warehouse/accounts/services.py-
warehouse/accounts/services.py:        self._metrics_increment("warehouse.compromised_password_check.start", tags=tags)
warehouse/accounts/services.py-
--
warehouse/accounts/services.py-            logger.warning("Error contacting HaveIBeenPwned: %r", exc)
warehouse/accounts/services.py:            self._metrics_increment(
warehouse/accounts/services.py-                "warehouse.compromised_password_check.error", tags=tags
--
warehouse/accounts/services.py-            if hashed_password[5:] == possible.lower():
warehouse/accounts/services.py:                self._metrics_increment(
warehouse/accounts/services.py-                    "warehouse.compromised_password_check.compromised", tags=tags
--
warehouse/accounts/services.py-        # If we made it to this point, then the password is safe.
warehouse/accounts/services.py:        self._metrics_increment("warehouse.compromised_password_check.ok", tags=tags)
warehouse/accounts/services.py-        return False
--
warehouse/rate_limiting/__init__.py-                logging.warning("Error computing rate limits: %r", exc)
warehouse/rate_limiting/__init__.py:                self._metrics.increment(
warehouse/rate_limiting/__init__.py-                    "warehouse.ratelimiter.error", tags=[f"call:{fn.__name__}"]
--
warehouse/rate_limiting/__init__.py-        self._identifiers = identifiers
warehouse/rate_limiting/__init__.py:        self._metrics = metrics
warehouse/rate_limiting/__init__.py-

From the metrics available so far, I will say that:

Maintainers should not see compromised password checks for Owners and other Maintainers, only themselves.
Maintainers should not see rate limit messages for Owners and other Maintainers, only themselves.
Maintainers should not see MFA authentication messages for Owners and other Maintainers, only themselves.
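
One way to read these three rules is as a filter keyed on the metric names from the grep output above. The sketch below assumes, purely for illustration, that audit events reuse those metric names and arrive as `(subject_username, name)` pairs; `is_account_private`, `visible_events`, and the `project.release.added` name are hypothetical.

```python
# Names taken from the grep output above; treating them as audit event
# names is an assumption made only for this sketch.
ACCOUNT_PRIVATE_PREFIXES = (
    "warehouse.compromised_password_check.",
    "warehouse.authentication.two_factor.",
)
ACCOUNT_PRIVATE_NAMES = frozenset({"warehouse.authentication.ratelimited"})


def is_account_private(name):
    """True for events that only the affected account should see."""
    return name in ACCOUNT_PRIVATE_NAMES or name.startswith(ACCOUNT_PRIVATE_PREFIXES)


def visible_events(viewer, events):
    """Drop account-private events that are about someone other than the viewer.

    `events` is an iterable of (subject_username, name) pairs. Any other
    visibility rule (project roles, admin access) is out of scope here.
    """
    return [
        (subject, name)
        for subject, name in events
        if not is_account_private(name) or subject == viewer
    ]


sample = [
    ("alice", "warehouse.compromised_password_check.compromised"),
    ("bob", "warehouse.compromised_password_check.compromised"),
    ("bob", "warehouse.authentication.two_factor.failure"),
    ("bob", "warehouse.authentication.ratelimited"),
    ("bob", "project.release.added"),  # hypothetical project-level event
]
alice_view = visible_events("alice", sample)
```

With this input, `alice_view` keeps alice's own password-check event and the project-level event, and drops bob's password-check, 2FA, and rate-limit events.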

trishankatdatadog on 8 Jun 2019

I'm not saying the following is necessarily correct, but it should provide a way to reason about this. Hope this helps!

trishankatdatadog on 8 Jun 2019

(Here's a textual version of that image above.)

brainwane on 8 Jun 2019

I'd also like to propose letting the community flag projects as spam/malware. Self-regulation is a good thing, and we could borrow some rules from the Stack Overflow community. This would greatly help the administrators.

eirnym on 9 Jun 2019

@eirnym Thanks for the suggestion -- we're tracking that feature request as #3896.

brainwane on 10 Jun 2019

This work is part of the milestone of work we're doing for the Open Tech Fund-supported security work. Right now, as I understand it, @woodruffw and @nlhkabu are working on the API key work, but after that, they'll be working on this issue.

brainwane on 23 Jun 2019

User exarkun in IRC just pointed out that the public might want to know _who_ uploaded a particular release, and that info perhaps should be in the ownership history/audit log for a project.

brainwane on 25 Jun 2019

Warehouse maintainers and @woodruffw discussed design and scope for this feature in a meeting today.

_Design:_ We chose a basic approach for the auditing API: a new API, called in tandem with metrics calls in places where both are needed/desired.
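
As a rough sketch of what "called in tandem with metrics calls" could look like: the metric names below come from the grep output earlier in this thread, but `InMemoryMetrics`, `InMemoryAuditLog`, `LoginService`, the `record()` signature, and the audit tags are all hypothetical stand-ins, not the API that was eventually built.

```python
from collections import Counter


class InMemoryMetrics:
    """Stand-in for a metrics service with an increment() method."""

    def __init__(self):
        self.counts = Counter()

    def increment(self, name, tags=None):
        self.counts[name] += 1


class InMemoryAuditLog:
    """Hypothetical audit API; the record() signature is an assumption."""

    def __init__(self):
        self.events = []

    def record(self, *, tag, user_id, ip_address):
        self.events.append({"tag": tag, "user_id": user_id, "ip_address": ip_address})


class LoginService:
    def __init__(self, metrics, audit):
        self._metrics = metrics
        self._audit = audit

    def finish_login(self, user_id, ip_address, succeeded):
        if succeeded:
            # Aggregate counter for operators...
            self._metrics.increment("warehouse.authentication.ok")
            # ...and, in tandem, a per-user record for the audit trail.
            self._audit.record(
                tag="account:login:success", user_id=user_id, ip_address=ip_address
            )
        else:
            self._metrics.increment("warehouse.authentication.failure")
            self._audit.record(
                tag="account:login:failure", user_id=user_id, ip_address=ip_address
            )
        return succeeded


metrics = InMemoryMetrics()
audit = InMemoryAuditLog()
service = LoginService(metrics, audit)
service.finish_login("user-1", "192.0.2.10", succeeded=True)
service.finish_login("user-1", "192.0.2.10", succeeded=False)
```

Keeping the two calls side by side means a metric stays an anonymous counter while the audit record carries who did what and from where.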

_Storage:_ We have some research to do into how long we will retain user-related events (#3532), but we aim to retain project-related events permanently. We discussed a more flexible storage medium than Postgres (maybe a document store?) and started thinking about what kinds of data we'll need to store.
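
To make the retention split concrete, here is a minimal sketch of a pruning pass. The retention period for user-related events is still undecided (see #3532), so the 90-day value is purely a placeholder, and the event dictionary layout and `prune` function are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Placeholder only: the real retention period for user events is undecided.
USER_EVENT_RETENTION = timedelta(days=90)


def prune(events, now):
    """Keep every project event; keep user events newer than the cutoff."""
    cutoff = now - USER_EVENT_RETENTION
    return [e for e in events if e["scope"] == "project" or e["time"] >= cutoff]


now = datetime(2019, 8, 1, tzinfo=timezone.utc)
stored = [
    {"scope": "user", "tag": "account:login", "time": now - timedelta(days=200)},
    {"scope": "user", "tag": "account:login", "time": now - timedelta(days=5)},
    {"scope": "project", "tag": "project:create", "time": now - timedelta(days=2000)},
]
kept = prune(stored, now)
```

The old user login is dropped, while the much older project event survives because project-related events are meant to be retained permanently.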

_Scope:_ To avoid running over our funds for this project, we're limiting the scope for what we'll accomplish right now. We'll focus on:

- implementing the necessary API calls internally
- implementing storage
- implementing a handful of auditable events
- creating a few views for those, and exposing what makes sense today (e.g., recent logins, project change events, and 2FA events)
  - we may want to show most project-related events to all users, but hide some info about each event from users who aren't maintainers or owners, e.g. IP. @nlhkabu is taking note of this for designing templates.
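
The idea of showing an event to everyone while hiding some of its fields could be sketched like this. The field names, the `SENSITIVE_FIELDS` set, and the `render_event` helper are hypothetical; only the Owner and Maintainer roles and the IP example come from the discussion.

```python
# Fields hidden from viewers who are neither owners nor maintainers.
# Only the IP address is named in the discussion; the set is an assumption.
SENSITIVE_FIELDS = frozenset({"ip_address"})
PRIVILEGED_ROLES = frozenset({"Owner", "Maintainer"})


def render_event(event, viewer_role):
    """Return a copy of the event, redacted unless the viewer is privileged.

    `viewer_role` is the viewer's role on the project, or None for the public.
    """
    if viewer_role in PRIVILEGED_ROLES:
        return dict(event)
    return {k: v for k, v in event.items() if k not in SENSITIVE_FIELDS}


event = {
    "tag": "project:release:add",  # hypothetical tag
    "actor": "bob",
    "ip_address": "192.0.2.10",
}
owner_view = render_event(event, "Owner")
public_view = render_event(event, None)
```

Redacting at render time keeps one stored record per event and leaves the decision about what each audience sees to the view layer.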

Right now we want to ensure that we can store and retrieve events. We can add more auditable events down the line.

_Timing:_ This is the first thing we're working on in August, and we are aiming to get it merged and deployed in August.

brainwane on 1 Aug 2019

Began work on this in #6339.