Warehouse: Audit trail: implement auditable event logging for sensitive actions

Created on 16 May 2019 · 10 Comments · Source: pypa/warehouse

Warehouse is adding an advanced audit trail of user actions that goes beyond the existing journal. This will, for instance, allow publishers to track all actions taken by third-party services on their behalf.

  • [x] Add auditing for user actions in PyPI
  • [x] Add auditing for project actions in PyPI
  • [x] Implement a User view for User auditing, allowing publishers to track all actions
    taken by third-party services on their behalf
  • [x] Implement a Project view for Project auditing, allowing project maintainers to audit
    actions similarly
  • [x] Implement an Admin view for PyPI.org administrators to audit actions similarly

So:

  • Each user will be able to view a log of sensitive actions that are relevant to their user account.
  • Each user who maintains at least one project on PyPI will be able to view a log of sensitive actions
    (performed by ANY user) relevant to projects on which they hold the Owner role.
  • PyPI administrators will be able to view the full audit log.
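
The three access rules above can be sketched as a single check. This is only an illustration of the rules as stated in this issue, not Warehouse code: `Event`, `can_view`, the field names, and the example tag are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Event:
    # All fields are hypothetical; they are not Warehouse's schema.
    tag: str                            # what happened
    actor: str                          # user who performed the action
    target_user: Optional[str] = None   # account the event is relevant to
    project: Optional[str] = None       # project the event is relevant to


def can_view(viewer, event, owned_projects, is_admin=False):
    """Return True if `viewer` may see `event` under the three rules."""
    if is_admin:
        # PyPI administrators see the full audit log.
        return True
    if event.target_user == viewer:
        # Users see events relevant to their own account.
        return True
    # Owners see events on their projects, whoever performed them.
    return event.project is not None and event.project in owned_projects
```

For example, `can_view("alice", Event(tag="role-added", actor="bob", project="sampleproject"), {"sampleproject"})` is true because alice owns the project, even though bob performed the action.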

We'll be working on this in 2019. The Packaging Working Group, which continues to seek donations and further grants to fund more work, has received new funding from the Open Technology Fund, and the audit log is part of the current grant-funded project.

Labels: CSS/SCSS, HTML, High priority, UX/UI, admin, documentation, feature request, javascript

All 10 comments

Noting here that sensitive actions worth including in the event log would include _renaming_ a project, per #1919.

Ok, @woodruffw gave me a clue, and asked me to look at all the metrics calls, like so:

$ git grep -C1 self._metrics
warehouse/accounts/services.py-        )
warehouse/accounts/services.py:        self._metrics = metrics
warehouse/accounts/services.py-
--
warehouse/accounts/services.py-
warehouse/accounts/services.py:        self._metrics.increment("warehouse.authentication.start", tags=tags)
warehouse/accounts/services.py-
--
warehouse/accounts/services.py-            logger.warning("Global failed login threshold reached.")
warehouse/accounts/services.py:            self._metrics.increment(
warehouse/accounts/services.py-                "warehouse.authentication.ratelimited",
--
warehouse/accounts/services.py-            if not self.ratelimiters["user"].test(user.id):
warehouse/accounts/services.py:                self._metrics.increment(
warehouse/accounts/services.py-                    "warehouse.authentication.ratelimited",
--
warehouse/accounts/services.py-
warehouse/accounts/services.py:                self._metrics.increment("warehouse.authentication.ok", tags=tags)
warehouse/accounts/services.py-
--
warehouse/accounts/services.py-            else:
warehouse/accounts/services.py:                self._metrics.increment(
warehouse/accounts/services.py-                    "warehouse.authentication.failure",
--
warehouse/accounts/services.py-        else:
warehouse/accounts/services.py:            self._metrics.increment(
warehouse/accounts/services.py-                "warehouse.authentication.failure", tags=tags + ["failure_reason:user"]
--
warehouse/accounts/services.py-        tags = tags if tags is not None else []
warehouse/accounts/services.py:        self._metrics.increment("warehouse.authentication.two_factor.start", tags=tags)
warehouse/accounts/services.py-
--
warehouse/accounts/services.py-            logger.warning("Global failed login threshold reached.")
warehouse/accounts/services.py:            self._metrics.increment(
warehouse/accounts/services.py-                "warehouse.authentication.two_factor.ratelimited",
--
warehouse/accounts/services.py-        if not self.ratelimiters["user"].test(user_id):
warehouse/accounts/services.py:            self._metrics.increment(
warehouse/accounts/services.py-                "warehouse.authentication.two_factor.ratelimited",
--
warehouse/accounts/services.py-        if totp_secret is None:
warehouse/accounts/services.py:            self._metrics.increment(
warehouse/accounts/services.py-                "warehouse.authentication.two_factor.failure",
--
warehouse/accounts/services.py-        if valid:
warehouse/accounts/services.py:            self._metrics.increment("warehouse.authentication.two_factor.ok", tags=tags)
warehouse/accounts/services.py-        else:
warehouse/accounts/services.py:            self._metrics.increment(
warehouse/accounts/services.py-                "warehouse.authentication.two_factor.failure",
--
warehouse/accounts/services.py-        self._api_base = api_base
warehouse/accounts/services.py:        self._metrics = metrics
warehouse/accounts/services.py-        self._help_url = help_url
--
warehouse/accounts/services.py-    def _metrics_increment(self, *args, **kwargs):
warehouse/accounts/services.py:        self._metrics.increment(*args, **kwargs)
warehouse/accounts/services.py-
--
warehouse/accounts/services.py-
warehouse/accounts/services.py:        self._metrics_increment("warehouse.compromised_password_check.start", tags=tags)
warehouse/accounts/services.py-
--
warehouse/accounts/services.py-            logger.warning("Error contacting HaveIBeenPwned: %r", exc)
warehouse/accounts/services.py:            self._metrics_increment(
warehouse/accounts/services.py-                "warehouse.compromised_password_check.error", tags=tags
--
warehouse/accounts/services.py-            if hashed_password[5:] == possible.lower():
warehouse/accounts/services.py:                self._metrics_increment(
warehouse/accounts/services.py-                    "warehouse.compromised_password_check.compromised", tags=tags
--
warehouse/accounts/services.py-        # If we made it to this point, then the password is safe.
warehouse/accounts/services.py:        self._metrics_increment("warehouse.compromised_password_check.ok", tags=tags)
warehouse/accounts/services.py-        return False
--
warehouse/rate_limiting/__init__.py-                logging.warning("Error computing rate limits: %r", exc)
warehouse/rate_limiting/__init__.py:                self._metrics.increment(
warehouse/rate_limiting/__init__.py-                    "warehouse.ratelimiter.error", tags=[f"call:{fn.__name__}"]
--
warehouse/rate_limiting/__init__.py-        self._identifiers = identifiers
warehouse/rate_limiting/__init__.py:        self._metrics = metrics
warehouse/rate_limiting/__init__.py-

From the metrics available so far, I will say that:

  1. Maintainers should not see compromised password checks for Owners and other Maintainers, only themselves.
  2. Maintainers should not see rate limit messages for Owners and other Maintainers, only themselves.
  3. Maintainers should not see MFA authentication messages for Owners and other Maintainers, only themselves.
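
One way to reason about these three rules is to classify events by the metric names in the grep output above. This is a rough sketch under that assumption: the helper names are hypothetical, and it says nothing about event kinds the three rules don't cover.

```python
# Prefixes taken from the metric names in the grep output above.
ACCOUNT_PRIVATE_PREFIXES = (
    "warehouse.compromised_password_check.",  # rule 1
    "warehouse.authentication.two_factor.",   # rule 3
)


def is_account_private(metric_name):
    """True for events covered by rules 1-3 (password checks, rate limits, MFA)."""
    return (
        metric_name.startswith(ACCOUNT_PRIVATE_PREFIXES)
        or metric_name.endswith(".ratelimited")  # rule 2
    )


def hidden_by_rules(viewer, subject, metric_name):
    """True if `viewer` must not see this event about `subject`.

    Only the three rules are encoded; whether other kinds of events
    are visible is decided elsewhere.
    """
    return is_account_private(metric_name) and viewer != subject
```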

I'm not saying the following are necessarily correct, but they should provide a way to reason about this. Hope this helps!

image

image

I also want to propose letting the community flag projects as spam/malware. Self-regulation is a good thing, and we can borrow some rules from the Stack Overflow community. This would greatly help the administrators.

@eirnym Thanks for the suggestion -- we're tracking that feature request as #3896.

This is part of the milestone for our Open Tech Fund-supported security work. Right now, as I understand it, @woodruffw and @nlhkabu are working on the API key work, but after that, they'll be working on this issue.

User exarkun in IRC just pointed out that the public might want to know _who_ uploaded a particular release, and that info perhaps should be in the ownership history/audit log for a project.

Warehouse maintainers and @woodruffw discussed design and scope for this feature in a meeting today.

_Design:_ We chose a basic approach for the auditing API: a new API, called in tandem with metrics calls in places where both are needed/desired.
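
As a minimal sketch of what "called in tandem with metrics calls" could look like: the metric name below comes from the grep output earlier in this thread, but `FakeMetrics`, `AuditLog`, `LoginService`, `record`, and the event tag are hypothetical stand-ins, not the API that was actually designed.

```python
from datetime import datetime, timezone


class FakeMetrics:
    """Stand-in for the metrics service; only increment() is modelled."""

    def __init__(self):
        self.calls = []

    def increment(self, name, tags=None):
        self.calls.append((name, tags or []))


class AuditLog:
    """Hypothetical in-memory audit store; not Warehouse's actual API."""

    def __init__(self):
        self.events = []

    def record(self, *, tag, user_id, ip_address):
        self.events.append(
            {
                "tag": tag,
                "user_id": user_id,
                "ip_address": ip_address,
                "time": datetime.now(timezone.utc),
            }
        )


class LoginService:
    def __init__(self, metrics, audit):
        self._metrics = metrics
        self._audit = audit

    def login_succeeded(self, user_id, ip_address, tags=None):
        tags = tags if tags is not None else []
        # Aggregate counter, as in the existing metrics calls.
        self._metrics.increment("warehouse.authentication.ok", tags=tags)
        # Per-user audit record, written alongside the counter.
        self._audit.record(
            tag="account:login:success", user_id=user_id, ip_address=ip_address
        )
```

The counter stays anonymous and aggregate, while the audit record carries who did what and from where, which is why the two calls sit side by side rather than one replacing the other.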

_Storage:_ We have some research to do into how long we will retain user-related events (#3532), but we aim to retain project-related events permanently. We discussed a more flexible storage medium than Postgres (maybe a document store?) and started thinking about what kinds of data we'll need to store.
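
To make the retention split concrete, here is a small sketch of pruning under that policy. The retention period for user-related events is still undecided (#3532), so it is a required parameter here. The `kind` and `time` keys and the function name are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def prune_events(events, *, now, user_retention):
    """Drop user-related events older than `user_retention`.

    Project-related events are always kept, matching the aim of
    retaining them permanently. `user_retention` is deliberately
    required: the actual period has not been decided.
    """
    cutoff = now - user_retention
    return [
        event
        for event in events
        if event["kind"] == "project" or event["time"] >= cutoff
    ]
```

Calling it with, say, `user_retention=timedelta(days=90)` (an arbitrary placeholder value) keeps every project event and only the user events from the last 90 days.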

_Scope:_ To avoid running over our funds for this project, we're limiting the scope for what we'll accomplish right now. We'll focus on:

  • implementing the necessary API calls internally
  • implementing storage
  • implementing a handful of auditable events
  • creating a few views for those, and exposing what makes sense today (e.g.: recent logins, project change events, and 2FA events)

    • we may want to show most project-related events to all users, but hide some info about each event from users who aren't maintainers or owners, e.g. IP. @nlhkabu is taking note of this for designing templates.
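
The idea of showing an event to everyone while hiding some of its details could be sketched like this. The event is modelled as a plain dict, and `SENSITIVE_FIELDS`, `redact_event`, and the field names are hypothetical. IP is the only sensitive field named in the discussion.

```python
# Fields hidden from viewers who are not maintainers or owners.
# Only the IP address was mentioned in the meeting; extend as needed.
SENSITIVE_FIELDS = ("ip_address",)


def redact_event(event, viewer_is_maintainer_or_owner):
    """Return a copy of `event` that is safe to render for this viewer."""
    visible = dict(event)  # copy, so the stored event is never modified
    if not viewer_is_maintainer_or_owner:
        for field in SENSITIVE_FIELDS:
            visible.pop(field, None)
    return visible
```

Redacting at render time keeps the stored event complete, so what counts as sensitive can change later without touching stored data.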

Right now we want to ensure that we can store and retrieve events. We can add more auditable events down the line.

_Timing:_ This is the first thing we're working on in August, and we are aiming to get it merged and deployed in August.

Began work on this in #6339.
