Psalm: Taint Analysis with Psalm

Created on 21 Mar 2018  路  5Comments  路  Source: vimeo/psalm

https://en.wikipedia.org/wiki/Taint_checking

A lot of PHP security scanners will correctly identify $_GET['foo'] as user input that shouldn't be used directly without validation. However, they will fail to understand that ServerRequest::fromGlobals()->getQueryParams()['foo'] is just as bad.

Implementing taint checking isn't a trivial undertaking, but it would be a huge value add to projects like Psalm.

A very straightforward implementation will require adding custom annotations that mark data as tainted or safe, and/or tell Psalm "whatever it returns should be marked as tainted".

For example:

  • @psalm-tainted-var $foo marks a variable as tainted
  • @psalm-untainted-var $foo marks a variable as safe
  • @psalm-tainted-return tells Psalm this function/method returns untrusted data
  • @psalm-untainted-return tells Psalm this function/method returns trustworthy data

We'd also need to be able to mark function parameters (including for builtin functions) as risky for tainted data to land.

This may sound simple, but considering that "safe against SQLi" doesn't imply "safe against XSS" or "safe against directory traversal", it's going to require a bit of creativity to make it usable.

Most helpful comment

@paragonie-scott wait what's wrong with doing

$pdo->query('select * from users where id = ' . $_GET['id'])

Buy VIAGARA CHEAP

All 5 comments

@paragonie-scott wait what's wrong with doing

$pdo->query('select * from users where id = ' . $_GET['id'])

Buy VIAGARA CHEAP

Anyway, the issue I see is that Psalm can't verify that a method actually does the job of making a given input safe. And, more importantly, Psalm's analysis doesn't follow a particular input through all possible paths (because it would be inordinately slow).

So any analysis would involve two steps:

  • creating an algebraic representation of each input in every function, like Flow does, where its value at each step is represented as a product of its previous values e.g.
    $c_ref0->tainted = $b_ref0->tainted | $a_ref0->tainted
    $c_ref1->tainted = $c_ref1->tainted
  • join everything together and look for tainted paths into critical functions (e.g. PDO::query, echo)

That would be a fairly big undertaking

Yeah, I suspected as much. I might end up having to build this as its own tool (using PHP-Parser instead of the AST extension).

Facebook just published this: https://cacm.acm.org/magazines/2019/8/238344-scaling-static-analyses-at-facebook/fulltext

And I think it bears thinking about a bit more.

Additional wrinkle:

Given the code

class Utils {
  /**
   * @psalm-pure
   */
  public static function shorten(string $str) : string {
    if (strlen($str) > 30) {
      return substr($str, 0, 30) . "...";
    }
    return $str;
  }
}

class A {
  public function foo() : void {
    echo(Utils::shorten((string) $_GET["user_id"]));
  }

  public function bar() : void {
    echo(Utils::shorten("hello"));
  }
}

The path in A::foo should raise an issue, but the path in A::bar should not. Psalm should take the purity of Utils::shorten into consideration, insofar as it does not rely on state.

Edit: above code now recognised: https://psalm.dev/r/adceef6b3a

Was this page helpful?
0 / 5 - 0 ratings