Vyper: ABI decoding byte array offsets allows overflows

Created on 6 Sep 2019  ·  14Comments  ·  Source: vyperlang/vyper

Version Information

  • vyper Version (output of vyper --version): 0.1.0-beta.12

What's your issue about?

The ABI uses calldata offsets to point to the location of dynamically-sized data. Vyper does not check for overflow when evaluating these offsets. For example:

@public
@constant
def temp(arr: bytes[1024]) -> bytes[1024]:
    return arr

The above contract does not correctly decode calldata provided by this Solidity contract:

pragma solidity ^0.5.0;

contract Test {

    bytes constant DATA = bytes("Hello, world!");

    function test(address _vyContract) public view returns (string memory) {
        bytes memory payload 
            = abi.encodeWithSignature("temp(bytes)", DATA);

        assembly {
            let neg := sub(0, 4)
            mstore(add(0x24, payload), neg) // modify offset of bytes array
            let res := staticcall(gas, _vyContract, add(0x20, payload), mload(payload), 0, 0)
            returndatacopy(0, 0, returndatasize)
            return(0, returndatasize)
        }
    }
}

Specifically, this line in the LLL contains the overflow issue: [calldatacopy, 320, [add, 4, [calldataload, 4]], 1056]. This is located here in the code.

How can it be fixed?

Check offset calculations for overflow, or make sure offsets are pointing within the bounds of CALLDATASIZE

All 14 comments

Thanks for the report! I am having trouble why this is a vyper issue. What is the expected behavior when the caller provides malformed data?

The data isn't really malformed. It's valid, ABI-encoded data that points outside of the range of calldata. Vyper should detect this and revert.

I guess I still don't understand. Doesn't the spec say that the offset word points to tail(X(i))? https://solidity.readthedocs.io/en/v0.5.3/abi-spec.html#formal-specification-of-the-encoding. I always get tripped up reading this spec so I am happy to be corrected if my understanding is incorrect.

Sorry, I don't think my initial reply was particularly clear. The example I gave shows that Vyper accepts an overflow during ABI decoding which leads to it reading the first 32 bytes of calldata and treating it as an array size. In every variation I tried, this led to the call failing because CALLDATACOPY inevitably reads from a massive index in calldata. While I'm not convinced this is exploitable behavior, it seems much safer to check for the overflow and explicitly revert the transaction, rather than allow the behavior and fail because of the gas limit.

My understanding of the spec is that the offset word must point to where the data is. If it doesn't, it is ipso facto invalid input. Unless I am misunderstanding the spec.

That is certainly true, but my point is that (because of the overflow) Vyper is not reading from where the offset points. If there were data there (nevermind the cost of accessing it), Vyper would not read it correctly.

Yes but the same would hold true if the offset pointed to any other location. I suppose that Vyper _could_ validate that all the offsets are valid (i.e., start at dynamic_section_start, and check offset_n is equal to offset_n-1 + len_n-1 for all offsets), but this seems like unnecessary gas to spend on something where there is no attack vector (and the entire point of ABIv2 is to have O(1) access to data instead of O(n) access). _If_ we can construct an attack vector (and I am beginning to suspect that there are indeed vectors here) then we should definitely consider doing this validation, otherwise it seems the worst case is an inconvenience to callers who provide invalid input getting tx failure instead of revert.

What attack vectors are you beginning to suspect? I'm very curious!

Regardless of the existence of a tangible attack vector - this is the type of edge-case in behavior that can exacerbate other issues. Cleaning it up allows you to make much stronger statements about the behavior of the abi decoder, which is why I recommend it!

What attack vectors are you beginning to suspect? I'm very curious!

I don't have it fully fleshed out but an attacker might be able to take advantage of the offset pointing somewhere unexpected. So for example, let's say you have a contract which checks some bytearrays against some value. For instance,

def check_withdraw(sig1: bytes[100], sig2: bytes[100]) : 
  if sig1 == sig2 : 
    send(msg.sender, balance)

Using the technique in this issue, an attacker could set the offset of sig2 to be the offset of sig1, and then the check always evaluates to true.

Yes, that's actually allowed in Solidity - and you don't need to do any overflow weirdness. Just set them to the same offset. By the ABI specification, those are the same array :)

Well then maybe there are some solidity contracts which are susceptible to this attack. I don't think that calldata which has the offset of sig2 set to the offset of sig1 is actually valid ABI-encoded data. Whether contracts check for this is a different matter. From https://solidity.readthedocs.io/en/v0.5.3/abi-spec.html#design-criteria-for-the-encoding:

The data of a variable or array element is not interleaved with other data and it is relocatable, i.e. it only uses relative “addresses”.

After thinking about it some more, I don't think this would result in a serious attack vector (any more than a caller switching two variables).

@charles-cooper what's the outcome of this issue?

@fubuloubu not sure, let's bring it up at another meeting

Was this page helpful?
0 / 5 - 0 ratings

Related issues

ben-kaufman picture ben-kaufman  ·  4Comments

robinsierra picture robinsierra  ·  3Comments

fubuloubu picture fubuloubu  ·  3Comments

travs picture travs  ·  3Comments

pipermerriam picture pipermerriam  ·  3Comments