Vyper: ABI decoding byte array offsets allows overflows

Created on 6 Sep 2019 · 14Comments · Source: vyperlang/vyper

Version Information

vyper Version (output of vyper --version): 0.1.0-beta.12

What's your issue about?

The ABI uses calldata offsets to point to the location of dynamically-sized data. Vyper does not check for overflow when evaluating these offsets. For example:

@public
@constant
def temp(arr: bytes[1024]) -> bytes[1024]:
    return arr

The above contract does not correctly decode calldata provided by this Solidity contract:

pragma solidity ^0.5.0;

contract Test {

    bytes constant DATA = bytes("Hello, world!");

    function test(address _vyContract) public view returns (string memory) {
        bytes memory payload 
            = abi.encodeWithSignature("temp(bytes)", DATA);

        assembly {
            let neg := sub(0, 4)
            mstore(add(0x24, payload), neg) // modify offset of bytes array
            let res := staticcall(gas, _vyContract, add(0x20, payload), mload(payload), 0, 0)
            returndatacopy(0, 0, returndatasize)
            return(0, returndatasize)
        }
    }
}

Specifically, this line in the LLL contains the overflow issue: [calldatacopy, 320, [add, 4, [calldataload, 4]], 1056]. This is located here in the code.

How can it be fixed?

Check offset calculations for overflow, or make sure offsets are pointing within the bounds of CALLDATASIZE

Source

wadeAlexC

All 14 comments

Thanks for the report! I am having trouble why this is a vyper issue. What is the expected behavior when the caller provides malformed data?

charles-cooper on 6 Sep 2019

The data isn't really malformed. It's valid, ABI-encoded data that points outside of the range of calldata. Vyper should detect this and revert.

wadeAlexC on 6 Sep 2019

I guess I still don't understand. Doesn't the spec say that the offset word points to tail(X(i))? https://solidity.readthedocs.io/en/v0.5.3/abi-spec.html#formal-specification-of-the-encoding. I always get tripped up reading this spec so I am happy to be corrected if my understanding is incorrect.

charles-cooper on 6 Sep 2019

Sorry, I don't think my initial reply was particularly clear. The example I gave shows that Vyper accepts an overflow during ABI decoding which leads to it reading the first 32 bytes of calldata and treating it as an array size. In every variation I tried, this led to the call failing because CALLDATACOPY inevitably reads from a massive index in calldata. While I'm not convinced this is exploitable behavior, it seems much safer to check for the overflow and explicitly revert the transaction, rather than allow the behavior and fail because of the gas limit.

wadeAlexC on 8 Sep 2019

👍1

My understanding of the spec is that the offset word must point to where the data is. If it doesn't, it is ipso facto invalid input. Unless I am misunderstanding the spec.

charles-cooper on 8 Sep 2019

That is certainly true, but my point is that (because of the overflow) Vyper is not reading from where the offset points. If there were data there (nevermind the cost of accessing it), Vyper would not read it correctly.

wadeAlexC on 8 Sep 2019

Yes but the same would hold true if the offset pointed to any other location. I suppose that Vyper _could_ validate that all the offsets are valid (i.e., start at dynamic_section_start, and check offset_n is equal to offset_n-1 + len_n-1 for all offsets), but this seems like unnecessary gas to spend on something where there is no attack vector (and the entire point of ABIv2 is to have O(1) access to data instead of O(n) access). _If_ we can construct an attack vector (and I am beginning to suspect that there are indeed vectors here) then we should definitely consider doing this validation, otherwise it seems the worst case is an inconvenience to callers who provide invalid input getting tx failure instead of revert.

charles-cooper on 8 Sep 2019

👍1

What attack vectors are you beginning to suspect? I'm very curious!

Regardless of the existence of a tangible attack vector - this is the type of edge-case in behavior that can exacerbate other issues. Cleaning it up allows you to make much stronger statements about the behavior of the abi decoder, which is why I recommend it!

wadeAlexC on 8 Sep 2019

What attack vectors are you beginning to suspect? I'm very curious!

I don't have it fully fleshed out but an attacker might be able to take advantage of the offset pointing somewhere unexpected. So for example, let's say you have a contract which checks some bytearrays against some value. For instance,

def check_withdraw(sig1: bytes[100], sig2: bytes[100]) : 
  if sig1 == sig2 : 
    send(msg.sender, balance)

Using the technique in this issue, an attacker could set the offset of sig2 to be the offset of sig1, and then the check always evaluates to true.

charles-cooper on 8 Sep 2019

Yes, that's actually allowed in Solidity - and you don't need to do any overflow weirdness. Just set them to the same offset. By the ABI specification, those are the same array :)

wadeAlexC on 8 Sep 2019

Well then maybe there are some solidity contracts which are susceptible to this attack. I don't think that calldata which has the offset of sig2 set to the offset of sig1 is actually valid ABI-encoded data. Whether contracts check for this is a different matter. From https://solidity.readthedocs.io/en/v0.5.3/abi-spec.html#design-criteria-for-the-encoding: