vyper --version): 0.1.0b12+commit.8663ac5Off-by-one error in zero_pad():
Logically, the loop termination condition should use ge instead of gt.
It can be fixed by replace gt by ge in the above, but I'm not sure if such a fix does not affect others.
@daejunpark could you please take a look at https://github.com/ethereum/vyper/pull/1605/commits/5a845710f999e07cea4ea66922b7dbd7adee4586 and let me know if you think it is a reasonable solution?
@charles-cooper thanks for this. It is a really clever trick, and I like it (from my hacker instinct)!
But I have a concern about this kind of off-label uses of opcodes for the long term in general. For example, if people discuss the behavior change of calldatacopy in a future hard fork, they may be not aware of this off-label use, and fail to consider/analyze the effect of their change to this.
I have a question. Is the zero padding size always less than 32? (I cannot think of a counterexample.)
If so, you may want to consider to use the ID precompiled for the zero padding. That is,
CALL <gas> 4 0 <src> <len> <dest> <len>
where <src> points to the zero bytes that can be preallocated in the first region of the memory.
Although this will consume more gas than the calldatacopy approach, but it is an expected use case, so would be safer for the long term.
If so, you may want to consider to use the ID precompiled for the zero padding. That is,
CALL <gas> 4 0 <src> <len> <dest> <len>
Yes, I've been thinking about this approach for awhile. In this case the zero-pad is always <32 bytes, but having this technique for zeroing arbitrary amounts of memory would be handy. So we could always guarantee that the source is a zeroed block of memory simply by reading from past-the-end of where we have allocated, or some number so large it would be impractical to write to (e.g. 2**32). But I think reasoning about where the past-the-end pointer should be for CALLDATACOPY is easier - it is always CALLDATASIZE!
But I have a concern about this kind of off-label uses of opcodes for the long term in general. For example, if people discuss the behavior change of calldatacopy in a future hard fork, they may be not aware of this off-label use, and fail to consider/analyze the effect of their change to this.
I was also concerned about this. However, I feel like the existing semantics were carefully considered and were written that way for a reason. Note the difference in semantics between CALLDATACOPY and CODECOPY - CODECOPY issues a STOP if you try to read from past-the-end. So even if it may not be the "intended" use-case for CALLDATACOPY, it is not exactly "off-label" either! A change would be quite a significant - almost pathological - change to the EVM and would definitely break the semantics of existing programs. But in any case, I think what we can do is create a new pseudo-opcode mzero <dst> <len> which zeroes out the destination block (analogously to stdlib's bzero) and then choose to implement it however we want. Then even if CALLDATACOPY is changed for whatever reason, the overhead of swapping out the implementation of mzero is fairly low.
@charles-cooper I agree with all of your points, but please note that the overhead of updating the mzero implementation may not be low from the contract owners' perspective, because they will need to redeploy the updated bytecode if that happens, and the redeployment may not be easy in certain situations.
@CarlBeek what do you think about this in terms of the deposit contract operation?
FYI, this is also related: https://github.com/ethereum/vyper/issues/1610
@charles-cooper聽I agree with all of your points, but please note that the overhead of updating the聽mzero聽implementation may not be low from the contract owners' perspective, because they will need to redeploy the updated bytecode if that happens, and the redeployment may not be easy in certain situations.
Thanks for the note. My assumption was that, if the opcode semantics are ever changed, it would be as an EVM version update so that only contracts deployed after fork_blocknum would have the new semantics, while existing contracts would retain the old semantics.
I want to do the quick fix for the gt issue though since it seems more pressing. As it stands, the loop can write a \0 byte to one past the end of the bytearray.
Thanks for the quick fix! It will help to facilitate the deposit contract verification process.
BTW, my understanding is that even a contract deployed before certain fork will be ran using the new semantics of the folk. (Or, I may misunderstood your explanation.)
@daejunpark there are several proposals to add versioning to the EVM https://github.com/ethereum-cat-herders/PM/issues/53. I don't believe (and this is just my educated guess) such a significant semantic change to CALLDATALOAD would make it in without some kind of versioning scheme.
Initially it seemed to me that this was too off-label to be a part of compiler behaviour, but after taking a look at the yellow paper, Geth, and py-evm implementations, I agree with @charles-cooper that this would be something that would need to be specified via EVM versioning.
On the topic of longevity and redeployment, the deposit contract is intended to live until at least the death of Eth1 somewhere way in the future and thus it is vital that this be resolved in a robust manner. As mentioned above, however, I don't think that this will present a problem.
Meeting: Merge #1611, then #1605 later
Thought I'd chime in and just say that, in my mind, the likelihood that CALLDATALOAD will stop yielding zeros for ranges outside of CALLDATASIZE any time soon is basically zero. I'd feel pretty comfortable adding compiler features that depend on that behavior.
Note from meeting: relying on CALLDATALOAD assumption is okay as long as it gets documented well.
Just to make sure, I do not have an objection to the calldataload approach, if it is very unlikely that the current semantics of calldataload will change in the future.