Etcd: rafthttp decoder: the buffer length should be checked before allocation

Created on 18 May 2016  路  8Comments  路  Source: etcd-io/etcd

https://github.com/coreos/etcd/blob/bd71a608753c96e92398327b808ff51191e0c32d/rafthttp/msg_codec.go#L50

func (dec *messageDecoder) decode() (raftpb.Message, error) {
    var m raftpb.Message
    var l uint64
    if err := binary.Read(dec.r, binary.BigEndian, &l); err != nil {
        return m, err
    }
    buf := make([]byte, int(l))
    if _, err := io.ReadFull(dec.r, buf); err != nil {
        return m, err
    }
    return m, m.Unmarshal(buf)
}

The value of l can be bogus due to an attack or packet corruption.
So I suggest checking the value of l before make([]byte, int(l)).

areperformance

Most helpful comment

@xiang90 Does etcd server also include the whole marshaled v2 store in the raft message that it send here:
https://github.com/coreos/etcd/blob/master/etcdserver/snapshot_merge.go#L52?
If thats the case, then is the assumption that raft http's decoder always deals with messages of size 1MB correct? We are seeing this error on our etcd cluster where the leader is unable to send raft snapshots to one of the followers:

Leader host:
rafthttp: database snapshot [index: <INDEX>, to:<INDEX>] failed to be sent out (unexpected http status Bad Request while posting to "https://<HOST>:<PORT>/raft/snapshot")
Follower host:
rafthttp: failed to decode raft message (rafthttp: error limit exceeded)

All 8 comments

@AkihiroSuda We do not plan to avoid attack inside etcd itself. But it is a nice thing to have to avoid potential corruption to panic etcd server. Can you send a patch to fix this?

What should be maximum valid message size?

@nekto0n Now each raft entry is limited to 1MB, however each message might contain more than a few entries when under high load. I feel we can start with an aggressive value like 512MB, and shrink it down eventually.

@nekto0n Do you want to help on this one?

@xiang90 Does etcd server also include the whole marshaled v2 store in the raft message that it send here:
https://github.com/coreos/etcd/blob/master/etcdserver/snapshot_merge.go#L52?
If thats the case, then is the assumption that raft http's decoder always deals with messages of size 1MB correct? We are seeing this error on our etcd cluster where the leader is unable to send raft snapshots to one of the followers:

Leader host:
rafthttp: database snapshot [index: <INDEX>, to:<INDEX>] failed to be sent out (unexpected http status Bad Request while posting to "https://<HOST>:<PORT>/raft/snapshot")
Follower host:
rafthttp: failed to decode raft message (rafthttp: error limit exceeded)

@gyuho @xiang90: we see rafthttp: error limit exceeded when we bring an existing 2.3 cluster up with 3.1. I think it's safe to say that the raft message is valid. etcd is holding on the order of 10GB in memory on 2.3.

  • is it possible to configure etcd to send a smaller raft message?
  • do you think it is reasonable to increase or remove this limit?
  • are there any other workarounds we can use to bring up the cluster on 3.1?

See also: @yvdinesh's comment above

@yvdinesh @jonsyu1 Sorry for late response. We missed these comments somehow. We've just have released v3.1.9 and 3.2.0 with the fix https://github.com/coreos/etcd/pull/8074. Please create another issue if you still have problems.

OK thanks @gyuho! We wound up working around it by ensuring that all nodes come up with 3.1 capabilities in a short window.

Was this page helpful?
0 / 5 - 0 ratings