Elasticsearch version:
2.4 / HEAD
Report
It looks like ES tries to perform several sanity checks before installing a seccomp filter. One of those involves invoking a random syscall:
`linux_syscall(999)`
There is no guarantee that "999" will stay unimplemented in future kernels, or that it won't acquire a meaning on some specific architecture at some point. Moreover, this doesn't play well if there are other supervisors in the environment, which may block dangerous/unknown syscalls. Finally, I don't think it is expected behavior for a search engine to directly invoke an arbitrary syscall on the assumption that it is not implemented.
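To make the possible outcomes concrete, here is a minimal Python sketch of the kind of probe described above. It assumes Linux, glibc's `syscall(2)` wrapper reached through `ctypes`, and the number 999 from the report. `classify_probe` and `probe_syscall` are hypothetical names for illustration, not Elasticsearch code. The sketch keeps the raw call separate from its interpretation, because the disagreement here is precisely about which results the interpretation can trust.

```python
import ctypes
import errno

# Number taken from the report; nothing in the kernel ABI reserves it as "unimplemented".
PROBE_NR = 999


def classify_probe(ret: int, err: int) -> str:
    """Interpret the result of calling a syscall number assumed to be unimplemented.

    Hypothetical helper for illustration only.
    """
    if ret >= 0:
        # The number now means something on this kernel/architecture.
        return "implemented"
    if err == errno.ENOSYS:
        # The outcome the probe is hoping for.
        return "enosys"
    if err in (errno.EPERM, errno.EACCES):
        # Typical of a supervisor (container profile, snap confinement, ...) denying the call.
        return "denied"
    # A seccomp profile may return any errno it likes for blocked calls.
    return "other"


def probe_syscall(nr: int = PROBE_NR) -> str:
    """Invoke syscall `nr` directly through libc and classify the result.

    Caution: if a supervisor's filter uses a trap or kill action, this call
    raises SIGSYS or kills the process instead of returning.
    """
    libc = ctypes.CDLL(None, use_errno=True)
    ret = libc.syscall(ctypes.c_long(nr))
    return classify_probe(ret, ctypes.get_errno())
```

On an unconfined x86-64 host one would expect `"enosys"`; inside a container or a confined package the same call may come back as `"denied"` or `"other"`, which is the situation this report is about.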
Please just don't do this, or find a nicer alternative way of probing, or at least provide a knob to disable it.
/cc @rmuir who introduced it in 6a8c4a0bb752dfac460d779270bc9045da3fde36
> Moreover, this doesn't play well if there are other supervisors in the environment, which may block dangerous/unknown syscalls.

Do you have a specific problem in mind that you can elaborate on?

> Do you have a specific problem in mind that you can elaborate on?

- This falls outside of Docker's default seccomp filter and causes trouble: https://github.com/docker-library/elasticsearch/issues/98
- There is no way for systemd's SystemCallFilter to build a sane filter for this.
- Syscall numbers are not portable across architectures, and to build a filter for this you need to hardcode the syscall number, which goes against libseccomp recommendations.

> This falls outside of Docker's default seccomp filter and causes trouble: https://github.com/docker-library/elasticsearch/issues/98

This has been addressed: #19754

> There is no way for systemd's SystemCallFilter to build a sane filter for this.

What filters are you trying to apply?

> Syscall numbers are not portable across architectures, and to build a filter for this you need to hardcode the syscall number, which goes against libseccomp recommendations.

We only support seccomp on i386 and amd64; we do not attempt to install seccomp on any other architecture.
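A small illustration of why raw numbers are not portable: the same syscall name maps to different numbers on different architectures, so filters are normally written against names and resolved per architecture (libseccomp's `seccomp_syscall_resolve_name()` family does this). A number like 999 has no name to resolve. The table below is a hand-picked excerpt for illustration, not a complete or authoritative syscall table, and `SYSCALL_TABLE`, `resolve_nr`, `resolve_name` and `build_allowlist` are hypothetical names.

```python
from typing import Dict, Iterable, Optional, Set

# Hand-copied excerpt of Linux syscall numbers per architecture.
# Illustrative only; real filters should rely on a generated table or libseccomp.
SYSCALL_TABLE: Dict[str, Dict[str, int]] = {
    "x86_64": {"read": 0, "write": 1, "getpid": 39},
    "i386": {"read": 3, "write": 4, "getpid": 20},
    "aarch64": {"read": 63, "write": 64, "getpid": 172},
}


def resolve_nr(name: str, arch: str) -> Optional[int]:
    """Portable name -> architecture-specific number (None if unknown)."""
    return SYSCALL_TABLE.get(arch, {}).get(name)


def resolve_name(nr: int, arch: str) -> Optional[str]:
    """Architecture-specific number -> name, if the number denotes a known syscall."""
    for name, number in SYSCALL_TABLE.get(arch, {}).items():
        if number == nr:
            return name
    return None


def build_allowlist(names: Iterable[str], arch: str) -> Set[int]:
    """Translate a name-based allow-list into the numbers a filter on `arch` must match."""
    numbers = (resolve_nr(name, arch) for name in names)
    return {nr for nr in numbers if nr is not None}
```

With this excerpt, `build_allowlist(["read", "write"], "x86_64")` gives `{0, 1}` while the same names on aarch64 give `{63, 64}`. The number 999 resolves to no name at all, so a name-based policy has nothing to allow, and a numeric one has to hardcode it.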
No feedback, closing.
I was hoping that "please do not invoke random/unimplemented syscalls" didn't need any further feedback and explanations, but it looks like I was wrong.
Also, I'm not too happy that you started dragging this further down to "please bring more cases and feedback", given the multiple rationales I provided beforehand.
Anyway, since you explicitly asked for more details on why this ES behavior is a bad idea, here they are.
> This has been addressed: #19754

This is merely a workaround for a very specific case (which is why I didn't want to discuss overly specific cases: we could keep piling up workarounds forever). Docker seccomp profiles (and, in fact, any other seccomp-enabled supervisor) can return whatever errno they want for blocked syscalls. By default, invoking a blocked syscall will trigger a SIGSYS signal, but that's just _one of the possible behaviors_ (so please, don't work around this corner case too). If you need another use case, see https://github.com/coreos/rkt/issues/3121 (but again, I didn't bring this up earlier precisely to avoid biased workarounds).
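The point that SIGSYS is only one possible outcome can be made concrete with a Python sketch that runs the probe in a child process and classifies what comes back. This is an illustration under stated assumptions (Linux, glibc's `syscall` reached via `ctypes`, and hypothetical helpers `probe_in_child` and `interpret_child`), not how Elasticsearch performs its check. It shows that a caller may observe ENOSYS, an arbitrary errno chosen by the supervisor, or a process killed by SIGSYS.

```python
import errno
import signal
import subprocess
import sys

# Child program: invoke the syscall number given in argv[1] and print "<ret> <errno>".
_CHILD_SRC = (
    "import ctypes, sys\n"
    "libc = ctypes.CDLL(None, use_errno=True)\n"
    "ret = libc.syscall(ctypes.c_long(int(sys.argv[1])))\n"
    "print(ret, ctypes.get_errno())\n"
)


def interpret_child(returncode: int, stdout: str) -> str:
    """Classify how the probing child ended (hypothetical helper)."""
    if returncode == -signal.SIGSYS:
        # Typically a filter using a trap or kill action: the syscall never returned.
        return "sigsys"
    if returncode != 0:
        return "crashed"
    ret, err = (int(field) for field in stdout.split())
    if ret >= 0:
        return "implemented"
    if err == errno.ENOSYS:
        return "enosys"
    # Any other errno, e.g. EPERM from a container or snap profile.
    return "errno:" + errno.errorcode.get(err, str(err))


def probe_in_child(nr: int) -> str:
    """Run the probe in a separate interpreter so a fatal SIGSYS cannot kill the caller."""
    proc = subprocess.run(
        [sys.executable, "-c", _CHILD_SRC, str(nr)],
        capture_output=True,
        text=True,
    )
    return interpret_child(proc.returncode, proc.stdout)
```

`subprocess` reports a child terminated by signal N as return code `-N`, which is how the SIGSYS case is told apart from an ordinary error return.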
> What filters are you trying to apply?

Any whitelisting filter. Given that ES is invoking a syscall that is not part of the kernel ABI contract, there is no portable label to identify and whitelist it.
> I was hoping that "please do not invoke random/unimplemented syscalls" didn't need any further feedback and explanations, but it looks like I was wrong.

Of course it requires explanation, and incredulity is not warranted here. We have valid reasons for doing what we do, you have valid reasons for suggesting that we think about it differently, and the only way to find common ground is to have a dialog.
> Also, I'm not too happy that you started dragging this further down to "please bring more cases and feedback", given the multiple rationales I provided beforehand.

We are not going to get far at all if you can't see my questions as seeking to further understand what you're doing and thinking so we can reconsider what we do. The pejoratives are misguided.
For the record, since you opened this issue I've been leaning very heavily towards _completely_ removing this paranoid check, but I prefer to proceed carefully, hence the questions.
Thank you for coming back and addressing my questions, I'll reopen this issue and you can expect movement on it soon.
This issue prevents Elasticsearch from being packaged as a confined snap (snapcraft.io), for reasons similar to Docker's: the confinement policy makes the call fail with permission errors. I imagine this is correct, as this sort of probing is exactly what confinement should block, rather than leaking details about the kernel that are supposed to be walled off.