Sometimes a shared filesystem is too much work to set up. The most convenient way to transfer files securely between remote nodes is usually SCP.
A snapshot repository type that used SSH to transfer files to a remote host would be a great addition to Elasticsearch, and would make snapshots more useful out of the box.
Configuration could be something like this:
{
  "type": "ssh",
  "settings": {
    "location": "[email protected]:/backups/elasticsearch",
    "ssh_key": "/home/elasticsearch/.ssh/id_rsa"
  }
}
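Registering it would presumably look like any other repository registration (purely illustrative, since the "ssh" type does not exist yet; user@backuphost is a placeholder):

curl -XPUT 'http://localhost:9200/_snapshot/backup_ssh' -d '{
  "type": "ssh",
  "settings": {
    "location": "user@backuphost:/backups/elasticsearch",
    "ssh_key": "/home/elasticsearch/.ssh/id_rsa"
  }
}'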
What do you think @imotov?
I'm glad someone else needs this feature as well! :+1:
I created an SSH repository type for Elasticsearch 1.4:
https://github.com/marevol/elasticsearch/commit/bf0348c732d3d35e2634c8db3f42c2112d76e4da
If there is a chance to merge it, I'll send a PR.
The configuration to create an SSH repository is below.
curl -XPUT 'http://localhost:9200/_snapshot/backup_ssh' -d '{
  "type": "ssh",
  "settings": {
    "location": "/somewhere/snapshot_dir",
    "host": "123.123.123.123",
    "port": 22,
    "username": "taro",
    "password": "xxxxxxxx",
    "compress": true
  }
}'
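Once the repository is registered, taking a snapshot into it would be the standard snapshot API call (snapshot name is arbitrary):

curl -XPUT 'http://localhost:9200/_snapshot/backup_ssh/snapshot_1?wait_for_completion=true'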
I like the idea. Just wondering if this should come as a built-in feature or as a plugin.
@imotov WDYT?
Isn't it easy enough to mount a remote filesystem using sshfs?
@jpountz It could be enough, but I think it requires more work (an administrative task), as you need to mount this on every single node. With SSH, you just have to use it! :)
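For comparison, the sshfs route would look roughly like this (host and mount point are placeholders): one mount on every single node, then a single ordinary shared filesystem repository registration:

# on every node: mount the shared backup directory over SSH
sudo sshfs backupuser@backuphost:/backups/elasticsearch /mnt/es_backups -o allow_other,reconnect

# then register it once as a normal "fs" repository
curl -XPUT 'http://localhost:9200/_snapshot/backup_fs' -d '{
  "type": "fs",
  "settings": {
    "location": "/mnt/es_backups",
    "compress": true
  }
}'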
But then that work is done in the right place, by the user.
Otherwise, Elasticsearch has to interact with SSH directly, like in the example above.
This is too scary IMO.
I agree we should not deal with credentials here. I think we should rather make it super simple to configure this, and/or make it simpler to build your own plugin if you really want that as a built-in option.
@jpountz sshfs requires fuse support and is unreliable.
A more general solution is a command executor that spawns one or more processes with arguments similar to scp. The command returns success if the files are successfully handled.
{
  "type": "process",
  "settings": {
    "command": "/usr/local/bin/scp_to_remote"
  }
}
There are a lot of details to work out (safely spawning processes, timeouts, retries, argument formatting) but if it worked it could be useful for integrating with existing backup solutions.
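As a very rough sketch of the kind of hook such a repository might invoke (the "process" type, the script name, its argument convention, and the remote target are all just placeholders):

#!/bin/sh
# /usr/local/bin/scp_to_remote -- hypothetical hook for the proposed "process"
# repository type. Assumed calling convention: scp_to_remote <local_file> <remote_relative_path>
set -e
LOCAL_FILE="$1"
REMOTE_PATH="$2"
# copy the snapshot file to the backup host; a non-zero exit signals failure
# so the repository could retry or abort the snapshot
scp -q "$LOCAL_FILE" "backupuser@backuphost:/backups/elasticsearch/$REMOTE_PATH"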
I am not a huge fan of external processes. This is so painful in Java that I don't think this will be an option here, to be honest.
Thank you for your comment/feedback.
I think it's better to avoid any security concerns in Elasticsearch.
I'll modify it and provide it as one of the plugins in https://github.com/codelibs
Ok. So we can close this thread. When you're done, feel free to update the plugins page. Thanks!