Sidekiq: Encrypted job arguments

Created on 14 Jul 2016 · 22 Comments · Source: mperham/sidekiq

For companies that deal with sensitive financial or health data, job arguments may contain information that must be encrypted. Add seamless argument encryption as a new feature to Sidekiq Enterprise.

Proposed API

    # in the initializer
    require "base64"

    Sidekiq::Enterprise::Crypto.enable(active_version: 1) do |version|
      # return the symmetric key for the given version, e.g.
      Base64.decode64(ENV["SECRETKEY#{version}"])
    end

    class SecretWorker
      include Sidekiq::Worker
      sidekiq_options encrypt: true
    end

Details

  1. Exception messages and backtraces will not be encrypted. If a job raises an error, the message may contain part of the data, but detecting that is beyond Sidekiq's scope.
  2. The arguments will be encrypted in the Web UI too, making it difficult to debug.
  3. The key can be changed by bumping the active_version and having the block return the key for that new version. Once old v1 jobs have drained, you can remove that key.
  4. Right now I'm using the aes-256-cbc cipher + Base64 encoding, which adds roughly 100 bytes plus 30% of overhead to each job. Arguments that take 1000 bytes unencrypted can be expected to take about 1400 bytes encrypted.
  5. There is an extension point where you can add your own custom crypto scheme using encrypt: "something" as the Worker option but it's not documented for now.
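To make the arithmetic in item 4 concrete, here is a rough sizing sketch. It assumes the IV is prepended to the ciphertext and the result is strict-Base64 encoded; that layout is an assumption, not Sidekiq Enterprise's actual format. It only measures size: plain CBC like this does not authenticate the ciphertext.

```ruby
require "openssl"
require "base64"

# Rough sizing sketch: AES-256-CBC with a random IV prepended, then strict
# Base64. Any version header or other metadata a real format adds is not
# modelled here, and nothing is authenticated.
def cbc_base64_size(plaintext, key)
  cipher = OpenSSL::Cipher.new("aes-256-cbc").encrypt
  cipher.key = key
  iv = cipher.random_iv
  # PKCS#7 padding rounds the ciphertext up to the next 16-byte multiple.
  ciphertext = cipher.update(plaintext) + cipher.final
  Base64.strict_encode64(iv + ciphertext).bytesize
end

key = OpenSSL::Random.random_bytes(32)
puts cbc_base64_size("x" * 1000, key)  # prints 1368
```

Worked through: 1000 bytes pad to 1008, the IV brings that to 1024, and Base64 turns 1024 bytes into 4 * ceil(1024 / 3) = 1368 characters, close to the roughly 1400 bytes estimated in item 4.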

All 22 comments

Mike

We were building this as an extension. If this is a part of enterprise that is even better.
My suggestion is to let users pass their own encrypt/decrypt methods instead of only supplying secret/public keys. I would also add a way to version the encryption: jobs encrypted with an old version would be moved to the morgue.
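The two suggestions above (pluggable encrypt/decrypt callables, plus a version tag so jobs from a retired version can be rejected) could be combined roughly as follows. VersionedCodec, UnknownVersion and the "v1:" prefix format are hypothetical, and the codecs are assumed to return text (e.g. Base64) so the payload stays JSON-safe.

```ruby
# Hypothetical sketch: user-supplied codecs keyed by version, with the
# producing version stored as a "vN:" prefix on every payload.
class VersionedCodec
  class UnknownVersion < StandardError; end

  # codecs: { version => { encrypt: callable, decrypt: callable } }
  def initialize(active_version:, codecs:)
    @active_version = active_version
    @codecs = codecs
    raise ArgumentError, "no codec for active version" unless @codecs.key?(active_version)
  end

  # Encrypt with the active version and tag the result with that version.
  def dump(plaintext)
    ciphertext = @codecs.fetch(@active_version)[:encrypt].call(plaintext)
    "v#{@active_version}:#{ciphertext}"
  end

  # Decrypt with whichever version produced the payload. A retired version
  # raises, which a job runner could treat as "move this job to the morgue".
  def load(payload)
    tag, ciphertext = payload.split(":", 2)
    version = Integer(tag.delete_prefix("v"))
    codec = @codecs[version] or raise UnknownVersion, "no codec for v#{version}"
    codec[:decrypt].call(ciphertext)
  end
end

# Stand-in codec for illustration only: reversing a string is NOT encryption.
toy = { encrypt: ->(s) { s.reverse }, decrypt: ->(s) { s.reverse } }
v1 = VersionedCodec.new(active_version: 1, codecs: { 1 => toy })
payload = v1.dump("hello")  # => "v1:olleh"
v2 = VersionedCodec.new(active_version: 2, codecs: { 2 => toy })
# v2.load(payload) raises VersionedCodec::UnknownVersion once v1 is retired
```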

I don't see an issue with raised exceptions not being encrypted unless they are shown in the Web UI. In that case, all parameters should be encrypted (Retries, Failures, and Scheduled tabs).

Do you need any more input from us?

Thanks

While the user should be able to customize the crypto, I still need to supply a decent default impl so my questions still stand. I don't want to provide ROT13.

Rot13 was deprecated in favor of rot26.

I would use Blowfish as the default implementation (not reliable, but easy to use).

Something along the lines of:

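    require "openssl"
    cipher = OpenSSL::Cipher.new('bf-cbc').encrypt  # on OpenSSL 3.x, Blowfish may need the legacy provider
    cipher.key_len = 32                              # bf-cbc defaults to 16-byte keys; SHA-256 gives 32
    cipher.key = OpenSSL::Digest.digest("SHA256", key)
    iv = cipher.random_iv                            # store the IV alongside the ciphertext
    encrypted = cipher.update(data_to_be_encrypted) << cipher.final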

@danielnc that isn't an authenticated encryption mode, meaning an attacker could potentially tamper with the ciphertext. Blowfish has also generally fallen out of favor: it lacks the hardware acceleration available for AES and has a small block size (64 bits, vs. 128 bits for AES).

I would suggest using RbNaCl with crypto_secretbox (XSalsa20+Poly1305) and the SimpleBox wrapper for generating nonces, but I might be biased because I wrote it:

https://github.com/cryptosphere/rbnacl

If you're not using that, consider using AES-GCM with the Ruby OpenSSL extension.

Regarding key rotation, you can look at something like this, although it uses nonstandard message formats and I'm not exactly maintaining it:

https://github.com/cryptosphere/cryptor

@tarcieri I'd like my default impl to use stdlib only, as much as I respect rbnacl and the NaCl projects. Do you know of an example of how to use AES-GCM with OpenSSL? This stuff is so easy to get wrong.
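For reference, a minimal AES-256-GCM round trip using only Ruby's stdlib OpenSSL binding might look like the following. The iv | ciphertext | tag byte layout and the method names are assumptions for illustration; the key must be 32 random bytes, and a nonce must never be reused with the same key.

```ruby
require "openssl"

GCM_IV_BYTES = 12   # GCM's standard nonce length
GCM_TAG_BYTES = 16  # full-length authentication tag

# Returns iv | ciphertext | tag as one binary string.
def gcm_encrypt(key, plaintext)
  cipher = OpenSSL::Cipher.new("aes-256-gcm").encrypt
  cipher.key = key
  iv = cipher.random_iv            # fresh random 12-byte nonce per message
  cipher.auth_data = ""            # no associated data in this sketch
  ciphertext = cipher.update(plaintext) + cipher.final
  iv + ciphertext + cipher.auth_tag
end

def gcm_decrypt(key, blob)
  raise ArgumentError, "blob too short" if blob.bytesize < GCM_IV_BYTES + GCM_TAG_BYTES
  decipher = OpenSSL::Cipher.new("aes-256-gcm").decrypt
  decipher.key = key
  decipher.iv = blob.byteslice(0, GCM_IV_BYTES)
  # Always a full 16-byte tag, so a truncated tag is never accepted.
  decipher.auth_tag = blob.byteslice(-GCM_TAG_BYTES, GCM_TAG_BYTES)
  decipher.auth_data = ""
  ciphertext = blob.byteslice(GCM_IV_BYTES...-GCM_TAG_BYTES)
  # #final raises OpenSSL::Cipher::CipherError if the tag does not verify.
  decipher.update(ciphertext) + decipher.final
end
```

Because the tag is checked in #final, a flipped ciphertext bit surfaces as an exception rather than as silently corrupted arguments.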

https://github.com/attr-encrypted/attr_encrypted uses AES-256-GCM by default. I haven't looked at the crypto code itself but it's a widely used project so I would _assume_ it would be a good place to start.

Updated description with current implementation.

@mperham Why did you go with CBC over GCM?

Edit: GCM includes authentication so you don't have to compute a separate MAC.

> 2. The arguments will be encrypted in the Web UI too, making it difficult to debug.

To address the debuggability we've been using a pattern of a cleartext identifier as the first argument and an encrypted payload object as the second argument. Could you allow that kind of usage somehow?

@ryansch OpenSSL didn't add GCM until 1.0.1 and I'm afraid we'll see compatibility issues with Rubies running on older Linux distros, like 12.04. CBC seems like a reasonable tradeoff for maximum compatibility but please speak up if you think differently.

@mikegee I'd actually thought of the same pattern but wasn't sure it was a good idea. If you folks use and like it, that's a strong push for me to implement it. Any ideas to make the pattern as clear and foolproof as possible? What if people just want to throw a Hash of data as a single arg and expect to see it all encrypted? Should it blow up if the Worker takes less than two args?
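A sketch of the cleartext-first, encrypted-last pattern, assuming AES-256-GCM plus Base64 so the sealed argument survives JSON serialization. The helper names are hypothetical, not Sidekiq's API, and raising when a job has fewer than two arguments is only one possible answer to the question above; encrypting a lone argument would be an equally defensible choice.

```ruby
require "json"
require "openssl"
require "base64"

# Seal bytes as Base64(iv | ciphertext | tag) so the result is JSON-safe text.
def seal(key, plaintext)
  c = OpenSSL::Cipher.new("aes-256-gcm").encrypt
  c.key = key
  iv = c.random_iv
  c.auth_data = ""
  ct = c.update(plaintext) + c.final
  Base64.strict_encode64(iv + ct + c.auth_tag)
end

def unseal(key, text)
  blob = Base64.strict_decode64(text)
  d = OpenSSL::Cipher.new("aes-256-gcm").decrypt
  d.key = key
  d.iv = blob.byteslice(0, 12)
  d.auth_tag = blob.byteslice(-16, 16)
  d.auth_data = ""
  d.update(blob.byteslice(12...-16)) + d.final
end

# Leading args (e.g. a record id) stay readable; only the last one is sealed.
def encrypt_last_arg(key, args)
  raise ArgumentError, "expected at least 2 args, got #{args.size}" if args.size < 2
  args[0...-1] + [seal(key, JSON.generate(args.last))]
end

def decrypt_last_arg(key, args)
  args[0...-1] + [JSON.parse(unseal(key, args.last))]
end
```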

I think we need to figure out what our threat model looks like. If we want to protect against an attacker making changes to the ciphertext, then we need a MAC. If we go with CBC, we'll need to compute our own HMAC on the ciphertext.

Ping @tarcieri on previous comment for accuracy check.

Associated wiki page with support for @mikegee's argument pattern https://github.com/mperham/sidekiq/wiki/Ent-Encryption

Encrypting the last argument seems legit to me. @meadoch1 built that part of our system. Hopefully he can contribute an opinion.

We have definitely found it valuable to have some clear text parameters along with an encrypted payload. If nothing else it makes understanding what messages are being displayed in the WebUI possible. We have fallen into the pattern @mikegee mentioned above of just encrypting the last argument, but for the general framework maybe a scheme of whitelisting parameters to leave in cleartext as a worker option would make sense. That way people could choose how they want to do it without much effort.
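The whitelist idea might look like the sketch below: argument positions listed in a hypothetical cleartext: option pass through untouched, and every other argument is JSON-encoded and handed to an injected encrypt callable. Encrypting arguments one by one like this pays the per-message IV/tag overhead once per argument, which is a trade-off against sealing them together as one blob.

```ruby
require "json"

# Positions listed in cleartext stay as-is; everything else is JSON-encoded
# and passed through the supplied encrypt callable.
def protect_args(args, cleartext:, encrypt:)
  args.each_with_index.map do |arg, i|
    cleartext.include?(i) ? arg : encrypt.call(JSON.generate(arg))
  end
end

def unprotect_args(args, cleartext:, decrypt:)
  args.each_with_index.map do |arg, i|
    cleartext.include?(i) ? arg : JSON.parse(decrypt.call(arg))
  end
end

# Stand-in for illustration only: reversing a string is NOT encryption.
toy = ->(s) { s.reverse }
protected_args = protect_args([7, "order-7", { "card" => "4111" }], cleartext: [0, 1], encrypt: toy)
# => [7, "order-7", "}\"1114\":\"drac\"{"]
```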

For the size bloat, we have taken to adding a compression step too. Most of our gains came from the fact that the encrypted data frequently contained a JSON blob that compressed well anyway. The side benefit is that the encoded message is significantly smaller, even with the encryption and Base64 encoding. We also haven't found that the compression/decompression overhead has had any significant impact.
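A sketch of compress-then-encrypt with stdlib Zlib and AES-256-GCM, assuming a repetitive JSON payload like the one described. Compression has to come first, because ciphertext is effectively random and does not compress. One caution: if attacker-influenced data is compressed together with secrets, the ciphertext length can leak information (the idea behind CRIME-style attacks).

```ruby
require "json"
require "zlib"
require "openssl"

# Returns iv | ciphertext | tag for the given bytes.
def gcm_seal(key, data)
  c = OpenSSL::Cipher.new("aes-256-gcm").encrypt
  c.key = key
  iv = c.random_iv
  c.auth_data = ""
  ciphertext = c.update(data) + c.final
  iv + ciphertext + c.auth_tag
end

def gcm_open(key, blob)
  d = OpenSSL::Cipher.new("aes-256-gcm").decrypt
  d.key = key
  d.iv = blob.byteslice(0, 12)
  d.auth_tag = blob.byteslice(-16, 16)
  d.auth_data = ""
  d.update(blob.byteslice(12...-16)) + d.final
end

# Compress first, then encrypt; reverse the order when reading.
def seal_compressed(key, json)
  gcm_seal(key, Zlib::Deflate.deflate(json))
end

def open_compressed(key, blob)
  Zlib::Inflate.inflate(gcm_open(key, blob))
end

key = OpenSSL::Random.random_bytes(32)
json = JSON.generate(Array.new(200) { |i| { "id" => i, "status" => "pending" } })
puts "encrypted only: #{gcm_seal(key, json).bytesize} bytes"
puts "compressed, then encrypted: #{seal_compressed(key, json).bytesize} bytes"
```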

At a previous place we also went to the extent of using public/private key pairs to do the encryption. That was significantly more complex and introduced the need to have a more sophisticated key management scheme. The benefit was that we could encrypt the contents in such a way that if the website was hacked then the messages were still secure since the private keys were not there. This may be overkill for most uses, but might be something to keep in mind when thinking about custom encryption scenarios.
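A sketch of the public/private key approach as hybrid encryption: the enqueuing side holds only an RSA public key, each payload gets a fresh AES-256-GCM data key, and that key is wrapped with RSA-OAEP, so a host without the private key cannot read payloads it enqueued. The hash layout and method names are hypothetical, and key distribution and rotation, the harder part mentioned above, are out of scope.

```ruby
require "openssl"

# Producer side: needs only the RSA public key.
def hybrid_encrypt(public_key, plaintext)
  data_key = OpenSSL::Random.random_bytes(32)  # fresh AES key per payload
  c = OpenSSL::Cipher.new("aes-256-gcm").encrypt
  c.key = data_key
  iv = c.random_iv
  c.auth_data = ""
  ciphertext = c.update(plaintext) + c.final
  wrapped = public_key.public_encrypt(data_key, OpenSSL::PKey::RSA::PKCS1_OAEP_PADDING)
  { wrapped_key: wrapped, iv: iv, ciphertext: ciphertext, tag: c.auth_tag }
end

# Consumer side: unwraps the data key with the RSA private key.
def hybrid_decrypt(private_key, msg)
  data_key = private_key.private_decrypt(msg[:wrapped_key], OpenSSL::PKey::RSA::PKCS1_OAEP_PADDING)
  d = OpenSSL::Cipher.new("aes-256-gcm").decrypt
  d.key = data_key
  d.iv = msg[:iv]
  d.auth_tag = msg[:tag]
  d.auth_data = ""
  d.update(msg[:ciphertext]) + d.final
end
```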

Overall though, kudos for thinking about this as a core feature. I think it will benefit a lot of people.

One more thought: are there hooks in place to allow Sidekiq to skip the Base64 encoding of the encrypted field(s)? Redis could take the data as pure binary without issue, but you'd need a way around Sidekiq's default behavior of handling it as text when serializing. It's been a while since I looked that deeply into the Sidekiq code, but if that worked, it would solve the biggest pain in the bloat problem (the growth that scales as a percentage of the original size).

Worst case I guess the encrypted blob could be stored in an additional Redis key/value slot outside of the currently serialized slot. That "special" slot could be handled as pure binary data. There may be other downsides to this approach, but I'm mainly throwing out ideas to see if a not great idea can cause someone to come up with a good one.

@ryansch that's true, but there are a lot of potential problems still: you need to use an encrypt-then-MAC construction, make sure you're comparing the MAC in constant time, etc.

For these reasons an AEAD mode is a much safer choice.
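For comparison, this is roughly what the CBC route needs to resist tampering: encrypt-then-MAC with HMAC-SHA256 over the IV and ciphertext, separate encryption and MAC keys, and a comparison that does not stop at the first differing byte. It is a sketch under those assumptions (MacMismatch is a hypothetical error class), and the amount of care it takes is exactly why an AEAD mode is less error-prone.

```ruby
require "openssl"

class MacMismatch < StandardError; end

CBC_IV_BYTES = 16  # AES block size
MAC_BYTES = 32     # HMAC-SHA256 output

def etm_encrypt(enc_key, mac_key, plaintext)
  cipher = OpenSSL::Cipher.new("aes-256-cbc").encrypt
  cipher.key = enc_key
  iv = cipher.random_iv
  body = iv + cipher.update(plaintext) + cipher.final
  # The MAC covers the IV and the ciphertext (encrypt-then-MAC).
  body + OpenSSL::HMAC.digest("SHA256", mac_key, body)
end

# Walks every byte instead of returning at the first difference.
def constant_time_equal?(a, b)
  return false unless a.bytesize == b.bytesize
  a.bytes.zip(b.bytes).reduce(0) { |acc, (x, y)| acc | (x ^ y) }.zero?
end

def etm_decrypt(enc_key, mac_key, blob)
  raise MacMismatch, "blob too short" if blob.bytesize < CBC_IV_BYTES * 2 + MAC_BYTES
  body = blob.byteslice(0, blob.bytesize - MAC_BYTES)
  mac = blob.byteslice(-MAC_BYTES, MAC_BYTES)
  expected = OpenSSL::HMAC.digest("SHA256", mac_key, body)
  # Verify first, so tampered input never reaches the decryptor.
  raise MacMismatch, "MAC does not match" unless constant_time_equal?(mac, expected)
  decipher = OpenSSL::Cipher.new("aes-256-cbc").decrypt
  decipher.key = enc_key
  decipher.iv = body.byteslice(0, CBC_IV_BYTES)
  decipher.update(body.byteslice(CBC_IV_BYTES, body.bytesize - CBC_IV_BYTES)) + decipher.final
end
```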

> We have definitely found it valuable to have some clear text parameters along with an encrypted payload.

@meadoch1 Perhaps you could leverage the authenticated data portion of an AEAD mode like AES-GCM
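The associated-data suggestion could be sketched like this: the leading arguments stay readable in the Web UI but are fed to AES-256-GCM as associated data, so editing them, or pairing the ciphertext with another job's arguments, makes decryption fail. Serializing the cleartext arguments with JSON.generate is an assumption of this sketch; both sides must produce byte-identical associated data.

```ruby
require "json"
require "openssl"

# Cleartext args are authenticated (not encrypted); the secret is encrypted.
def seal_with_context(key, cleartext_args, secret)
  c = OpenSSL::Cipher.new("aes-256-gcm").encrypt
  c.key = key
  iv = c.random_iv
  c.auth_data = JSON.generate(cleartext_args)
  ciphertext = c.update(JSON.generate(secret)) + c.final
  { iv: iv, ciphertext: ciphertext, tag: c.auth_tag }
end

def open_with_context(key, cleartext_args, sealed)
  d = OpenSSL::Cipher.new("aes-256-gcm").decrypt
  d.key = key
  d.iv = sealed[:iv]
  d.auth_tag = sealed[:tag]
  # Must match the bytes used when sealing, or #final raises CipherError.
  d.auth_data = JSON.generate(cleartext_args)
  JSON.parse(d.update(sealed[:ciphertext]) + d.final)
end
```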

That seems like it could work. The main benefit we've had is that in the WebUI we can easily read the non-sensitive first parameters to understand enough context about the message to make support decisions. So long as the WebUI takes whatever scheme into account to provide similar information then it would probably suit the need.

I can't massively complicate how arguments are serialized and displayed for a feature that 1% of Sidekiq users will use. Please remember that there's always a trade off between performance, compatibility, complexity, usability, etc.

@meadoch1 I'm not planning on adding compression at this point. I've always pushed job arguments to be small, but there's nothing stopping someone from compressing beforehand.

Typically pubkey encryption is used to derive/protect a symmetric key. That's why the enable block is there - you can perform any logic necessary to access the symmetric key. I've worked with one customer where production used a hardware "network safe" device for keys: you had to request a key over the network via TLS and could not persist it to disk, ENV, etc. That key was versioned and changed regularly.
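As an illustration of the kind of logic the enable block can hold, the hypothetical KeyCache below fetches each key version once from an external source and keeps it only in process memory. The fetcher is an assumption; in a setup like the one described it might request the key over TLS. A Mutex guards the cache because Sidekiq processes jobs on multiple threads, at the cost of serializing the first fetch of each version.

```ruby
# Hypothetical in-memory key cache: each version is fetched once and never
# written to disk or ENV by this class.
class KeyCache
  def initialize(&fetcher)
    @fetcher = fetcher
    @keys = {}
    @mutex = Mutex.new  # jobs run on many threads
  end

  def [](version)
    @mutex.synchronize { @keys[version] ||= @fetcher.call(version) }
  end
end
```

It could then back the block from the proposed API, e.g. Sidekiq::Enterprise::Crypto.enable(active_version: 2) { |version| keys[version] }, where keys is a KeyCache instance.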

Happy to say that encryption has been upgraded to AES-GCM in the next release.
