Gocryptfs: Feature Request: encryption primitives for devices without AES cpu instructions

Created on 6 Feb 2020 · 38 Comments · Source: rfjakob/gocryptfs

Hi @rfjakob,

Thank you for this great application! The reverse mode is what really sets it apart from other options.

I checked the issues, and it doesn't seem to be discussed yet, but what do you think about adding support for a different collection of encryption primitives that are better suited for more low-end devices?

I'm running gocryptfs on a few ARMv6/7-based NAS machines. They are nice: low energy and quite fast. But they lack native AES instructions: my fastest ARM device (Odroid XU4) maxes out at 40 MB/s, while the Raspberry Pis and friends are quite a bit slower (the RPi 1 is at 15 MB/s).

Maybe Google's Adiantum (also added to Linux kernel 5.0 for cryptfs) is a nice fit. Adiantum is based on XChaCha12 and Poly1305 and is roughly 5 times quicker than AES-XTS on devices without AES instructions.

For the reverse mode maybe something based on ChaCha20Poly1305?

Just for comparison, on my Odroid XU4, ChaCha20Poly1305 runs at 320MB/s, on my RPi1 it gets close to 40MB/s.

So I'm just wondering what your view is on this topic.

Cheers,
Davy

feature request maybe some day ⌛

All 38 comments

Hi, would you mind running gocryptfs -speed on your ARM machines and posting the result? (and cat /proc/cpuinfo | grep -E "model name|flags" | head -2).

I'd like to add it to our CPU zoo at ( https://github.com/rfjakob/gocryptfs/wiki/CPU-Benchmarks )

I've tested all the different kinds of ARM devices I have:

Odroid XU4 (Exynos 5422 - ARM Cortex-A15 - 2 GHz)

model name      : ARMv7 Processor rev 3 (v7l)
Features        : half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae
$ gocryptfs -speed
AES-GCM-256-OpenSSL       34.26 MB/s    (selected in auto mode)
AES-GCM-256-Go            17.24 MB/s
AES-SIV-512-Go            17.58 MB/s
$ openssl speed -evp chacha20-poly1305 && openssl speed -evp aes-256-gcm
...
The 'numbers' are in 1000s of bytes per second processed.
type                 16 bytes    64 bytes     256 bytes    1024 bytes   8192 bytes   16384 bytes
chacha20-poly1305    64066.72k   130153.44k   275532.80k   306572.84k   320018.56k   307903.74k
aes-256-gcm          40323.87k   49980.74k    64734.47k    70323.03k    71862.66k    71786.19k

Raspberry Pi 3 B rev 1.2 (BCM2835 - ARM Cortex-A53 - 1.2 GHz)

model name      : ARMv7 Processor rev 4 (v7l)
Features        : half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm crc32
$ gocryptfs -speed
AES-GCM-256-OpenSSL       17.13 MB/s    (selected in auto mode)
AES-GCM-256-Go             5.27 MB/s
AES-SIV-512-Go             4.31 MB/s
$ openssl speed -evp chacha20-poly1305 && openssl speed -evp aes-256-gcm
...
The 'numbers' are in 1000s of bytes per second processed.
type                 16 bytes     64 bytes     256 bytes    1024 bytes   8192 bytes   16384 bytes
chacha20-poly1305    30020.39k    63560.13k    77169.32k    82019.33k    83536.55k    83645.78k
aes-256-gcm          16137.38k    19500.97k    20668.33k    20986.20k    21127.17k    21135.36k

Raspberry Pi B rev 2 (BCM2835 - ARM 11 - 700 MHz)

model name      : ARMv6-compatible processor rev 7 (v6l)
Features        : half thumb fastmult vfp edsp java tls
$ gocryptfs -speed
AES-GCM-256-OpenSSL        4.80 MB/s    (selected in auto mode)
AES-GCM-256-Go             1.85 MB/s
AES-SIV-512-Go             1.50 MB/s
$ openssl speed -evp chacha20-poly1305 && openssl speed -evp aes-256-gcm
...
The 'numbers' are in 1000s of bytes per second processed.
type                  16 bytes    64 bytes     256 bytes    1024 bytes   8192 bytes   16384 bytes
chacha20-poly1305     8090.97k    18202.65k    23222.03k    24960.34k    25666.44k    24958.29k
aes-256-gcm           4525.91k    6268.65k     6972.36k     7141.38k     7230.33k     7150.88k
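To put a number on the gap, here is a minimal Python sketch that computes the ChaCha20-Poly1305 to AES-256-GCM ratio from two `openssl speed` rows. The helper name `speedup` is hypothetical, and the rows are copied from the Odroid XU4 output above; the 8192-byte column is an arbitrary choice.

```python
def speedup(row_a: str, row_b: str, column: int = -2) -> float:
    """Ratio of two `openssl speed` rows at one block size (default: 8192 bytes)."""
    def value(row: str) -> float:
        # Each row is "<name> <v1>k <v2>k ..."; drop the name, strip the trailing 'k'.
        return float(row.split()[1:][column].rstrip("k"))
    return value(row_a) / value(row_b)

# Rows copied from the Odroid XU4 output above.
chacha = "chacha20-poly1305    64066.72k   130153.44k   275532.80k   306572.84k   320018.56k   307903.74k"
aes = "aes-256-gcm          40323.87k   49980.74k    64734.47k    70323.03k    71862.66k    71786.19k"
print(f"XU4, 8192-byte blocks: {speedup(chacha, aes):.2f}x")  # prints 4.45x
```

The same call on the Raspberry Pi rows gives the ratio for those boards.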

I have added an XChaCha20-Poly1305 benchmark to gocryptfs -speed in the xchacha20 branch. On my PC, the results look very promising, with xchacha20 being almost as fast as hardware-accelerated AES-GCM:

$ gocryptfs -speed
AES-GCM-256-OpenSSL      585.92 MB/s    
AES-GCM-256-Go           899.28 MB/s    (selected in auto mode)
AES-SIV-512-Go           164.05 MB/s    
XChaCha20-Poly1305-Go    773.27 MB/s    

HOWEVER, looking at https://github.com/golang/crypto/tree/master/chacha20poly1305 , there only seems to be an optimized assembly version for amd64 (xxx_amd64.s).

Could you run gocryptfs -speed from the xchacha20 branch on one of your ARM devices, to see how fast the Go implementation is there?

I have compiled that branch for Armv7, binary: gocryptfs.xchacha20.armv7.tar.gz

Thanks for the binary:

on the Odroid XU4:

$ ./gocryptfs.xchacha20.armv7 --speed
AES-GCM-256-OpenSSL         N/A
AES-GCM-256-Go            17.04 MB/s    (selected in auto mode)
AES-SIV-512-Go            14.79 MB/s
XChaCha20-Poly1305-Go     23.37 MB/s
$ gocryptfs --speed
AES-GCM-256-OpenSSL       41.12 MB/s    (selected in auto mode)
AES-GCM-256-Go            16.92 MB/s
AES-SIV-512-Go            19.10 MB/s

I'll have to try the other ARM devices later.

Pity Go has not added asm ChaCha versions for ARM yet. Maybe use the same OpenSSL bridge for speed?

I had the same idea, unfortunately, openssl does not have xchacha20 yet: https://github.com/openssl/openssl/issues/5523

They do have chacha20, but this cannot be used with random nonces (too high risk of collisions)

That's a shame. Could you add an option to also benchmark the ChaCha20 case?

Just to get a sense of the impact of non-asm version, it might be that chacha20 is faster than xchacha20?

I'm reading a bit, and the size & message restrictions on chacha20 are not that bad right?

https://pycryptodome.readthedocs.io/en/latest/src/cipher/chacha20.html
https://libsodium.gitbook.io/doc/advanced/stream_ciphers/chacha20

The table on https://pycryptodome.readthedocs.io/en/latest/src/cipher/chacha20.html is very nice!

The problem with ChaCha20: Max 200 000 messages. In gocryptfs, one "message" is a 4 KiB data block, so that's a limit of about 800 MB of data written over the lifetime of the filesystem!
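Multiplying that limit out, as a minimal Python sketch: the function name is hypothetical, the 200 000-message figure is the one quoted from the PyCryptodome table, and 4 KiB is the gocryptfs block size mentioned above.

```python
def lifetime_limit_bytes(max_messages: int, block_size: int = 4096) -> int:
    """Total plaintext bytes that fit under a per-key message limit."""
    return max_messages * block_size

limit = lifetime_limit_bytes(200_000)
# prints: 819200000 bytes = 819 MB = 781 MiB
print(f"{limit} bytes = {limit / 10**6:.0f} MB = {limit / 2**20:.0f} MiB")
```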

The normal one in go (and I think also openssl) is the second row in that table.

Hi, I previously ported Gocryptfs to use wolfSSL. Does the code below allow the use of a random nonce with ChaCha20?

https://github.com/wolfSSL/wolfssl/blob/master/wolfcrypt/src/chacha.c#L111

@DavyLandman I see, 96 bit nonces, that's less bad. gocryptfs used 96 bit nonces in earlier versions. I moved to 128 bits because 96 bits is too little for very large filesystems. I have the calculations saved in https://github.com/rfjakob/gocryptfs/issues/17#issuecomment-169020984 .

And also, https://pkg.go.dev/golang.org/x/crypto/chacha20poly1305 says,

XChaCha20-Poly1305 is a ChaCha20-Poly1305 variant that takes a longer nonce, suitable to be generated randomly without risk of collisions. It should be preferred when nonce uniqueness cannot be trivially ensured, or whenever nonces are randomly generated.

so I'd rather not go with ChaCha20.
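To get a feel for why nonce length matters with random nonces, here is a rough Python sketch using the standard birthday-bound approximation p ≈ 1 - exp(-n(n-1) / 2^(b+1)) for n random b-bit nonces. The function name and the 2^40-block workload are illustrative assumptions; this is not the calculation from issue #17.

```python
import math

def collision_probability(nonce_bits: int, n_messages: int) -> float:
    """Birthday-bound approximation for at least one repeated random nonce."""
    exponent = n_messages * (n_messages - 1) / 2 ** (nonce_bits + 1)
    return -math.expm1(-exponent)  # expm1 stays accurate for tiny exponents

# 96 bits: ChaCha20-Poly1305; 128 bits: gocryptfs AES-GCM; 192 bits: XChaCha20-Poly1305.
for bits in (96, 128, 192):
    p = collision_probability(bits, 2 ** 40)  # 2^40 blocks of 4 KiB = 4 PiB written
    print(f"{bits}-bit random nonce, 2^40 blocks: p ~= {p:.3e}")
```

Each extra nonce bit halves the collision probability for a given number of blocks, which is why the 192-bit XChaCha20 nonce can be drawn at random without practical concern.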

@lechner Yes it does, but only 96 bits according to the function comment

this version uses the typical AEAD 96 bit nonce

@DavyLandman I see, 96 bit nonces, that's less bad. gocryptfs used 96 bit nonces in earlier versions. I moved to 128 bits because 96 bits is too little for very large filesystems. I have the calculations saved in #17 (comment) .

I was just reading RFC 7539, and it specifically notes that a random nonce is not needed. As long as the nonce is unique, a simple counter is just as secure.

4. Security Considerations

The most important security consideration in implementing this
document is the uniqueness of the nonce used in ChaCha20. Counters
and LFSRs are both acceptable ways of generating unique nonces

Also discussed on Crypto SE.

Assuming 4 KiB sectors, you would have to write 2^96 * 4 KiB of data before this counter overflows, which is about 324,518,554 yottabytes. That should be good enough, right? ;)
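The figure can be checked with a few lines of Python (a sketch; the constant names are made up for illustration, and a yottabyte is taken as 10^24 bytes):

```python
NONCE_BITS = 96
BLOCK_SIZE = 4096  # bytes per 4 KiB block

total_bytes = 2 ** NONCE_BITS * BLOCK_SIZE  # bytes written before the counter wraps
yottabytes = total_bytes / 10 ** 24         # 1 YB = 10^24 bytes
print(f"{yottabytes:,.0f} YB")              # prints: 324,518,554 YB
```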

I was reading Crypto SE and by chance a relevant question popped up: https://crypto.stackexchange.com/questions/77982/how-to-generate-a-nonce-for-chacha20-poly1305

Using a counter as the nonce would be nice, unfortunately, I don't think we can. There may be multiple gocryptfs processes writing to the folder at the same time (use case: encrypted folder on shared network drive).
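A toy Python illustration of that concern, under the assumption that each process keeps its own private counter starting at zero (the helper names are hypothetical and no encryption is performed, only nonce generation):

```python
import secrets

def counter_nonces(n: int) -> list[bytes]:
    """Nonces from a private per-process counter starting at zero (96-bit)."""
    return [i.to_bytes(12, "big") for i in range(n)]

def random_nonces(n: int) -> list[bytes]:
    """Nonces drawn independently from the OS CSPRNG (128-bit)."""
    return [secrets.token_bytes(16) for _ in range(n)]

# Two uncoordinated writers sharing one filesystem key.
writer_a, writer_b = counter_nonces(1000), counter_nonces(1000)
print("counter nonces reused:", len(set(writer_a) & set(writer_b)))  # all 1000 repeat

writer_a, writer_b = random_nonces(1000), random_nonces(1000)
print("random nonces reused:", len(set(writer_a) & set(writer_b)))
```

Counters would only be safe if the writers coordinated, for example by partitioning the counter space, which a shared network folder does not provide by itself.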

Ah, my bad. I had hoped something like a cluster offset/index or inode index would suffice.
That must be too simplistic of me.

I have added the gocryptfs.xchacha20.armv7 results to https://github.com/rfjakob/gocryptfs/wiki/CPU-Benchmarks .

I'm afraid using XChaCha20-Poly1305-Go does not make sense, as it is slower than AES-GCM-256-OpenSSL.

We can revisit when openssl gets XChaCha20.

Actually, on a Raspberry Pi 4 with Ubuntu 64 bit, things look different:

$ ./gocryptfs -speed
AES-GCM-256-OpenSSL       21.50 MB/s    (selected in auto mode)
AES-GCM-256-Go            21.75 MB/s    
AES-SIV-512-Go            17.64 MB/s    
XChaCha20-Poly1305-Go    109.78 MB/s    

I just ran it on my rpi3:

$ ./gocryptfs.xchacha20.armv7 --speed
AES-GCM-256-OpenSSL         N/A
AES-GCM-256-Go             4.86 MB/s    (selected in auto mode)
AES-SIV-512-Go             4.53 MB/s
XChaCha20-Poly1305-Go      9.26 MB/s
$ gocryptfs --speed
AES-GCM-256-OpenSSL       16.83 MB/s    (selected in auto mode)
AES-GCM-256-Go             5.24 MB/s
AES-SIV-512-Go             4.20 MB/s

Needs a 64 bit gocryptfs to be fast. Go has optimized xchacha assembly for arm64.

Ah, yes, okay so it's for the zoo then ;)

With quite some work you could link/cgo these asm versions: https://github.com/floodyberry/chacha-opt/tree/master/app/extensions/chacha

Ok, don't have an arm64 machine, maybe someone else will come along :)

On Thu, Apr 9, 2020 at 10:23 PM rfjakob notifications@github.com wrote:

arm64 binary:
gocryptfs.xchacha20.arm64.tar.gz
https://github.com/rfjakob/gocryptfs/files/4458465/gocryptfs.xchacha20.arm64.tar.gz

It might be nice to get an ARMv6/v7 native version into Go. That would open up quite a range of devices.

Relevant issue: https://github.com/golang/go/issues/22809

@DavyLandman

A comparison of two versions to characterize performance on the same arm64 hardware. The device under test is pre-production hardware from a product that didn't make it to market, so the interesting thing is not its performance, but the relative perf of two versions.

Ubuntu 18.04 LTS with apt install gocryptfs

AES-GCM-256-OpenSSL      241.04 MB/s    (selected in auto mode)
AES-GCM-256-Go            38.06 MB/s
AES-SIV-512-Go            28.61 MB/s
gocryptfs 1.4.3; go-fuse 0.0~git20171124.0.14c3015; 2018-02-05 go1.9.3

Same hardware with a fresh build of gocryptfs on a fresh copy of Go

AES-GCM-256-OpenSSL      216.25 MB/s    (selected in auto mode)
AES-GCM-256-Go           450.68 MB/s
AES-SIV-512-Go           100.51 MB/s
gocryptfs v1.7.1-37-g75f1677; go-fuse v2.0.3; 2020-04-13 go1.14.2 linux/arm64

I would hope that whoever is packaging gocryptfs for Ubuntu 20.04 LTS is using a sufficiently modern version to pick up all of Go's perf improvements.

@vielmetti Hi, you can see from your output that gocryptfs in Ubuntu 18.04 LTS was built with the much older version 1.9 of golang-go. They probably improved encryption speeds when AES is not available. That shows up in 1.14.

This does not look like a packaging issue to me. Please file a bug against your gocryptfs package if you think otherwise.

I maintain gocryptfs in Debian, which is where I believe Ubuntu gets the package.

Thanks @lechner - do you happen to know which Go version will land in 20.04 LTS? Hoping that it's new enough to pick up a bunch of improvements.

@vielmetti For a definitive answer, you would have to ask the Ubuntu release team. It looks like 1.13.

You can always see what's in Debian here.

@vielmetti interesting, thanks for the numbers. Pretty fast GCM, looks like it's hardware-accelerated.

Could you "git pull" and run gocryptfs -speed again? I have enabled the xchacha20 benchmark in the master branch now.

@rfjakob

root@q1:~/go/src/github.com/rfjakob/gocryptfs# ./gocryptfs -speed
gocryptfs v1.7.1-46-g73436d9; go-fuse v1.0.1-0.20190319092520-161a16484456; 2020-04-13 go1.14.2 linux/arm64
AES-GCM-256-OpenSSL      212.30 MB/s    (selected in auto mode)
AES-GCM-256-Go           452.30 MB/s
AES-SIV-512-Go           100.25 MB/s
XChaCha20-Poly1305-Go    137.35 MB/s

Thanks, very interesting. Looks like
(1) the ARMv8 crypto extensions beat the socks off XChaCha20-Poly1305
(2) gocryptfs needs to learn to prefer AES-GCM-256-Go when the CPU has it
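Point (2) could be sketched as a check for the `aes` CPU flag. The Python below is a hypothetical policy based on the numbers in this thread, not gocryptfs's actual selection logic; the function names are invented, and the two sample lines are flag lines quoted in this thread (a Cortex-A53 with crypto extensions and the Odroid XU4 without).

```python
def has_hw_aes(cpuinfo: str) -> bool:
    """True if a 'flags'/'Features' line lists the 'aes' CPU flag."""
    for line in cpuinfo.splitlines():
        key, _, value = line.partition(":")
        if key.strip().lower() in ("flags", "features") and "aes" in value.split():
            return True
    return False

def preferred_backend(cpuinfo: str) -> str:
    # Hypothetical policy: with hardware AES the Go backend benchmarked fastest
    # in this thread; without it, the OpenSSL backend did.
    return "AES-GCM-256-Go" if has_hw_aes(cpuinfo) else "AES-GCM-256-OpenSSL"

a53 = "Flags:               fp asimd evtstrm aes pmull sha1 sha2 crc32"
xu4 = "Features        : half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae"
print(preferred_backend(a53))  # AES-GCM-256-Go
print(preferred_backend(xu4))  # AES-GCM-256-OpenSSL
```

Running the built-in benchmark and picking the fastest result would be an alternative that does not depend on flag names.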

I just want to bring back a single point: I proposed chacha20-poly1305 for devices that do not have crypto extensions, so ARMv8 devices are not part of that bunch.

But there are a lot of ARMv6 and ARMv7 devices out there that might still benefit from either asm/Go-tuned versions of ChaCha20 or a link to a native OpenSSL version of it.

By the way, I run gocryptfs on an OrangePi One (32-bit, sun8i, 4-core, 1 GHz, AllWinner H3 SoC) and it would be nice to speed it up a bit. At best it gets to 12 MB/s, which is IMHO similar to a Raspberry Pi 2. But when I tested it with the chacha20-poly1305 binary, it got only half that speed, which is not what everyone else here is reporting:

... kernel ... 5.4.45-sunxi ...
model name      : ARMv7 Processor rev 5 (v7l)
Features        : half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm
...
$ gocryptfs -speed
AES-GCM-256-OpenSSL       12.65 MB/s    (selected in auto mode)
AES-GCM-256-Go             3.64 MB/s
AES-SIV-512-Go             3.08 MB/s

compared to custom binary:

$ ./gocryptfs.xchacha20.armv7 -speed
AES-GCM-256-OpenSSL         N/A 
AES-GCM-256-Go             3.26 MB/s    (selected in auto mode)
AES-SIV-512-Go             3.06 MB/s    
XChaCha20-Poly1305-Go      6.80 MB/s    

My (old) Intel PC could really speed up with XChaCha20-Poly1305-Go:

model name  : Intel(R) Xeon(R) CPU           L5420  @ 2.50GHz
flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good nopl cpuid aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 xsave lahf_lm tpr_shadow vnmi flexpriority dtherm

$gocryptfs -speed
gocryptfs v1.8.0-35-g274e0d2; go-fuse v2.0.3; 2020-06-01 go1.14.3 linux/amd64
AES-GCM-256-OpenSSL       94.85 MB/s    (selected in auto mode)
AES-GCM-256-Go            40.20 MB/s    
AES-SIV-512-Go            32.05 MB/s    
XChaCha20-Poly1305-Go    328.54 MB/s

~$ lscpu | grep -E 'Arch|Model |Flags'
Architecture:                    x86_64
Model name:                      Intel(R) Core(TM) i3-3227U CPU @ 1.90GHz
Flags:                           fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer xsave avx f16c lahf_lm cpuid_fault epb pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms xsaveopt dtherm ida arat pln pts md_clear flush_l1d

~$ gocryptfs -speed
gocryptfs v1.8.0.HEAD; go-fuse v1.0.1-0.20190319092520-161a16484456; 2020-07-13 go1.14.4 linux/amd64
AES-GCM-256-OpenSSL      132.23 MB/s    (selected in auto mode)
AES-GCM-256-Go            40.79 MB/s    
AES-SIV-512-Go            31.38 MB/s    
XChaCha20-Poly1305-Go    412.11 MB/s

~$ lscpu | grep -E 'Arch|Model |Flags'
Architecture:        aarch64
Model name:          Cortex-A53
Flags:               fp asimd evtstrm aes pmull sha1 sha2 crc32

~$ gocryptfs -speed
gocryptfs v1.8.0.HEAD; go-fuse v1.0.1-0.20190319092520-161a16484456; 2020-07-13 go1.13.7 linux/arm64
AES-GCM-256-OpenSSL       32.41 MB/s
AES-GCM-256-Go           507.32 MB/s    (selected in auto mode)
AES-SIV-512-Go            54.46 MB/s
XChaCha20-Poly1305-Go    141.12 MB/s

FWIW, here are results from an Intel Atom N2800 (common in Kimsufi low-end dedicated servers)

~$ lscpu | grep -E 'Arch|Model |Flags'
Architecture:        x86_64
Model name:          Intel(R) Atom(TM) CPU N2800   @ 1.86GHz
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts nopl nonstop_tsc cpuid aperfmperf pni dtes64 monitor ds_cpl est tm2 ssse3 cx16 xtpr pdcm movbe lahf_lm dtherm arat

~$ ./gocryptfs -speed
gocryptfs v1.8.0-39-g3b61244; go-fuse v2.0.3; 2020-07-29 go1.14 linux/amd64
AES-GCM-256-OpenSSL       15.53 MB/s    (selected in auto mode)
AES-GCM-256-Go            10.58 MB/s
AES-SIV-512-Go             7.39 MB/s
XChaCha20-Poly1305-Go     78.46 MB/s

Clear win for XChaCha20-Poly1305-Go
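For comparing results like these across machines, a small Python sketch that parses `gocryptfs -speed` output and reports the fastest backend. The parser is an assumption based on the output format shown in this thread; `parse_speed` is a hypothetical helper, not part of gocryptfs.

```python
import re

def parse_speed(output: str) -> dict[str, float]:
    """Map backend name -> MB/s from `gocryptfs -speed` text; 'N/A' rows are skipped."""
    pattern = re.compile(r"^(\S+)\s+([\d.]+) MB/s", re.MULTILINE)
    return {name: float(mbps) for name, mbps in pattern.findall(output)}

# Atom N2800 results copied from above.
atom_n2800 = """\
AES-GCM-256-OpenSSL       15.53 MB/s    (selected in auto mode)
AES-GCM-256-Go            10.58 MB/s
AES-SIV-512-Go             7.39 MB/s
XChaCha20-Poly1305-Go     78.46 MB/s
"""
speeds = parse_speed(atom_n2800)
fastest = max(speeds, key=speeds.get)
ratio = speeds[fastest] / speeds["AES-GCM-256-OpenSSL"]
print(f"{fastest}: {ratio:.1f}x the auto-selected backend")  # 5.1x
```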

Edit: I set up dm-crypt/LUKS instead on the same machine with the same filesystem. I'm getting 66 MB/s from serpent-xts 512, and untarring and then deleting the Linux kernel source takes ~1 min and ~15 secs respectively, versus ~10 min and ~80 secs with gocryptfs. Real-world performance for my use case shows a similar 5-10x speedup. It seems like ChaCha support could go a long way toward closing this performance gap.
