Zstd: Slow decompression of large files

Created on 21 Mar 2017 · 20 comments · Source: facebook/zstd

On my Mac (zstd 1.1.4 and 1.1.3, installed using Homebrew), I see the following times for a really large heap dump (8.1G). On a smaller 1.5G tsv file, the difference is less pronounced (3 seconds for gzip, 5 for zstd).
System time is really high.

8.1G Mar 13  tomcat.hprof
--
» time gzip -k tomcat.hprof                                                                                         
gzip -k tomcat.hprof  260.87s user 6.86s system 98% cpu 4:30.49 total
--
» time zstd -5 tomcat.hprof                                                       
tomcat.hprof           : 15.03%   (8712704476 => 1309452317 bytes, tomcat.hprof.zst)
zstd -5 tomcat.hprof  54.53s user 7.05s system 98% cpu 1:02.66 total
=======

» time zstd -d tomcat.hprof.zst -o zst.hprof                                       
tomcat.hprof.zst      : 8712704476 bytes
zstd -d tomcat.hprof.zst -o zst.hprof  12.41s user 76.07s system 96% cpu 1:31.82 total

=======

» time gzip -d -c tomcat.hprof.gz > gzip.hprof                                    
gzip -d -c tomcat.hprof.gz > gzip.hprof  15.06s user 4.20s system 92% cpu 20.855 total

Update:

» time zstd -d tomcat.hprof.zst -c > zstd.hprof                                      
tomcat.hprof.zst      : 8712704476 bytes
zstd -d tomcat.hprof.zst -c > zstd.hprof  12.32s user 9.83s system 94% cpu 23.354 total

All 20 comments

You might have something else running in the background, stealing away some bandwidth or processing power.

Things I would recommend for a more precise comparison:

  • Check for other processes' activity (top -u) and ensure nothing important runs at the same time.
  • Decode into /dev/null to remove any dependency on the local disk.

Local disks, whether SSD or HDD, can be capricious, with certain sections working better than others (for different sets of reasons). You probably don't want to completely wipe the disk just to ensure a neutral storage effect; /dev/null is a shortcut to nullify this variable.

Expectation: on /dev/null, zstd decompression should be faster than gzip, as only CPU and RAM matter.
But on a real storage device, zstd decompression is likely throttled by storage bandwidth. If that is the case, the difference between the two algorithms will be smaller.

I made sure nothing was running at any time (killed the browser too). I disabled SIP to be extra cautious.

It's a mid-2014 mbp (15") with an i7, 16 GB RAM and a 500 GB SSD. To be safe, I freed up about 70 GB for testing.

Test with /dev/null

» time gzip -d -c usva.hprof.gz > /dev/null                                                                                                             
gzip -d -c usva.hprof.gz > /dev/null  19.38s user 0.58s system 99% cpu 19.980 total

time zstd -d usva.hprof.zst -c > /dev/null                                                                                                           
usva.hprof.zst      : 8712704476 bytes
zstd -d usva.hprof.zst -c > /dev/null  15.36s user 0.50s system 99% cpu 15.874 total

Filesystem                          Size   Used  Avail Capacity iused      ifree %iused  Mounted on
/dev/disk1                         465Gi  409Gi   56Gi    88% 2705196 4292262083    0%   /

PS: I ran zstd -d -c usva.hprof.zst > zstd.hprof and zstd -d usva.hprof.zst -o zstd.hprof a few more times, and found that -c is always faster than -o for me.

E.g.

» time zstd -d -c usva.hprof.zst > zstd.hprof                                                                                                           
usva.hprof.zst      : 8712704476 bytes
zstd -d -c usva.hprof.zst > zstd.hprof  15.97s user 11.67s system 96% cpu 28.557 total

 » time zstd -d  usva.hprof.zst -o zstd.hprof                                                                                                            
usva.hprof.zst      : 8712704476 bytes
zstd -d usva.hprof.zst -o zstd.hprof  16.70s user 78.43s system 93% cpu 1:41.81 total

Interesting!
Could it be related to the impact of sparse mode?
It would be surprising, but it seems worth testing, since it's one of the differences between stdout and direct file access. It would also be useful to know a bit more about which file system is being used.

To test it:
time zstd -d usva.hprof.zst -o zstd.hprof --no-sparse

_Edit_:
For information, I tested on a local Mac laptop with a similar configuration, using a 5 GB test file, but I could not distinguish any speed difference between -d -o and -d -c.

I ran these now, immediately one after another.
One eyeball observation: both kick off quickly enough, but -c and --no-sparse maintain the decode speed, while -o doesn't.

~/Downloads/prod » time zstd -d usva.hprof.zst -o zstd.hprof --no-sparse                                                                                                           
usva.hprof.zst      : 8712704476 bytes
zstd -d usva.hprof.zst -o zstd.hprof --no-sparse  12.52s user 9.80s system 92% cpu 24.154 total
------------------------------------------------------------
~/Downloads/prod » time zstd -d usva.hprof.zst -o zstd.hprof.2                                                                                                                     
usva.hprof.zst      : 8712704476 bytes
zstd -d usva.hprof.zst -o zstd.hprof.2  12.11s user 79.70s system 96% cpu 1:35.00 total
------------------------------------------------------------
~/Downloads/prod » time zstd -d usva.hprof.zst -o zstd.hprof.3 --no-sparse                                                                                                         
usva.hprof.zst      : 8712704476 bytes
zstd -d usva.hprof.zst -o zstd.hprof.3 --no-sparse  12.39s user 9.45s system 95% cpu 22.853 total

~/Downloads/prod » time zstd -d usva.hprof.zst -o zstd.hprof.4                                                                                                                    
Decoded : 7065 MB...
usva.hprof.zst      : 8712704476 bytes
zstd -d usva.hprof.zst -o zstd.hprof.4  12.29s user 78.61s system 94% cpu 1:35.92 total

Disk Utility gives this info:

Volume name : Macintosh HD
Volume type : Logical Volume
BSD device node : disk1
Mount point : /
File system : Mac OS Extended (Journaled)
Connection : PCI
Device tree path : IODeviceTree:/PCI0@0/RP05@1C,4/SSD0@0/PRT0@0/PMP@0
Writable : Yes
Is case-sensitive : No
File system UUID : 6D6DD083-B13A-300D-B937-C6F05C59FC02
CoreStorage UUID : 5CF07A43-14BE-4248-9177-BC98D17704CE
Parent CoreStorage LVG UUID : DF884054-BAB9-4C27-B06C-115501AE5697
Volume capacity : 499,046,809,600
Available space (Purgeable + Free) : 281,825,002,752
Purgeable space : 229,518,611,712
Free space : 52,306,391,040
Used space : 217,221,806,848
File count : 2,309,009
Owners enabled : Yes
Is encrypted : No
System Integrity Protection supported : Yes
Can be verified : Yes
Can be repaired : No
Bootable : Yes
Journaled : Yes
Disk number : 1
Media name : Macintosh HD
Media type : Generic
Ejectable : No
Solid state : Yes
S.M.A.R.T. status : Not Supported
Parent disks : disk0s2

It's interesting.

I ran some more tests on this topic.
First, some results on a Linux dev station, with an ext4 file system supporting sparse mode.
For this test, I created a 5 GB sparse sample file with datagen :
./datagen -P100 -g5GB > tmp5GB

Then I compressed it in both .zst and .gz formats, and compared decompression times.
I'm using zstd v1.1.4, which can decompress .gz files, so this variant is compared too.

1) On /dev/null:

time zstd -d tmp5g.zst -c > /dev/null
real    0m1.579s
user    0m1.545s
sys     0m0.032s

time zstd -d tmp5g.gz -c > /dev/null
real    0m9.721s
user    0m9.678s
sys     0m0.032s

time gzip -d tmp5g.gz -c > /dev/null
real    0m18.947s
user    0m18.867s
sys     0m0.059s

No FS or SSD impact on this one.
Ranking and performance are within expectations.
*.gz decoding is faster with zstd than with gzip, which is more surprising.
This could be because zstd uses a recent zlib library for *.gz files, while pre-built versions of gzip, surprisingly, tend not to link against the system's libz, using their own bundled source code instead.

2) Next stage, let's decompress to disk:

time zstd -d -f tmp5g.zst -o tmp5g
real    0m2.404s
user    0m1.988s
sys     0m0.239s

time zstd -d -f tmp5g.gz -o tmp5g
real    1m8.993s
user    0m9.830s
sys     0m5.434s

time gzip -d -f tmp5g.gz
real    0m23.544s
user    0m18.715s
sys     0m4.685s

Quite a big difference. Here zstd only needs 2.4s to decompress the 5 GB file.
But the same operation with the *.gz format takes 1m08s! That is way too large a difference.
gzip -d takes a hit too, though a much less dramatic one.

But there is something else suspicious: zstd was the first to be invoked.
Subsequent invocations had to rewrite the same file ... on top of an existing one.

So, what happens if we re-run zstd now?

time ../zstd -d tmp5g.zst -f -o tmp5g
real    0m27.511s
user    0m2.058s
sys     0m1.177s

Right, not that fast.
Let's redo those tests, but erase the output file first:

rm tmp5g ; time ../zstd -d tmp5g.zst -f -o tmp5g
real    0m2.292s
user    0m2.034s
sys     0m0.253s

rm tmp5g ; time ../zstd -d tmp5g.gz -f -o tmp5g
real    0m13.379s
user    0m9.458s
sys     0m3.902s

rm tmp5g ; time gzip -d -f tmp5g.gz
real    0m23.508s
user    0m18.875s
sys     0m4.598s

Yes, much better, and in line with expectations.

It turns out, ___rewriting a file on top of an existing file (with the same name) is a much costlier operation___.
This is probably made worse when the first file was created with sparse mode turned on, as the file system needs to do quite a bit more work.
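
For illustration, here is a minimal C sketch of the workaround these tests use (the rm before each run): unlinking the destination before opening it creates a brand-new file, instead of truncating and rewriting the existing, possibly sparse, one. openFreshOutput is a hypothetical helper, not a zstd function.

/* Minimal sketch, assuming POSIX. */
#include <stdio.h>
#include <unistd.h>

static FILE* openFreshOutput(const char* path)
{
    unlink(path);               /* ignore failure: the file may not exist yet */
    return fopen(path, "wb");   /* creates a new file, with a new inode */
}

int main(void)
{
    FILE* out = openFreshOutput("tmp5g");
    if (out != NULL) fclose(out);
    return 0;
}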

Next to do: same tests on Mac OS X.

Let's redo the test on a laptop with Mac OS X.

The Mac OS X file system does not support sparse files,
so the impact of --[no-]sparse should be invisible.
That expectation seems contradicted by @mailmaldi's tests.
Let's see.

As usual, let's start with a /dev/null baseline:

time zstd -d -f tmp5g.zst -c > /dev/null
real    0m1.074s
user    0m1.016s
sys     0m0.047s

time zstd -d -f tmp5g.gz -c > /dev/null
real    0m8.658s
user    0m8.592s
sys     0m0.048s

time gzip -d -f tmp5g.gz -c > /dev/null
real    0m8.665s
user    0m8.599s
sys     0m0.046s

This is similar to the previous test, except that gzip and zstd now decode the *.gz file at the same speed.

Now let's try decompressing and writing the result to disk.
Let's start from a pristine state: tmp5g does not exist before decompression:

rm tmp5g ; time zstd -d -f tmp5g.zst -o tmp5g
real    0m8.164s
user    0m1.371s
sys     0m4.446s

rm tmp5g ; time zstd -d -f tmp5g.gz -o tmp5g
real    0m15.888s
user    0m8.935s
sys     0m5.995s

rm tmp5g ; time gzip -d -f tmp5g.gz
real    0m12.301s
user    0m8.979s
sys     0m2.415s

zstd suffers from higher system times than gzip, and it's not clear why. But other than that, everything looks correct.

Let's replay, but keep tmp5g in place, in order to rewrite on top of it:

time zstd -d -f tmp5g.zst -o tmp5g
real    0m8.542s
user    0m1.394s
sys     0m4.657s

time zstd -d -f tmp5g.gz -o tmp5g
real    0m15.854s
user    0m8.927s
sys     0m5.941s

time gzip -d -f tmp5g.gz
real    0m12.210s
user    0m8.963s
sys     0m2.403s

Basically the same performance figures.
No impact when the file already exists.
And none of the "strong degradation" observed on Linux's ext4.

The only thing that feels suspicious is that system times are higher for zstd.
But nowhere near as dramatic as in your experiments.
Could it be improved with --no-sparse?

rm tmp5g ; time ../zstd -d -f tmp5g.zst -o tmp5g --no-sparse
real    0m9.017s
user    0m1.391s
sys     0m6.497s

rm tmp5g ; time ../zstd -d -f tmp5g.gz -o tmp5g --no-sparse
real    0m15.964s
user    0m8.946s
sys     0m6.084s

Nope, same timings.

I'm at a loss to explain the differences with @mailmaldi's experiment.

For information, Disk Utility reports for this SSD:

Volume name : Macintosh HD
Volume type : Logical Volume
BSD device node : disk1
Mount point : /
File system : Mac OS Extended (Journaled, Encrypted)
Connection : PCI
Device tree path : IODeviceTree:/PCI0@0/RP06@1C,5/SSD0@0/PRT0@0/PMP@0
Writable : Yes
Is case-sensitive : No

The only major difference is Encrypted. But I don't see how that could explain the speed differences.

Just to be safe, I ran Mac diagnostics and also made sure that Mac's disk utility reported no issues while performing first aid.
I'll run the same test on a few other Macs and post back here.

Don't know if this helps, but I ran csrutil enable --without dtrace and then ran the following tests:

sudo dtruss zstd -d usva.hprof.gz -o gzip.hprof.1  |& tee dtruss.gzip.txt
rm gzip.hprof.1
sudo dtruss zstd -d usva.hprof.zst -o zstd.hprof.1  |& tee dtruss.normal.txt
rm zstd.hprof.1
sudo dtruss zstd -d usva.hprof.zst -o zstd.hprof.1 --no-sparse |& tee dtruss.nosparse.txt
rm zstd.hprof.1

For each file, I ran cut -d"(" -f1 dtruss.nosparse.txt | sort | cut -d":" -f1| sort | uniq -c
The most frequently called syscalls for each are:

Gzip:
58531 write_nocancel
312 read_nocancel
27 dtrace

zstd with -o and --no-sparse:
1547 getrusage
1477 read_nocancel
49575 write_nocancel

normal zstd -o:
3515 getrusage
27213 lseek
3186 read_nocancel
107367 write_nocancel

Indeed, the number of write and seek operations is much higher in --sparse mode.
The --no-sparse mode only writes blocks, while the --sparse one may break a few blocks into smaller parts, extracting some portions as sparse using fseek().

I guess there is potential to reduce that a bit, by keeping only the "long" fseek() operations and joining neighboring fwrite() ones.

I would be surprised, though, if this were a primary reason for the speed difference; I expect it to be negligible. That seems confirmed in my tests by the fact that --sparse vs --no-sparse makes no difference.
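
To make the mechanism concrete, here is a minimal C sketch of this style of sparse writing, with a threshold so that only long zero runs become fseek() holes and neighboring fwrite()s stay joined, in the spirit of the reduction suggested above. This is an illustration only, not zstd's actual fileio.c code; writeSparse and MIN_HOLE are hypothetical names.

#include <stdio.h>
#include <string.h>

#define MIN_HOLE 4096  /* hypothetical threshold: only long zero runs become holes */

/* Write buf to out, skipping long runs of zero bytes with fseek() so the
 * file system can leave holes instead of allocating blocks. */
static void writeSparse(FILE* out, const char* buf, size_t size)
{
    size_t pos = 0;
    int lastWasSeek = 0;
    while (pos < size) {
        size_t run = 1;
        if (buf[pos] == 0) {
            while (pos + run < size && buf[pos + run] == 0) run++;
            if (run >= MIN_HOLE) {
                fseek(out, (long)run, SEEK_CUR);   /* leave a hole */
                lastWasSeek = 1;
            } else {
                fwrite(buf + pos, 1, run, out);    /* short run: just write it */
                lastWasSeek = 0;
            }
        } else {
            while (pos + run < size && buf[pos + run] != 0) run++;
            fwrite(buf + pos, 1, run, out);        /* non-zero segment */
            lastWasSeek = 0;
        }
        pos += run;
    }
    /* if the data ended inside a hole, write one final byte so the file
     * gets its full size */
    if (lastWasSeek) {
        fseek(out, -1, SEEK_CUR);
        fputc(0, out);
    }
}

int main(void)
{
    static char buf[1 << 20];                      /* 1 MB, mostly zeros */
    memcpy(buf + 500000, "data", 4);               /* one small data segment */
    FILE* out = fopen("sparse.bin", "wb");
    if (out == NULL) return 1;
    writeSparse(out, buf, sizeof(buf));
    fclose(out);
    return 0;
}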

As an experiment,
I created a patch in a new branch, lessSparse,
which reduces the number of fseek() calls in --sparse mode.

I don't expect the impact to be dramatic, but well, it shouldn't be detrimental either.

As you expected, no difference.
./zstd is the binary compiled from lessSparse; zstd is the installed 1.1.4.

~/Downloads/prod » time ./zstd -d usva.hprof.zst -c > zstd.hprof.3              
usva.hprof.zst      : 8712704476 bytes
./zstd -d usva.hprof.zst -c > zstd.hprof.3  12.71s user 9.92s system 94% cpu 23.941 total
------------------------------------------------------------
~/Downloads/prod » time ./zstd -d usva.hprof.zst -o zstd.hprof.4                
usva.hprof.zst      : 8712704476 bytes
./zstd -d usva.hprof.zst -o zstd.hprof.4  12.31s user 73.61s system 96% cpu 1:28.97 total
------------------------------------------------------------
~/Downloads/prod » time ./zstd -d usva.hprof.zst -o zstd.hprof.5 --no-sparse    
usva.hprof.zst      : 8712704476 bytes
./zstd -d usva.hprof.zst -o zstd.hprof.5 --no-sparse  12.63s user 9.67s system 94% cpu 23.684 total
------------------------------------------------------------
~/Downloads/prod » time zstd -d usva.hprof.zst -o zstd.hprof.6 --no-sparse      
usva.hprof.zst      : 8712704476 bytes
zstd -d usva.hprof.zst -o zstd.hprof.6 --no-sparse  12.30s user 9.61s system 94% cpu 23.276 total
------------------------------------------------------------
~/Downloads/prod » time zstd -d usva.hprof.zst -o zstd.hprof.7                  
usva.hprof.zst      : 8712704476 bytes
zstd -d usva.hprof.zst -o zstd.hprof.7  12.05s user 77.33s system 96% cpu 1:33.04 total

Yes, this is disappointing.
Without the ability to reproduce your observation, it's very difficult to fix anything.

I think we should first concentrate on finding a way to reproduce it.

There are two causes I want to look into:

  1. Differences in setup:

    • Which version of OS X are you using?

    • @Cyan4973 and I are both testing with disk encryption on; I will run a test with disk encryption off.

  2. Differences in input files:

    • Maybe your file is an edge case. Can you run make -C tests datagen && tests/datagen -P100 -g10GB > sparse.10GB and test with sparse.10GB?

I was running 10.12.3 till yesterday, 10.12.4 now.

In test runs 2 & 3 below (normal -o), zstd completes decoding the 10737418240 bytes and then just waits for 10 more seconds before printing the final stats.

1.
~/Downloads/prod » time zstd -d sparse.10GB.zst -o test.1 --no-sparse           
sparse.10GB.zst     : 10737418240 bytes
zstd -d sparse.10GB.zst -o test.1 --no-sparse  4.08s user 10.91s system 90% cpu 16.525 total
------------------------------------------------------------
2.
~/Downloads/prod » time zstd -d sparse.10GB.zst -o test.2                       
sparse.10GB.zst     : 10737418240 bytes
zstd -d sparse.10GB.zst -o test.2  4.41s user 11.54s system 59% cpu 26.905 total
------------------------------------------------------------
3.
~/Downloads/prod » time zstd -d sparse.10GB.zst -o test.3                       
sparse.10GB.zst     : 10737418240 bytes
zstd -d sparse.10GB.zst -o test.3  4.41s user 11.60s system 58% cpu 27.165 total
------------------------------------------------------------
4.
~/Downloads/prod » time zstd -d sparse.10GB.zst -c >  test.4                    
sparse.10GB.zst     : 10737418240 bytes
zstd -d sparse.10GB.zst -c > test.4  4.09s user 11.02s system 92% cpu 16.331 total

PS: I will try running this on a bunch of other Macs today, to rule out a potentially faulty laptop.

Ran on a 15" mid-2014 mbp running Yosemite 10.10.5

➜  test time zstd -d usva.gz -o usva.hprof
usva.gz             : 8712704476 bytes
zstd -d usva.gz -o usva.hprof  15.69s user 7.37s system 97% cpu 23.763 total
--------------------------------------------------------------------------------
➜  test time zstd -5 usva.hprof -o usva.hprof.zst
usva.hprof           : 15.03%   (8712704476 => 1309452317 bytes, usva.hprof.zst)
zstd -5 usva.hprof -o usva.hprof.zst  45.28s user 2.73s system 99% cpu 48.247 total
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
➜  test time zstd -d usva.hprof.zst -o test.hprof.1
usva.hprof.zst      : 8712704476 bytes
zstd -d usva.hprof.zst -o test.hprof.1  13.31s user 20.76s system 94% cpu 36.217 total
--------------------------------------------------------------------------------
➜  test time zstd -d usva.hprof.zst -o test.hprof.2 --no-sparse
usva.hprof.zst      : 8712704476 bytes
zstd -d usva.hprof.zst -o test.hprof.2 --no-sparse  13.25s user 7.88s system 96% cpu 21.940 total
--------------------------------------------------------------------------------
➜  test time zstd -d -c usva.hprof.zst > test.hprof.3
usva.hprof.zst      : 8712704476 bytes
zstd -d -c usva.hprof.zst > test.hprof.3  13.41s user 7.84s system 94% cpu 22.578 total
--------------------------------------------------------------------------------
rm test.*
--------------------------------------------------------------------------------
➜  test time zstd -d usva.hprof.zst -o test.hprof.1
usva.hprof.zst      : 8712704476 bytes
zstd -d usva.hprof.zst -o test.hprof.1  13.17s user 21.20s system 94% cpu 36.511 total
--------------------------------------------------------------------------------
➜  test time zstd -d usva.hprof.zst -o test.hprof.2 --no-sparse
usva.hprof.zst      : 8712704476 bytes
zstd -d usva.hprof.zst -o test.hprof.2 --no-sparse  13.26s user 7.92s system 95% cpu 22.138 total
--------------------------------------------------------------------------------
➜  test time zstd -d -c usva.hprof.zst > test.hprof.3
usva.hprof.zst      : 8712704476 bytes
zstd -d -c usva.hprof.zst > test.hprof.3  13.39s user 7.95s system 95% cpu 22.433 total
--------------------------------------------------------------------------------

Then I ran with the binary compiled from lessSparse. -o is even worse now.

➜  test time ./zstd -d usva.hprof.zst -o test.hprof.1
usva.hprof.zst      : 8712704476 bytes
./zstd -d usva.hprof.zst -o test.hprof.1  12.89s user 106.93s system 96% cpu 2:03.69 total
--------------------------------------------------------------------------------
➜  test time ./zstd -d usva.hprof.zst -o test.hprof.2 --no-sparse
usva.hprof.zst      : 8712704476 bytes
./zstd -d usva.hprof.zst -o test.hprof.2 --no-sparse  12.99s user 8.27s system 97% cpu 21.912 total
--------------------------------------------------------------------------------
➜  test rm test.hprof.*
--------------------------------------------------------------------------------
➜  test time ./zstd -d usva.hprof.zst -o test.hprof.1
usva.hprof.zst      : 8712704476 bytes
./zstd -d usva.hprof.zst -o test.hprof.1  12.74s user 99.09s system 97% cpu 1:54.69 total
--------------------------------------------------------------------------------
➜  test time ./zstd -d -c usva.hprof.zst > test.hprof.3
usva.hprof.zst      : 8712704476 bytes
./zstd -d -c usva.hprof.zst > test.hprof.3  12.79s user 8.44s system 96% cpu 22.104 total
--------------------------------------------------------------------------------
➜  test rm test.hprof.*
--------------------------------------------------------------------------------

Finally, I used the datagen-produced file:

➜  source git:(dev) ✗ time zstd -d sparse.10GB.zst -o test.1 --no-sparse
sparse.10GB.zst     : 10737418240 bytes
zstd -d sparse.10GB.zst -o test.1 --no-sparse  4.71s user 10.31s system 94% cpu 15.814 total
--------------------------------------------------------------------------------
➜  source git:(dev) ✗ time zstd -d sparse.10GB.zst -o test.2
sparse.10GB.zst     : 10737418240 bytes
zstd -d sparse.10GB.zst -o test.2  4.73s user 16.70s system 68% cpu 31.130 total
--------------------------------------------------------------------------------
➜  source git:(dev) ✗ time zstd -d sparse.10GB.zst -o test.3
sparse.10GB.zst     : 10737418240 bytes
zstd -d sparse.10GB.zst -o test.3  4.63s user 16.52s system 64% cpu 32.941 total
--------------------------------------------------------------------------------
➜  source git:(dev) ✗ time zstd -d sparse.10GB.zst -c >  test.4
sparse.10GB.zst     : 10737418240 bytes
zstd -d sparse.10GB.zst -c > test.4  4.79s user 10.76s system 84% cpu 18.402 total
--------------------------------------------------------------------------------

The datagen user and system times are what I expect, but I don't know why the total time balloons when sparse mode is enabled.

In --no-sparse mode, Instruments shows that 83% of the time is spent in fwrite().
In --sparse mode, Instruments shows that 41% of the time is spent in fwrite(), 11% in fseek() and 33% in fclose().

The delay you are seeing before printing the final stats is the time spent in fclose(), I guess because the large fseek()s need to be committed.

I forgot to try to reproduce on my machine without full disk encryption last night, so I will try tonight.

Tried this on a personal mbp 13" mid-2015.

> time zstd -d usva.hprof.zst -o test.1 --no-sparse
usva.hprof.zst      : 8712704476 bytes
       34.09 real        15.23 user        13.20 sys
--------------------------------------------------------------------------------
> rm test.1; time zstd -d usva.hprof.zst -o test.1 --no-sparse
usva.hprof.zst      : 8712704476 bytes
       31.05 real        14.84 user        12.41 sys
--------------------------------------------------------------------------------
> rm test.1; time zstd -d usva.hprof.zst -o test.1 --no-sparse
usva.hprof.zst      : 8712704476 bytes
       31.07 real        14.76 user        12.56 sys
--------------------------------------------------------------------------------
> rm test.1; time zstd -d usva.hprof.zst -o test.1
usva.hprof.zst      : 8712704476 bytes
      108.10 real        16.46 user        82.75 sys
--------------------------------------------------------------------------------
> rm test.1; time zstd -d usva.hprof.zst -o test.1
usva.hprof.zst      : 8712704476 bytes
      130.31 real        16.92 user        87.16 sys
--------------------------------------------------------------------------------
> rm test.1; time zstd -d usva.hprof.zst -c > test.1
usva.hprof.zst      : 8712704476 bytes
       33.90 real        16.07 user        12.84 sys
--------------------------------------------------------------------------------

Available:  107.7 GB (107,703,328,768 bytes)
  Capacity: 249.8 GB (249,795,969,024 bytes)
  Mount Point:  /
  File System:  Journaled HFS+
  Writable: Yes
  Ignore Ownership: No
  BSD Name: disk1
  Volume UUID:  E71CCAB5-2022-349C-9FCE-83040D4552D5
  Logical Volume:
  Revertible:   Yes (unlock and decryption required)
  Encrypted:    Yes
  Encryption Type:  AES-XTS
  Locked:   No
  LV UUID:  2A088B59-9665-4C31-8600-B83F2FFB66BB
  Logical Volume Group:
  Name: Macintosh HD
  Size: 250.14 GB (250,140,434,432 bytes)
  Free Space:   Zero KB
  LVG UUID: 765C7D06-F766-4E00-A070-6307762304EA
  Physical Volumes:
disk0s2:
  Device Name:  APPLE SSD SM0256G
  Media Name:   APPLE SSD SM0256G Media
  Size: 250.14 GB (250,140,434,432 bytes)
  Medium Type:  SSD
  Protocol: PCI
  Internal: Yes
  Partition Map Type:   GPT (GUID Partition Table)
  Status:   Online
  S.M.A.R.T. Status:    Verified
  PV UUID:  30B94D50-5A38-42CA-A066-CAD19B2FFFEB
--------------------------------------------------------------------------------

I was at the Apple Store, so I have this to report from a 15" mid-2015 mbp as well (git clone + make).

MacBook-Pro-C02PXT0FG8WN-ars168-54:zstd apple$ tests/datagen -P100 -g10GB > sparse.10GB

MacBook-Pro-C02PXT0FG8WN-ars168-54:zstd apple$ ./zstd -5 sparse.10GB -o sparse.test
sparse.10GB          :  0.01%   (10737418240 => 1096231 bytes, sparse.test)  
MacBook-Pro-C02PXT0FG8WN-ars168-54:zstd apple$ time ./zstd -d sparse.test -o sparse.1
sparse.test         : 10737418240 bytes                                        

real    0m20.795s
user    0m4.406s
sys 0m11.545s
MacBook-Pro-C02PXT0FG8WN-ars168-54:zstd apple$ time ./zstd -d sparse.test -o sparse.2 --no-sparse
sparse.test         : 10737418240 bytes                                        

real    0m16.006s
user    0m4.313s
sys 0m10.884s
MacBook-Pro-C02PXT0FG8WN-ars168-54:zstd apple$ time ./zstd -d sparse.test -c>  sparse.3 
sparse.test         : 10737418240 bytes                                        

real    0m16.338s
user    0m4.340s
sys 0m11.187s
MacBook-Pro-C02PXT0FG8WN-ars168-54:zstd apple$ time ./zstd -d sparse.test -o sparse.4
sparse.test         : 10737418240 bytes                                        

real    0m21.188s
user    0m4.415s
sys 0m11.198s
MacBook-Pro-C02PXT0FG8WN-ars168-54:zstd apple$ time ./zstd -5 sparse.10GB 
sparse.10GB          :  0.01%   (10737418240 => 1096231 bytes, sparse.10GB.zst) 

real    0m9.251s
user    0m4.906s
sys 0m3.862s
MacBook-Pro-C02PXT0FG8WN-ars168-54:zstd apple$ 
MacBook-Pro-C02PXT0FG8WN-ars168-54:zstd apple$ 
MacBook-Pro-C02PXT0FG8WN-ars168-54:zstd apple$ time ./zstd -d sparse.10GB.zst -o test1
sparse.10GB.zst     : 10737418240 bytes                                        

real    0m20.975s
user    0m4.393s
sys 0m11.871s
MacBook-Pro-C02PXT0FG8WN-ars168-54:zstd apple$ time ./zstd -d sparse.10GB.zst -o test2 --no-sparse
sparse.10GB.zst     : 10737418240 bytes                                        

real    0m16.347s
user    0m4.426s
sys 0m11.351s
MacBook-Pro-C02PXT0FG8WN-ars168-54:zstd apple$ time ./zstd -d sparse.10GB.zst -c >  test3 
sparse.10GB.zst     : 10737418240 bytes                                        

real    0m16.469s
user    0m4.445s
sys 0m11.257s
MacBook-Pro-C02PXT0FG8WN-ars168-54:zstd apple$ time ./zstd -d sparse.10GB.zst -o test4
sparse.10GB.zst     : 10737418240 bytes                                        

real    0m21.448s
user    0m4.412s
sys 0m11.929s
MacBook-Pro-C02PXT0FG8WN-ars168-54:zstd apple$ time ./zstd -d sparse.10GB.zst -o test5
sparse.10GB.zst     : 10737418240 bytes                                        

real    0m21.388s
user    0m4.440s
sys 0m12.115s
MacBook-Pro-C02PXT0FG8WN-ars168-54:zstd apple$ 

I've been able to reproduce the slower decompression speed on the datagen file on my Macbook without full disk encryption.

> time zstd -d sparse10GB.zst -o tmp1
sparse10GB.zst      : 10000000000 bytes
zstd -d sparse10GB.zst -o tmp6  3.01s user 16.79s system 45% cpu 43.533 total
> time zstd -d sparse10GB.zst -o tmp2 --no-sparse
sparse10GB.zst      : 10000000000 bytes
zstd -d sparse10GB.zst -o tmp2 --no-sparse  3.07s user 12.72s system 77% cpu 20.306 total
> time zstd -d sparse10GB.zst -c > tmp3
sparse10GB.zst      : 10000000000 bytes
zstd -d sparse10GB.zst -c > tmp3  3.13s user 13.06s system 72% cpu 22.418 total
> time gzip -dk sparse10GB.gzip.gz
gzip -dk sparse10GB.gzip.gz  19.80s user 5.28s system 82% cpu 30.446 total
> time gzip -dc sparse10GB.gzip.gz > tmp4
gzip -dc sparse10GB.gzip.gz > tmp4  19.77s user 4.67s system 83% cpu 29.313 total

Note that the test on the 13" mbp in the comment above yours was also on an encrypted disk:
Encrypted: Yes; Encryption Type: AES-XTS

OK,
it seems this has been reproduced on several systems already, which gives a strong enough confirmation signal.
It's strange that it doesn't reproduce on _every_ Mac OS X system, but at least we know that the fix (--no-sparse) is not detrimental either.

I spotted some apparently subtle differences in the reports, notably:
File system: Mac OS Extended (Journaled, Encrypted): no slowdown
File System: Journaled HFS+: slowdown with --sparse
So it _might_ be related ...

Anyway, the fix is simple enough: we should disable --sparse by default on Mac OS X systems.
We can do it here: https://github.com/facebook/zstd/blob/dev/programs/fileio.c#L135

It will still be possible to force it with an explicit --sparse flag, for use cases which would actually benefit from it (such as mounting ZFS on a Mac, which is unsupported by Apple but technically possible).
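
For illustration, a sketch of what such a platform-dependent default might look like. This is a hypothetical rendering, not the actual patch; the macro and variable names are illustrative, and the real change belongs at the fileio.c location linked above.

/* Pick the default for sparse mode at compile time: off on Mac OS X
 * (where sparse rewrites measure poorly in this thread), on elsewhere. */
#if defined(__APPLE__) && defined(__MACH__)
#  define SPARSE_ENABLED_DEFAULT 0
#else
#  define SPARSE_ENABLED_DEFAULT 1
#endif

/* runtime flag, overridden by the --sparse / --no-sparse command line options */
static int g_sparseFileSupport = SPARSE_ENABLED_DEFAULT;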
