zfs-0.7.9: Setting spa_slop_shift >= 64 results in zfs free space emergency, crashing open processes

Created on 9 Sep 2018 · 13 comments · Source: openzfs/zfs

With zfs-0.6.5.x on 64-bit Ubuntu 16.04 LTS, I was able to set the spa_slop_shift value as high as 65536 (didn't test higher).

However, with source-built zfs-0.7.9, setting spa_slop_shift to any value greater than 63 causes an immediate zero-free-space condition across all pools (setting spa_slop_shift back to 63 or less clears the condition).

Is this by (new) design?


Ubuntu Xenial (16.04 LTS)
Kernel 4.4.0-57 x86-64

Version:
source-built 0.7.9-3ubuntu4~16.04.york0

All 13 comments

This sounds like a bug. We should update the module option to apply some reasonable limits.

There should be a check on its value. The parameter is an int, so while the representable range varies with the size of int, the practical limit is on the order of 15. The reasoning is:

  • min slop space size (lower bound) = 128 MiB
  • if RAM = 4 TiB and spa_slop_shift = 15, the computed slop space is exactly 128 MiB, i.e. the floor

https://github.com/zfsonlinux/zfs/wiki/ZFS-on-Linux-Module-Parameters#spa_slop_shift
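
For reference, a minimal userspace sketch of that calculation, assuming the MAX(space >> spa_slop_shift, floor) form described at the link above (SLOP_FLOOR and slop_space are illustrative names, not the actual ZFS identifiers):

#include <stdint.h>
#include <stdio.h>

#define SLOP_FLOOR (128ULL << 20)  /* 128 MiB minimum slop (0.7.x) */

/* Slop space is the larger of (space >> spa_slop_shift) and a fixed floor. */
static uint64_t slop_space(uint64_t space, unsigned int shift)
{
    uint64_t slop = space >> shift;
    return (slop > SLOP_FLOOR ? slop : SLOP_FLOOR);
}

int main(void)
{
    /* The 4 TiB example above: 4 TiB >> 15 == 128 MiB, exactly the floor. */
    printf("%llu MiB\n",
        (unsigned long long)(slop_space(4ULL << 40, 15) >> 20));
    return 0;
}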

richardelling, does that mean that the practical upper limit is approximately spa_slop_shift=22 for a system with 288 ~~TB~~ GB of RAM?

If so, @loli10K shouldn't we set the limit to something higher than 15, such as 31, and maybe return to having zfs gracefully treat higher-than-practical limits so that if the slop_shift equation yielded less than 128MB, it would be disregarded and 128MB would be used? Wasn't that the pre-0.7.0 behavior?

does anyone have a system with 288TB of RAM? I'd bet a steak dinner that Linux won't handle that very well...

Currently there is no test when setting the value. So the first order of business would be to write the updater function.
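
One possible shape for that updater, as a minimal sketch: a custom kernel_param_ops .set handler that validates the value before param_set_int() stores it. The 1..31 bounds here are placeholders for whatever range gets decided, and this is not code from any actual patch:

#include <linux/module.h>
#include <linux/moduleparam.h>
#include <linux/kernel.h>

int spa_slop_shift = 5;	/* current default */

/* Validate writes to /sys/module/zfs/parameters/spa_slop_shift;
 * out-of-range values are rejected with -EINVAL before being stored. */
static int
param_set_slop_shift(const char *buf, const struct kernel_param *kp)
{
	int val, err;

	err = kstrtoint(buf, 0, &val);
	if (err)
		return (err);
	if (val < 1 || val > 31)	/* placeholder bounds */
		return (-EINVAL);

	return (param_set_int(buf, kp));
}

static const struct kernel_param_ops slop_shift_ops = {
	.set = param_set_slop_shift,
	.get = param_get_int,
};

module_param_cb(spa_slop_shift, &slop_shift_ops, &spa_slop_shift, 0644);
MODULE_PARM_DESC(spa_slop_shift, "Reserved free space limit shift");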

Right... sorry about that; I meant 288 GB (changed above, steak dinner to be arranged later).

> return to having zfs gracefully treat higher-than-practical limits so that if the slop_shift equation yielded less than 128MB, it would be disregarded and 128MB would be used? Wasn't that the pre-0.7.0 behavior?

I don't see this behavior in any ZFS version I have tested so far; the overflow has likely been in the source for quite some time. Consider the following commands:

POOLNAME='testpool'
TMPDIR='/var/tmp'
mountpoint -q $TMPDIR || mount -t tmpfs tmpfs $TMPDIR -o size=131072G
zpool destroy $POOLNAME
rm -f $TMPDIR/disk
truncate -s 128T $TMPDIR/disk
zpool create $POOLNAME $TMPDIR/disk

cat > ./issue-7876.sh <<'EOF'
#!/bin/bash
for i in `seq 0 33` `seq 63 73` `seq 127 137` `seq 65535 65545`
do
   echo $i > /sys/module/zfs/parameters/spa_slop_shift
   df /testpool > /dev/null
done
EOF
chmod +x ./issue-7876.sh

stap -d zfs -e '
probe module("zfs").function("spa_get_slop_space").return
{
   printf("spa_slop_shift=%d, spa_get_slop_space=%lu\n", @entry($spa_slop_shift), ($return));
}
' -c ./issue-7876.sh

0.6.5.11:

spa_slop_shift=0, spa_get_slop_space=139637976727552
spa_slop_shift=1, spa_get_slop_space=69818988363776
spa_slop_shift=2, spa_get_slop_space=34909494181888
spa_slop_shift=3, spa_get_slop_space=17454747090944
spa_slop_shift=4, spa_get_slop_space=8727373545472
spa_slop_shift=5, spa_get_slop_space=4363686772736
spa_slop_shift=6, spa_get_slop_space=2181843386368
spa_slop_shift=7, spa_get_slop_space=1090921693184
spa_slop_shift=8, spa_get_slop_space=545460846592
spa_slop_shift=9, spa_get_slop_space=272730423296
spa_slop_shift=10, spa_get_slop_space=136365211648
spa_slop_shift=11, spa_get_slop_space=68182605824
spa_slop_shift=12, spa_get_slop_space=34091302912
spa_slop_shift=13, spa_get_slop_space=17045651456
spa_slop_shift=14, spa_get_slop_space=8522825728
spa_slop_shift=15, spa_get_slop_space=4261412864
spa_slop_shift=16, spa_get_slop_space=2130706432
spa_slop_shift=17, spa_get_slop_space=1065353216
spa_slop_shift=18, spa_get_slop_space=532676608
spa_slop_shift=19, spa_get_slop_space=266338304
spa_slop_shift=20, spa_get_slop_space=133169152
spa_slop_shift=21, spa_get_slop_space=66584576
spa_slop_shift=22, spa_get_slop_space=33554432
spa_slop_shift=23, spa_get_slop_space=33554432
spa_slop_shift=24, spa_get_slop_space=33554432
spa_slop_shift=25, spa_get_slop_space=33554432
spa_slop_shift=26, spa_get_slop_space=33554432
spa_slop_shift=27, spa_get_slop_space=33554432
spa_slop_shift=28, spa_get_slop_space=33554432
spa_slop_shift=29, spa_get_slop_space=33554432
spa_slop_shift=30, spa_get_slop_space=33554432
spa_slop_shift=31, spa_get_slop_space=33554432
spa_slop_shift=32, spa_get_slop_space=33554432
spa_slop_shift=33, spa_get_slop_space=33554432
spa_slop_shift=63, spa_get_slop_space=33554432
spa_slop_shift=64, spa_get_slop_space=139637976727552
spa_slop_shift=65, spa_get_slop_space=69818988363776
spa_slop_shift=66, spa_get_slop_space=34909494181888
spa_slop_shift=67, spa_get_slop_space=17454747090944
spa_slop_shift=68, spa_get_slop_space=8727373545472
spa_slop_shift=69, spa_get_slop_space=4363686772736
spa_slop_shift=70, spa_get_slop_space=2181843386368
spa_slop_shift=71, spa_get_slop_space=1090921693184
spa_slop_shift=72, spa_get_slop_space=545460846592
spa_slop_shift=73, spa_get_slop_space=272730423296
spa_slop_shift=127, spa_get_slop_space=33554432
spa_slop_shift=128, spa_get_slop_space=139637976727552
spa_slop_shift=129, spa_get_slop_space=69818988363776
spa_slop_shift=130, spa_get_slop_space=34909494181888
spa_slop_shift=131, spa_get_slop_space=17454747090944
spa_slop_shift=132, spa_get_slop_space=8727373545472
spa_slop_shift=133, spa_get_slop_space=4363686772736
spa_slop_shift=134, spa_get_slop_space=2181843386368
spa_slop_shift=135, spa_get_slop_space=1090921693184
spa_slop_shift=136, spa_get_slop_space=545460846592
spa_slop_shift=137, spa_get_slop_space=272730423296
spa_slop_shift=65535, spa_get_slop_space=33554432
spa_slop_shift=65536, spa_get_slop_space=139637976727552
spa_slop_shift=65537, spa_get_slop_space=69818988363776
spa_slop_shift=65538, spa_get_slop_space=34909494181888
spa_slop_shift=65539, spa_get_slop_space=17454747090944
spa_slop_shift=65540, spa_get_slop_space=8727373545472
spa_slop_shift=65541, spa_get_slop_space=4363686772736
spa_slop_shift=65542, spa_get_slop_space=2181843386368
spa_slop_shift=65543, spa_get_slop_space=1090921693184
spa_slop_shift=65544, spa_get_slop_space=545460846592
spa_slop_shift=65545, spa_get_slop_space=272730423296

0.7.9:

spa_slop_shift=0, spa_get_slop_space=139637976727552
spa_slop_shift=1, spa_get_slop_space=69818988363776
spa_slop_shift=2, spa_get_slop_space=34909494181888
spa_slop_shift=3, spa_get_slop_space=17454747090944
spa_slop_shift=4, spa_get_slop_space=8727373545472
spa_slop_shift=5, spa_get_slop_space=4363686772736
spa_slop_shift=6, spa_get_slop_space=2181843386368
spa_slop_shift=7, spa_get_slop_space=1090921693184
spa_slop_shift=8, spa_get_slop_space=545460846592
spa_slop_shift=9, spa_get_slop_space=272730423296
spa_slop_shift=10, spa_get_slop_space=136365211648
spa_slop_shift=11, spa_get_slop_space=68182605824
spa_slop_shift=12, spa_get_slop_space=34091302912
spa_slop_shift=13, spa_get_slop_space=17045651456
spa_slop_shift=14, spa_get_slop_space=8522825728
spa_slop_shift=15, spa_get_slop_space=4261412864
spa_slop_shift=16, spa_get_slop_space=2130706432
spa_slop_shift=17, spa_get_slop_space=1065353216
spa_slop_shift=18, spa_get_slop_space=532676608
spa_slop_shift=19, spa_get_slop_space=266338304
spa_slop_shift=20, spa_get_slop_space=134217728
spa_slop_shift=21, spa_get_slop_space=134217728
spa_slop_shift=22, spa_get_slop_space=134217728
spa_slop_shift=23, spa_get_slop_space=134217728
spa_slop_shift=24, spa_get_slop_space=134217728
spa_slop_shift=25, spa_get_slop_space=134217728
spa_slop_shift=26, spa_get_slop_space=134217728
spa_slop_shift=27, spa_get_slop_space=134217728
spa_slop_shift=28, spa_get_slop_space=134217728
spa_slop_shift=29, spa_get_slop_space=134217728
spa_slop_shift=30, spa_get_slop_space=134217728
spa_slop_shift=31, spa_get_slop_space=134217728
spa_slop_shift=32, spa_get_slop_space=134217728
spa_slop_shift=33, spa_get_slop_space=134217728
spa_slop_shift=63, spa_get_slop_space=134217728
spa_slop_shift=64, spa_get_slop_space=139637976727552
spa_slop_shift=65, spa_get_slop_space=69818988363776
spa_slop_shift=66, spa_get_slop_space=34909494181888
spa_slop_shift=67, spa_get_slop_space=17454747090944
spa_slop_shift=68, spa_get_slop_space=8727373545472
spa_slop_shift=69, spa_get_slop_space=4363686772736
spa_slop_shift=70, spa_get_slop_space=2181843386368
spa_slop_shift=71, spa_get_slop_space=1090921693184
spa_slop_shift=72, spa_get_slop_space=545460846592
spa_slop_shift=73, spa_get_slop_space=272730423296
spa_slop_shift=127, spa_get_slop_space=134217728
spa_slop_shift=128, spa_get_slop_space=139637976727552
spa_slop_shift=129, spa_get_slop_space=69818988363776
spa_slop_shift=130, spa_get_slop_space=34909494181888
spa_slop_shift=131, spa_get_slop_space=17454747090944
spa_slop_shift=132, spa_get_slop_space=8727373545472
spa_slop_shift=133, spa_get_slop_space=4363686772736
spa_slop_shift=134, spa_get_slop_space=2181843386368
spa_slop_shift=135, spa_get_slop_space=1090921693184
spa_slop_shift=136, spa_get_slop_space=545460846592
spa_slop_shift=137, spa_get_slop_space=272730423296
spa_slop_shift=65535, spa_get_slop_space=134217728
spa_slop_shift=65536, spa_get_slop_space=139637976727552
spa_slop_shift=65537, spa_get_slop_space=69818988363776
spa_slop_shift=65538, spa_get_slop_space=34909494181888
spa_slop_shift=65539, spa_get_slop_space=17454747090944
spa_slop_shift=65540, spa_get_slop_space=8727373545472
spa_slop_shift=65541, spa_get_slop_space=4363686772736
spa_slop_shift=65542, spa_get_slop_space=2181843386368
spa_slop_shift=65543, spa_get_slop_space=1090921693184
spa_slop_shift=65544, spa_get_slop_space=545460846592
spa_slop_shift=65545, spa_get_slop_space=272730423296
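
The pattern in both tables gives the mechanism away: 64 behaves like 0, 65 like 1, 128 like 0 again, 65536 like 0, and so on; the shift count is wrapping modulo 64. In C, shifting a 64-bit value by 64 or more bits is undefined behavior, and on x86-64 the hardware masks the count to the low 6 bits, which is what the kernel ends up executing. A standalone userspace demo (masking with & 63 explicitly, to reproduce the hardware behavior without relying on UB; this shows only the shift, not the MAX-with-floor part of the formula):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint64_t space = 139637976727552ULL;  /* ~127 TiB, as in the pool above */
    int shifts[] = { 0, 1, 63, 64, 65, 65536 };

    /* `space >> s` with s >= 64 is UB in C; on x86-64 the SHR instruction
     * masks the count to 6 bits, so shift 64 acts like 0, 65 like 1, etc.
     * Masking with & 63 makes that wraparound explicit. */
    for (int i = 0; i < 6; i++)
        printf("shift=%d -> %llu\n", shifts[i],
            (unsigned long long)(space >> (shifts[i] & 63)));
    return 0;
}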

> return to having zfs gracefully treat higher-than-practical limits so that if the slop_shift equation yielded less than 128MB, it would be disregarded and 128MB would be used? Wasn't that the pre-0.7.0 behavior?

> I don't see this behavior in any ZFS version I have tested so far; the overflow has likely been in the source for quite some time. Consider the following commands:
> ...

Hi loli10K,

What behavior are you not seeing? On a system with 0.7.9, are you not seeing an out-of-free-space condition occur on your zfs mounts when you run `echo 64 > /sys/module/zfs/parameters/spa_slop_shift`?

When I do this, glances shows me (bottom left) that all of my zfs mounts suddenly have zero free space. Likewise, my node.js processes that are accessing any of these zfs mounts suddenly crash.

I don't see any difference between 0.6.5.x and 0.7.x, so I don't think there's any "_pre-0.7.0 behavior_" to be restored. In my testing, 0.6.5.11 did not "_gracefully treat higher-than-practical limits so that if the slop_shift equation yielded less than 128MB_" (also, IIRC, the old min slop size was 32M, not 128M).

Sorry, just trying to get on the same page... Are you saying that for your 0.6.5.11, zfs _didn't_ gracefully handle a spa_slop_shift value such as 65536, such that zfs _did_ misbehave?

I know that for me with 0.7.9, I can't use (without trouble)

64
32767
32768
65535
65536

But with 0.6.5.x, I could use the higher values

root@linux:~# zpool list 
NAME       SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
testpool   127T    64K   127T         -     0%     0%  1.00x  ONLINE  -
root@linux:~# modinfo -F version zfs
0.6.5.11-1
root@linux:~# stap -d zfs -e '
probe module("zfs").function("spa_get_slop_space").return
{
   printf("spa_slop_shift=%d, spa_get_slop_space=%lu\n", @entry($spa_slop_shift), ($return));
}
' -c bash
root@linux:~# echo 65535 > /sys/module/zfs/parameters/spa_slop_shift
root@linux:~# df /testpool/ > /dev/null 
root@linux:~# spa_slop_shift=65535, spa_get_slop_space=33554432

root@linux:~# echo 65536 > /sys/module/zfs/parameters/spa_slop_shift
root@linux:~# df /testpool/ > /dev/null 
root@linux:~# spa_slop_shift=65536, spa_get_slop_space=139637976727552

root@linux:~# 
  • slop_space with slop_shift=65535 is 32M
  • slop_space with slop_shift=65536 is, I think, ~127T

How is this _not_ misbehaving?

Oh wow. Ok, I have access to a 0.6.5.6 pool that I can test this with if you'd like... Otherwise we could just move forward from here.

Let's move forward and add sane limits to the kernel module parameter: the fix proposed in https://github.com/zfsonlinux/zfs/pull/7900 already has the code in place to enforce limits for the spa_slop_shift tunable; we just need to decide what the max value should be.

Running the previous script (https://github.com/zfsonlinux/zfs/issues/7876#issuecomment-421484958) on a 128PB pool shows that values higher than 30 don't make any difference, as slop space is always 128M:

root@linux:~# cat > ./issue-7876.sh <<'EOF'
> #!/bin/bash
> for i in `seq 0 63`
> do
>    echo $i > /sys/module/zfs/parameters/spa_slop_shift
>    df /testpool > /dev/null
> done
> EOF
root@linux:~# chmod +x ./issue-7876.sh
root@linux:~# 
root@linux:~# 
root@linux:~# stap -d zfs -e '
> probe module("zfs").function("spa_get_slop_space").return
> {
>    printf("spa_slop_shift=%d, spa_get_slop_space=%lu\n", @entry($spa_slop_shift), ($return));
> }
> ' -c ./issue-7876.sh
spa_slop_shift=0, spa_get_slop_space=142989288169013248
spa_slop_shift=1, spa_get_slop_space=71494644084506624
spa_slop_shift=2, spa_get_slop_space=35747322042253312
spa_slop_shift=3, spa_get_slop_space=17873661021126656
spa_slop_shift=4, spa_get_slop_space=8936830510563328
spa_slop_shift=5, spa_get_slop_space=4468415255281664
spa_slop_shift=6, spa_get_slop_space=2234207627640832
spa_slop_shift=7, spa_get_slop_space=1117103813820416
spa_slop_shift=8, spa_get_slop_space=558551906910208
spa_slop_shift=9, spa_get_slop_space=279275953455104
spa_slop_shift=10, spa_get_slop_space=139637976727552
spa_slop_shift=11, spa_get_slop_space=69818988363776
spa_slop_shift=12, spa_get_slop_space=34909494181888
spa_slop_shift=13, spa_get_slop_space=17454747090944
spa_slop_shift=14, spa_get_slop_space=8727373545472
spa_slop_shift=15, spa_get_slop_space=4363686772736
spa_slop_shift=16, spa_get_slop_space=2181843386368
spa_slop_shift=17, spa_get_slop_space=1090921693184
spa_slop_shift=18, spa_get_slop_space=545460846592
spa_slop_shift=19, spa_get_slop_space=272730423296
spa_slop_shift=20, spa_get_slop_space=136365211648
spa_slop_shift=21, spa_get_slop_space=68182605824
spa_slop_shift=22, spa_get_slop_space=34091302912
spa_slop_shift=23, spa_get_slop_space=17045651456
spa_slop_shift=24, spa_get_slop_space=8522825728
spa_slop_shift=25, spa_get_slop_space=4261412864
spa_slop_shift=26, spa_get_slop_space=2130706432
spa_slop_shift=27, spa_get_slop_space=1065353216
spa_slop_shift=28, spa_get_slop_space=532676608
spa_slop_shift=29, spa_get_slop_space=266338304
spa_slop_shift=30, spa_get_slop_space=134217728
spa_slop_shift=31, spa_get_slop_space=134217728
spa_slop_shift=32, spa_get_slop_space=134217728
spa_slop_shift=33, spa_get_slop_space=134217728
spa_slop_shift=63, spa_get_slop_space=134217728
# stopping at 63, 64 leads to overflow

So probably 31, unless we can think of a reason to super-future-proof it with a value up to 63.
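
For what it's worth, 31 already covers enormous pools. Assuming the MAX(space >> shift, 128M) form, the shifted term only exceeds the 128M floor when space > 128M × 2^31 = 2^27 × 2^31 = 2^58 bytes = 256 PiB. For any pool below that size, every shift >= 31 lands on the floor anyway, as the 128PB pool above already does at shift=30.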
