I have experienced silent corruption of a snapshot by send/receive – the checksum of a single file does not match between source and target, both on the filesystem and in the snapshots.
Repeated scrubs of the tested pools do not show any errors.
Trying to replicate the source filesystem results in a modification of a single file on the target pool, in the received filesystem and in all of its snapshots.
The source pool is a 6x1TB RAIDZ2 array on Debian 8.2.0 (Jessie, kernel 3.16.0-4-amd64, installed from DVDs, no additional updates) with version 0.6.5.3 of ZFS/SPL built from source (standard configure, no patches).
My source pool ("nas1FS") was created on a machine with non-ECC RAM and, after being filled with data through a Samba share (standard Samba server; sharesmb=off on the source fs), was moved to a different computer with ECC RAM (I used the same operating system on both machines).
I could understand non-ECC RAM in the first computer causing permanent corruption that would be visible on both the source and the target pool, but in this case the data is silently changed only during the transfer on the new computer with ECC RAM, and the data on the source pool seems to be fine.
This corruption of data during send/receive is repeatable.
To better explain what I have done:
First I have created a 6x1TB raidz2 pool on my old computer ("nas1FS").
After filling this pool with data I moved the array from the old computer to a new one and tried to back up the data from the "nas1FS" pool to a different pool ("backup_raidz").
The "nas1FS" pool contained the following snapshots that are of interest in this issue:
# zfs list -t snapshot |grep "nas1FS/backup@"
nas1FS/backup@20151121_1 59.6G - 726G -
nas1FS/backup@20151124 594M - 758G -
nas1FS/backup@20151129 945M - 758G -
nas1FS/backup@20151210 111M - 765G -
nas1FS/backup@20160220 134M - 772G -
nas1FS/backup@20160306 128M - 779G -
nas1FS/backup@20160306_1 27.7M - 780G -
nas1FS/backup@20160618 16.0K - 866G -
I have created a “backup_raidz” pool for backup (with compression turned on):
# zpool create -o ashift=12 -O compression=gzip-9 backup_raidz raidz1 /dev/disk/by-id/ata-SAMSUNG_HD103UJ_SerialNo1 /dev/disk/by-id/ata-SAMSUNG_HD103UJ_SerialNo2 /dev/disk/by-id/ata-HGST_HTS721010A9E630_SerialNo3 -f
Afterwards I tried to replicate the "nas1FS" pool.
# zfs send -R -vvvv "nas1FS@20160618" |zfs receive -vvvv -F "backup_raidz/nas1FS_bak"
This command finished successfully without any errors.
I have executed the following commands to get a list of file checksums on both source and target:
# md5deep -o f -e -r /backup_raidz/nas1FS_bak/backup/.zfs/snapshot/20160618/* > sums_1.txt
# md5deep -o f -e -r /nas1FS/backup/.zfs/snapshot/20160618/* > sums_2.txt
and compared the resulting files.
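(My exact comparison command is not recorded here; this is a minimal sketch of the kind of comparison used, working on the hash column only since the path prefixes differ between the two pools, and the intermediate file names are arbitrary:)
# awk '{print $1}' sums_1.txt | sort > hashes_1.txt
# awk '{print $1}' sums_2.txt | sort > hashes_2.txt
# diff hashes_1.txt hashes_2.txt
Any hash reported by diff can then be grepped back in sums_1.txt / sums_2.txt to get the file path.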
I have found a checksum mismatch on a single file:
3178c03d3205ac148372a71d75a835ec /nas1FS/backup/.zfs/snapshot/20160618/samba_share/a/home/bak/aa/wx/wxWidgets-2.8.12/additions/lib/vc_lib/wxmsw28ud_propgrid.pdb
9917ceb0f20ca6f23a06857327bd9940 /backup_raidz/nas1FS_bak/backup/.zfs/snapshot/20160618/samba_share/a/home/bak/aa/wx/wxWidgets-2.8.12/additions/lib/vc_lib/wxmsw28ud_propgrid.pdb
The correct checksum is "3178c03d3205ac148372a71d75a835ec"; it was verified against the source data used to populate the "nas1FS" filesystem.
This checksum mismatch propagated through all snapshots on the target pool in which the file was present:
9917ceb0f20ca6f23a06857327bd9940 /backup_raidz/nas1FS_bak/backup/.zfs/snapshot/20151124/samba_share/home/bak/aa/wx/wxWidgets-2.8.12/additions/lib/vc_lib/wxmsw28ud_propgrid.pdb
9917ceb0f20ca6f23a06857327bd9940 /backup_raidz/nas1FS_bak/backup/.zfs/snapshot/20151129/samba_share/a/home/bak/aa/wx/wxWidgets-2.8.12/additions/lib/vc_lib/wxmsw28ud_propgrid.pdb
9917ceb0f20ca6f23a06857327bd9940 /backup_raidz/nas1FS_bak/backup/.zfs/snapshot/20151210/samba_share/a/home/bak/aa/wx/wxWidgets-2.8.12/additions/lib/vc_lib/wxmsw28ud_propgrid.pdb
9917ceb0f20ca6f23a06857327bd9940 /backup_raidz/nas1FS_bak/backup/.zfs/snapshot/20160220/samba_share/a/home/bak/aa/wx/wxWidgets-2.8.12/additions/lib/vc_lib/wxmsw28ud_propgrid.pdb
9917ceb0f20ca6f23a06857327bd9940 /backup_raidz/nas1FS_bak/backup/.zfs/snapshot/20160306/samba_share/a/home/bak/aa/wx/wxWidgets-2.8.12/additions/lib/vc_lib/wxmsw28ud_propgrid.pdb
9917ceb0f20ca6f23a06857327bd9940 /backup_raidz/nas1FS_bak/backup/.zfs/snapshot/20160306_1/samba_share/a/home/bak/aa/wx/wxWidgets-2.8.12/additions/lib/vc_lib/wxmsw28ud_propgrid.pdb
9917ceb0f20ca6f23a06857327bd9940 /backup_raidz/nas1FS_bak/backup/.zfs/snapshot/20160618/samba_share/a/home/bak/aa/wx/wxWidgets-2.8.12/additions/lib/vc_lib/wxmsw28ud_propgrid.pdb
The source pool shows the correct checksum in every snapshot in which the offending file is accessible:
3178c03d3205ac148372a71d75a835ec /nas1FS/backup/.zfs/snapshot/20160618/samba_share/a/home/bak/aa/wx/wxWidgets-2.8.12/additions/lib/vc_lib/wxmsw28ud_propgrid.pdb
3178c03d3205ac148372a71d75a835ec /nas1FS/backup/.zfs/snapshot/20160306_1/samba_share/a/home/bak/aa/wx/wxWidgets-2.8.12/additions/lib/vc_lib/wxmsw28ud_propgrid.pdb
3178c03d3205ac148372a71d75a835ec /nas1FS/backup/.zfs/snapshot/20160306/samba_share/a/home/bak/aa/wx/wxWidgets-2.8.12/additions/lib/vc_lib/wxmsw28ud_propgrid.pdb
3178c03d3205ac148372a71d75a835ec /nas1FS/backup/.zfs/snapshot/20160220/samba_share/a/home/bak/aa/wx/wxWidgets-2.8.12/additions/lib/vc_lib/wxmsw28ud_propgrid.pdb
3178c03d3205ac148372a71d75a835ec /nas1FS/backup/.zfs/snapshot/20151210/samba_share/a/home/bak/aa/wx/wxWidgets-2.8.12/additions/lib/vc_lib/wxmsw28ud_propgrid.pdb
3178c03d3205ac148372a71d75a835ec /nas1FS/backup/.zfs/snapshot/20151129/samba_share/a/home/bak/aa/wx/wxWidgets-2.8.12/additions/lib/vc_lib/wxmsw28ud_propgrid.pdb
3178c03d3205ac148372a71d75a835ec /nas1FS/backup/.zfs/snapshot/20151124/samba_share/home/bak/aa/wx/wxWidgets-2.8.12/additions/lib/vc_lib/wxmsw28ud_propgrid.pdb
Trying to access this file in a snapshot in which it did not exist ("backup@20151121_1") results in a "No such file or directory" on the target pool ("backup_raidz" or "backup_raidz_test").
When I tried to access the offending file on "nas1FS" with the command:
# md5sum /nas1FS/backup/.zfs/snapshot/20151121_1/samba_share/a/home/bak/aa/wx/wxWidgets-2.8.12/additions/lib/vc_lib/wxmsw28ud_propgrid.pdb
it resulted in a very hard system lockup: I could not get any reaction to "Ctrl-Alt-SysRq-h" and similar key combinations, all disk IO stopped completely and immediately, the system stopped responding to ping, and only a hard reset got any reaction out of the system.
After a hard reset everything was working again, and the above-mentioned checksum results were unchanged.
I have also tried a send/receive to a different target pool (a single 1TB HGST disk):
# zfs send -R -vvvv "nas1FS/backup@20160618" |zfs receive -vvvv -F "backup_raidz_test/nas1FS_bak"
It resulted in the same md5sum mismatches.
When sending only the latest snapshot with:
# zfs send -vvvv "nas1FS/backup@20160618" |zfs receive -vvvv -F "backup_raidz_test/backup"
I get a correct md5sum on the target filesystem.
When trying to do an incremental send/receive starting from the first available snapshot on the source pool:
# zfs send -vvvv "nas1FS/backup@20151121_1" |zfs receive -vvvv -F "backup_raidz_test/backup"
The offending file is not present on either the source or the target pool, and trying to access it on the target pool does not cause any issues.
# zfs send -vvvv -I "nas1FS/backup@20151121_1" "nas1FS/backup@20151124" |zfs receive -vvvv -F "backup_raidz_test/backup"
I again get a checksum mismatch.
When doing an incremental send/receive starting from the second available snapshot on the source pool, I get correct checksums for both snapshots on the target pool:
# zfs send -vvvv "nas1FS/backup@20151124" |zfs receive -vvvv -F "backup_raidz_test/backup"
# md5sum /nas1FS/backup/.zfs/snapshot/20151124/samba_share/home/bak/aa/wx/wxWidgets-2.8.12/additions/lib/vc_lib/wxmsw28ud_propgrid.pdb
3178c03d3205ac148372a71d75a835ec /nas1FS/backup/.zfs/snapshot/20151124/samba_share/home/bak/aa/wx/wxWidgets-2.8.12/additions/lib/vc_lib/wxmsw28ud_propgrid.pdb
# md5sum /backup_raidz_test/backup/.zfs/snapshot/20151124/samba_share/home/bak/aa/wx/wxWidgets-2.8.12/additions/lib/vc_lib/wxmsw28ud_propgrid.pdb
3178c03d3205ac148372a71d75a835ec /backup_raidz_test/backup/.zfs/snapshot/20151124/samba_share/home/bak/aa/wx/wxWidgets-2.8.12/additions/lib/vc_lib/wxmsw28ud_propgrid.pdb
# zfs send -vvvv -I "nas1FS/backup@20151124" "nas1FS/backup@20151129" |zfs receive -vvvv -F "backup_raidz_test/backup"
# md5sum /backup_raidz_test/backup/.zfs/snapshot/20151129/samba_share/a/home/bak/aa/wx/wxWidgets-2.8.12/additions/lib/vc_lib/wxmsw28ud_propgrid.pdb
3178c03d3205ac148372a71d75a835ec /backup_raidz_test/backup/.zfs/snapshot/20151129/samba_share/a/home/bak/aa/wx/wxWidgets-2.8.12/additions/lib/vc_lib/wxmsw28ud_propgrid.pdb
# md5sum /nas1FS/backup/.zfs/snapshot/20151129/samba_share/a/home/bak/aa/wx/wxWidgets-2.8.12/additions/lib/vc_lib/wxmsw28ud_propgrid.pdb
3178c03d3205ac148372a71d75a835ec /nas1FS/backup/.zfs/snapshot/20151129/samba_share/a/home/bak/aa/wx/wxWidgets-2.8.12/additions/lib/vc_lib/wxmsw28ud_propgrid.pdb
It is interesting to note that only a single 4096-byte block at the end of the file (which has a size of 321 x 4096 bytes) is corrupted, and only when the transfer includes the first source snapshot ("nas1FS/backup@20151121_1").
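(A quick way to confirm that only the final 4 KiB block differs; this is a sketch assuming both copies of the file were extracted as in the binary comparison below, with the last block starting at offset 320 x 4096 for a 321 x 4096-byte file:)
# cmp -l wxmsw28ud_propgrid.pdb.nas1FS wxmsw28ud_propgrid.pdb.backup_raidz | head -n 3
# dd if=wxmsw28ud_propgrid.pdb.nas1FS bs=4096 skip=320 count=1 2>/dev/null | md5sum
# dd if=wxmsw28ud_propgrid.pdb.backup_raidz bs=4096 skip=320 count=1 2>/dev/null | md5sum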
Binary comparison of the offending file:
# vbindiff wxmsw28ud_propgrid.pdb.nas1FS wxmsw28ud_propgrid.pdb.backup_raidz
wxmsw28ud_propgrid.pdb.nas1FS
0013 FFD0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ........ ........
0013 FFE0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ........ ........
0013 FFF0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ........ ........
0014 0000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ........ ........
0014 0010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ........ ........
0014 0020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ........ ........
0014 0030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ........ ........
0014 0040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ........ ........
0014 0050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ........ ........
0014 0060: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ........ ........
0014 0070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ........ ........
0014 0080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ........ ........
0014 0090: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ........ ........
0014 00A0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ........ ........
wxmsw28ud_propgrid.pdb.backup_raidz
0013 FFD0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ........ ........
0013 FFE0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ........ ........
0013 FFF0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ........ ........
0014 0000: F3 4A 37 0C B1 20 72 48 CF 6F FE BD 3E 76 03 61 .J7.. rH .o..>v.a
0014 0010: 77 91 72 CE 19 83 73 81 D6 95 DF AE 3A 74 3F 9D w.r...s. ....:t?.
0014 0020: 4D DB DD B0 1E 43 2A 2C 63 77 0D B9 E3 EA 8F 1B M....C*, cw......
0014 0030: 75 CF E7 48 C9 85 65 8E 35 0A 41 6C 06 E8 33 9E u..H..e. 5.Al..3.
0014 0040: FD FD 69 01 00 62 41 52 CE A0 7C C1 0E 32 71 FE ..i..bAR ..|..2q.
0014 0050: 3D E9 A0 E4 19 11 80 6D EB BD 1C 31 1B 7F 3E B4 =......m ...1..>.
0014 0060: 09 AB D8 8C E4 2B 10 A4 BA B7 CC 31 B7 39 3C F5 .....+.. ...1.9<.
0014 0070: A5 8B 0C 49 C1 1E A3 DC 7A FE 74 0C 9C F4 19 C9 ...I.... z.t.....
0014 0080: F4 1D 47 34 83 E6 C6 49 07 04 60 70 40 39 CF 3F ..G4...I ..`p@9.?
0014 0090: 8F 7A 00 70 53 16 DD 80 85 C9 DA 64 7F 30 91 DF .z.pS... ...d.0..
0014 00A0: 93 9C F5 3D 69 CE FB 86 57 21 79 C7 F9 3F D6 80 ...=i... W!y..?..
wxmsw28ud_propgrid.pdb.nas1FS
0014 0F30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ........ ........
0014 0F40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ........ ........
0014 0F50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ........ ........
0014 0F60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ........ ........
0014 0F70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ........ ........
0014 0F80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ........ ........
0014 0F90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ........ ........
0014 0FA0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ........ ........
0014 0FB0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ........ ........
0014 0FC0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ........ ........
0014 0FD0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ........ ........
0014 0FE0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ........ ........
0014 0FF0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ........ ........
0014 1000:
wxmsw28ud_propgrid.pdb.backup_raidz
0014 0F30: B7 60 DB C7 7E 77 FF 00 9F 5A 75 E5 FD E5 94 3E .`..~w.. .Zu....>
0014 0F40: 7C 16 86 F5 60 63 98 77 06 3C F5 CF AD 33 32 2D |...`c.w .<...32-
0014 0F50: 23 50 B7 BC B7 BA 76 84 43 2E 76 84 27 95 24 FC #P....v. C.v.'.$.
0014 0F60: E0 FF 00 F5 EA AE A3 79 AB D9 C2 64 B6 B5 8E 55 .......y ...d...U
0014 0F70: 12 65 27 8F 2C A2 3F 42 0F 7A CA 4E 49 BD 59 51 .e'.,.?B .z.NI.YQ
0014 0F80: F8 91 AB 65 7B 1D E5 BC 57 13 03 0D C2 AE D5 66 ...e{... W......f
0014 0F90: 1E 5E F2 73 BF 8E FF 00 5A C8 D5 AE B5 6B 79 61 .^.s.... Z....kya
0014 0FA0: 78 12 39 2D 55 F6 BF 95 93 27 27 AF FF 00 5E A0 x.9-U... .''...^.
0014 0FB0: D8 DD 56 33 DB 11 E5 4D F6 97 55 29 9F BA 54 F5 ..V3...M ..U)..T.
0014 0FC0: 39 3D 6B 02 FD 75 6B 3D 5A 1B AB 3B A3 34 09 0C 9=k..uk= Z..;.4..
0014 0FD0: 71 8C 42 24 F2 8B 7D F0 C3 A1 FA 9A 00 DE B8 7B q.B$..}. .......{
0014 0FE0: 99 A7 84 E6 39 05 C4 3D 15 02 30 90 7F 52 4D 73 ....9..= ..0..RMs
0014 0FF0: D6 56 3A C5 A6 A5 B8 98 DA 32 CC 6E 15 E6 19 41 .V:..... .2.n...A
0014 1000:
I have also run zdb on the source pool; the commands I checked did not find any errors:
# zdb -ccc nas1FS
Traversing all blocks to verify checksums and verify nothing leaked ...
loading space map for vdev 0 of 1, metaslab 173 of 174 ...
1.95T completed ( 94MB/s) estimated time remaining: 0hr 00min 00sec
No leaks (block sum matches space maps exactly)
bp count: 16048635
ganged count: 0
bp logical: 1868775708160 avg: 116444
bp physical: 1395945222144 avg: 86982 compression: 1.34
bp allocated: 2147719213056 avg: 133825 compression: 0.87
bp deduped: 0 ref>1: 0 deduplication: 1.00
SPA allocated: 2147719213056 used: 35.92%
additional, non-pointer bps of type 0: 178850
Dittoed blocks on same vdev: 473244
# zdb -mc nas1FS
Metaslabs:
vdev 0
metaslabs 174 offset spacemap free
--------------- ------------------- --------------- -------------
metaslab 0 offset 0 spacemap 38 free 138M
metaslab 1 offset 800000000 spacemap 39 free 22.8G
metaslab 2 offset 1000000000 spacemap 52 free 75.2M
metaslab 3 offset 1800000000 spacemap 68 free 30.3G
metaslab 4 offset 2000000000 spacemap 53 free 104M
metaslab 5 offset 2800000000 spacemap 67 free 30.3M
metaslab 6 offset 3000000000 spacemap 69 free 30.3M
metaslab 7 offset 3800000000 spacemap 70 free 30.6M
metaslab 8 offset 4000000000 spacemap 71 free 24.6G
metaslab 9 offset 4800000000 spacemap 72 free 30.6G
metaslab 10 offset 5000000000 spacemap 73 free 31.5G
metaslab 11 offset 5800000000 spacemap 74 free 30.3G
metaslab 12 offset 6000000000 spacemap 77 free 32.0G
metaslab 13 offset 6800000000 spacemap 78 free 32.0G
metaslab 14 offset 7000000000 spacemap 79 free 32.0G
metaslab 15 offset 7800000000 spacemap 80 free 32.0G
metaslab 16 offset 8000000000 spacemap 81 free 32G
metaslab 17 offset 8800000000 spacemap 82 free 32G
metaslab 18 offset 9000000000 spacemap 83 free 32.0G
metaslab 19 offset 9800000000 spacemap 84 free 32.0G
metaslab 20 offset a000000000 spacemap 85 free 32.0G
metaslab 21 offset a800000000 spacemap 86 free 32G
metaslab 22 offset b000000000 spacemap 87 free 32G
metaslab 23 offset b800000000 spacemap 88 free 32.0G
metaslab 24 offset c000000000 spacemap 90 free 32.0G
metaslab 25 offset c800000000 spacemap 91 free 12.1G
metaslab 26 offset d000000000 spacemap 92 free 2.66M
metaslab 27 offset d800000000 spacemap 93 free 4.55M
metaslab 28 offset e000000000 spacemap 94 free 8.17G
metaslab 29 offset e800000000 spacemap 113 free 38.0M
metaslab 30 offset f000000000 spacemap 114 free 5.46G
metaslab 31 offset f800000000 spacemap 115 free 2.07G
metaslab 32 offset 10000000000 spacemap 116 free 2.56G
metaslab 33 offset 10800000000 spacemap 37 free 28.0G
metaslab 34 offset 11000000000 spacemap 117 free 451M
metaslab 35 offset 11800000000 spacemap 118 free 298M
metaslab 36 offset 12000000000 spacemap 119 free 10.8G
metaslab 37 offset 12800000000 spacemap 120 free 9.8M
metaslab 38 offset 13000000000 spacemap 121 free 9.49G
metaslab 39 offset 13800000000 spacemap 131 free 23.9G
metaslab 40 offset 14000000000 spacemap 132 free 19.4G
metaslab 41 offset 14800000000 spacemap 134 free 25.2G
metaslab 42 offset 15000000000 spacemap 135 free 23.1G
metaslab 43 offset 15800000000 spacemap 136 free 23.2G
metaslab 44 offset 16000000000 spacemap 137 free 23.2G
metaslab 45 offset 16800000000 spacemap 76 free 23.2G
metaslab 46 offset 17000000000 spacemap 138 free 18.2G
metaslab 47 offset 17800000000 spacemap 139 free 20.8G
metaslab 48 offset 18000000000 spacemap 140 free 18.1G
metaslab 49 offset 18800000000 spacemap 141 free 22.8G
metaslab 50 offset 19000000000 spacemap 142 free 23.7G
metaslab 51 offset 19800000000 spacemap 143 free 24.5G
metaslab 52 offset 1a000000000 spacemap 144 free 23.2G
metaslab 53 offset 1a800000000 spacemap 156 free 22.8G
metaslab 54 offset 1b000000000 spacemap 158 free 23.7G
metaslab 55 offset 1b800000000 spacemap 159 free 19.3G
metaslab 56 offset 1c000000000 spacemap 160 free 23.0G
metaslab 57 offset 1c800000000 spacemap 89 free 23.4G
metaslab 58 offset 1d000000000 spacemap 161 free 23.2G
metaslab 59 offset 1d800000000 spacemap 162 free 23.2G
metaslab 60 offset 1e000000000 spacemap 174 free 23.4G
metaslab 61 offset 1e800000000 spacemap 175 free 18.6G
metaslab 62 offset 1f000000000 spacemap 176 free 18.5G
metaslab 63 offset 1f800000000 spacemap 177 free 23.0G
metaslab 64 offset 20000000000 spacemap 178 free 23.2G
metaslab 65 offset 20800000000 spacemap 202 free 55.4M
metaslab 66 offset 21000000000 spacemap 36 free 12.2G
metaslab 67 offset 21800000000 spacemap 203 free 2.75G
metaslab 68 offset 22000000000 spacemap 204 free 50.7M
metaslab 69 offset 22800000000 spacemap 206 free 9.7G
metaslab 70 offset 23000000000 spacemap 210 free 4.62M
metaslab 71 offset 23800000000 spacemap 211 free 839M
metaslab 72 offset 24000000000 spacemap 226 free 901M
metaslab 73 offset 24800000000 spacemap 279 free 93.2M
metaslab 74 offset 25000000000 spacemap 106 free 31.9G
metaslab 75 offset 25800000000 spacemap 146 free 32.0G
metaslab 76 offset 26000000000 spacemap 147 free 32.0G
metaslab 77 offset 26800000000 spacemap 148 free 32.0G
metaslab 78 offset 27000000000 spacemap 75 free 31.9G
metaslab 79 offset 27800000000 spacemap 157 free 31.9G
metaslab 80 offset 28000000000 spacemap 149 free 32.0G
metaslab 81 offset 28800000000 spacemap 151 free 32.0G
metaslab 82 offset 29000000000 spacemap 164 free 32G
metaslab 83 offset 29800000000 spacemap 165 free 32.0G
metaslab 84 offset 2a000000000 spacemap 166 free 32.0G
metaslab 85 offset 2a800000000 spacemap 167 free 32.0G
metaslab 86 offset 2b000000000 spacemap 168 free 32.0G
metaslab 87 offset 2b800000000 spacemap 169 free 32.0G
metaslab 88 offset 2c000000000 spacemap 170 free 32.0G
metaslab 89 offset 2c800000000 spacemap 180 free 32.0G
metaslab 90 offset 2d000000000 spacemap 133 free 31.9G
metaslab 91 offset 2d800000000 spacemap 181 free 32.0G
metaslab 92 offset 2e000000000 spacemap 182 free 32.0G
metaslab 93 offset 2e800000000 spacemap 184 free 32.0G
metaslab 94 offset 2f000000000 spacemap 185 free 32.0G
metaslab 95 offset 2f800000000 spacemap 186 free 32.0G
metaslab 96 offset 30000000000 spacemap 191 free 32.0G
metaslab 97 offset 30800000000 spacemap 192 free 32.0G
metaslab 98 offset 31000000000 spacemap 193 free 32.0G
metaslab 99 offset 31800000000 spacemap 194 free 32.0G
metaslab 100 offset 32000000000 spacemap 195 free 32.0G
metaslab 101 offset 32800000000 spacemap 196 free 32.0G
metaslab 102 offset 33000000000 spacemap 205 free 32.0G
metaslab 103 offset 33800000000 spacemap 197 free 32.0G
metaslab 104 offset 34000000000 spacemap 209 free 32.0G
metaslab 105 offset 34800000000 spacemap 245 free 32.0G
metaslab 106 offset 35000000000 spacemap 246 free 32.0G
metaslab 107 offset 35800000000 spacemap 247 free 32.0G
metaslab 108 offset 36000000000 spacemap 248 free 32.0G
metaslab 109 offset 36800000000 spacemap 255 free 32.0G
metaslab 110 offset 37000000000 spacemap 256 free 31.9G
metaslab 111 offset 37800000000 spacemap 257 free 32.0G
metaslab 112 offset 38000000000 spacemap 258 free 32.0G
metaslab 113 offset 38800000000 spacemap 259 free 32.0G
metaslab 114 offset 39000000000 spacemap 150 free 32.0G
metaslab 115 offset 39800000000 spacemap 260 free 32.0G
metaslab 116 offset 3a000000000 spacemap 261 free 31.9G
metaslab 117 offset 3a800000000 spacemap 268 free 30.7G
metaslab 118 offset 3b000000000 spacemap 270 free 31.3G
metaslab 119 offset 3b800000000 spacemap 272 free 31.9G
metaslab 120 offset 3c000000000 spacemap 273 free 31.5G
metaslab 121 offset 3c800000000 spacemap 281 free 30.9G
metaslab 122 offset 3d000000000 spacemap 282 free 31.2G
metaslab 123 offset 3d800000000 spacemap 283 free 31.0G
metaslab 124 offset 3e000000000 spacemap 284 free 31.2G
metaslab 125 offset 3e800000000 spacemap 285 free 31.1G
metaslab 126 offset 3f000000000 spacemap 183 free 31.2G
metaslab 127 offset 3f800000000 spacemap 299 free 30.9G
metaslab 128 offset 40000000000 spacemap 300 free 31.0G
metaslab 129 offset 40800000000 spacemap 301 free 31.0G
metaslab 130 offset 41000000000 spacemap 310 free 30.9G
metaslab 131 offset 41800000000 spacemap 365 free 30.9G
metaslab 132 offset 42000000000 spacemap 399 free 29.8G
metaslab 133 offset 42800000000 spacemap 400 free 30.7G
metaslab 134 offset 43000000000 spacemap 434 free 31.0G
metaslab 135 offset 43800000000 spacemap 435 free 30.9G
metaslab 136 offset 44000000000 spacemap 436 free 30.9G
metaslab 137 offset 44800000000 spacemap 437 free 31.8G
metaslab 138 offset 45000000000 spacemap 244 free 32.0G
metaslab 139 offset 45800000000 spacemap 438 free 32G
metaslab 140 offset 46000000000 spacemap 439 free 32.0G
metaslab 141 offset 46800000000 spacemap 448 free 32.0G
metaslab 142 offset 47000000000 spacemap 530 free 12.9G
metaslab 143 offset 47800000000 spacemap 628 free 41.7M
metaslab 144 offset 48000000000 spacemap 629 free 60.0M
metaslab 145 offset 48800000000 spacemap 630 free 59.7M
metaslab 146 offset 49000000000 spacemap 631 free 41.5M
metaslab 147 offset 49800000000 spacemap 632 free 63.8M
metaslab 148 offset 4a000000000 spacemap 633 free 13.1G
metaslab 149 offset 4a800000000 spacemap 634 free 238M
metaslab 150 offset 4b000000000 spacemap 267 free 18.9G
metaslab 151 offset 4b800000000 spacemap 269 free 18.1G
metaslab 152 offset 4c000000000 spacemap 271 free 1.15G
metaslab 153 offset 4c800000000 spacemap 635 free 6.17G
metaslab 154 offset 4d000000000 spacemap 636 free 10.8G
metaslab 155 offset 4d800000000 spacemap 637 free 5.34G
metaslab 156 offset 4e000000000 spacemap 638 free 54.3M
metaslab 157 offset 4e800000000 spacemap 639 free 3.82G
metaslab 158 offset 4f000000000 spacemap 332 free 22.3M
metaslab 159 offset 4f800000000 spacemap 343 free 76.9M
metaslab 160 offset 50000000000 spacemap 344 free 7.76G
metaslab 161 offset 50800000000 spacemap 345 free 7.33G
metaslab 162 offset 51000000000 spacemap 346 free 7.86G
metaslab 163 offset 51800000000 spacemap 347 free 110M
metaslab 164 offset 52000000000 spacemap 364 free 2.27G
metaslab 165 offset 52800000000 spacemap 101 free 127M
metaslab 166 offset 53000000000 spacemap 51 free 176M
metaslab 167 offset 53800000000 spacemap 198 free 64.2M
metaslab 168 offset 54000000000 spacemap 200 free 9.7G
metaslab 169 offset 54800000000 spacemap 240 free 9.27G
metaslab 170 offset 55000000000 spacemap 251 free 41.7M
metaslab 171 offset 55800000000 spacemap 249 free 1.94G
metaslab 172 offset 56000000000 spacemap 278 free 5.11M
metaslab 173 offset 56800000000 spacemap 291 free 4.97G
Traversing all blocks to verify metadata checksums and verify nothing leaked ...
No leaks (block sum matches space maps exactly)
bp count: 16048574
ganged count: 0
bp logical: 1868774123008 avg: 116444
bp physical: 1395943846400 avg: 86982 compression: 1.34
bp allocated: 2147716362240 avg: 133825 compression: 0.87
bp deduped: 0 ref>1: 0 deduplication: 1.00
SPA allocated: 2147716362240 used: 35.92%
additional, non-pointer bps of type 0: 178850
Dittoed blocks on same vdev: 473223
To summarize the system configurations used:
System used to fill "nas1FS" with data (old computer):
Motherboard MSI X58 Pro (MSI MS-7522/MSI X58 Gold) with an Intel Quad Core i7 965 3.2GHz CPU and 14GB of non-ECC RAM (motherboard, CPU, RAM and PSU are about 6 years old).
“/boot” : INTEL SSDMCEAW080A4 80GB SSD
“nas1FS” pool: 6x1TB HGST Travelstar 7K1000 in a RAIDZ2 array (HDDs are ~6months old).
New system to which "nas1FS" was moved (together with all of its disks):
Motherboard Supermicro A1SA7-2750F (8-core Intel Atom) with 32GB of ECC RAM (motherboard, RAM and PSU are new).
“/boot” : INTEL SSDMCEAW080A4 80GB SSD
“nas1FS” pool: 6x1TB HGST Travelstar 7K1000 in a RAIDZ2 array (moved from the old computer).
“backup_raidz” pool: 2x1TB Samsung HD103UJ + 1x1TB HGST Travelstar 7K1000 (pool used for backup)
“backup_raidz_test” pool: 1TB Samsung HD103UJ (pool with no parity, for additional tests)
Both systems were tested with memtest, cpuburn etc. without errors.
I am using Debian Jessie booted from a ZFS pool (with a separate boot partition); the same operating system was used on both machines with the "nas1FS" pool.
Kernel Command line:
BOOT_IMAGE=/vmlinuz-3.16.0-4-amd64 root=ZFS=/rootFS/1_jessie ro rpool=nas1FS bootfs=nas1FS/rootFS/1_jessie root=ZFS=nas1FS/rootFS/1_jessie rootfstype=zfs boot=zfs quiet
# uname -a
Linux nas1.ip 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt11-1+deb8u3 (2015-08-04) x86_64 GNU/Linux
SPL and ZFS were built from source.
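(A sketch of a typical build sequence matching the "standard configure, no patches" description above; the source directory names are illustrative and this is not my exact invocation:)
# cd spl-0.6.5.3 && ./configure && make deb && dpkg -i *.deb && cd ..
# cd zfs-0.6.5.3 && ./configure && make deb && dpkg -i *.deb && cd ..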
# dpkg -l |grep zfs
ii kmod-zfs-3.16.0-4-amd64 0.6.5.3-1 amd64 zfs kernel module(s) for 3.16.0-4-amd64
ii kmod-zfs-devel 0.6.5.3-1 amd64 zfs kernel module(s) devel common
ii kmod-zfs-devel-3.16.0-4-amd64 0.6.5.3-1 amd64 zfs kernel module(s) devel for 3.16.0-4-amd64
ii libzfs2 0.6.5.3-1 amd64 Native ZFS filesystem library for Linux
ii libzfs2-devel 0.6.5.3-1 amd64 Development headers
ii zfs 0.6.5.3-1 amd64 Commands to control the kernel modules and libraries
ii zfs-dracut 0.6.5.3-1 amd64 Dracut module
ii zfs-initramfs 0.6.5.3-1 amd64 Initramfs module
ii zfs-test 0.6.5.3-1 amd64 Test infrastructure
# cat /etc/modprobe.d/zfs-arc-max.conf
options zfs zfs_arc_max=4294967296
# zfs get all nas1FS/backup
NAME PROPERTY VALUE SOURCE
nas1FS/backup type filesystem -
nas1FS/backup creation Mon Nov 9 1:50 2015 -
nas1FS/backup used 1.02T -
nas1FS/backup available 2.21T -
nas1FS/backup referenced 866G -
nas1FS/backup compressratio 1.39x -
nas1FS/backup mounted yes -
nas1FS/backup quota none default
nas1FS/backup reservation none default
nas1FS/backup recordsize 128K default
nas1FS/backup mountpoint /nas1FS/backup inherited from nas1FS
nas1FS/backup sharenfs off default
nas1FS/backup checksum on default
nas1FS/backup compression gzip-9 local
nas1FS/backup atime off inherited from nas1FS
nas1FS/backup devices on default
nas1FS/backup exec on default
nas1FS/backup setuid on default
nas1FS/backup readonly off default
nas1FS/backup zoned off default
nas1FS/backup snapdir hidden default
nas1FS/backup aclinherit restricted default
nas1FS/backup canmount on default
nas1FS/backup xattr sa local
nas1FS/backup copies 1 default
nas1FS/backup version 5 -
nas1FS/backup utf8only off -
nas1FS/backup normalization none -
nas1FS/backup casesensitivity sensitive -
nas1FS/backup vscan off default
nas1FS/backup nbmand off default
nas1FS/backup sharesmb off default
nas1FS/backup refquota none default
nas1FS/backup refreservation none default
nas1FS/backup primarycache all default
nas1FS/backup secondarycache all default
nas1FS/backup usedbysnapshots 63.4G -
nas1FS/backup usedbydataset 866G -
nas1FS/backup usedbychildren 113G -
nas1FS/backup usedbyrefreservation 0 -
nas1FS/backup logbias latency default
nas1FS/backup dedup off default
nas1FS/backup mlslabel none default
nas1FS/backup sync standard default
nas1FS/backup refcompressratio 1.45x -
nas1FS/backup written 16.0K -
nas1FS/backup logicalused 1.39T -
nas1FS/backup logicalreferenced 1.19T -
nas1FS/backup filesystem_limit none default
nas1FS/backup snapshot_limit none default
nas1FS/backup filesystem_count none default
nas1FS/backup snapshot_count none default
nas1FS/backup snapdev hidden default
nas1FS/backup acltype off default
nas1FS/backup context none default
nas1FS/backup fscontext none default
nas1FS/backup defcontext none default
nas1FS/backup rootcontext none default
nas1FS/backup relatime off default
nas1FS/backup redundant_metadata all default
nas1FS/backup overlay off default
# zfs get all nas1FS
NAME PROPERTY VALUE SOURCE
nas1FS type filesystem -
nas1FS creation Sun Nov 8 18:14 2015 -
nas1FS used 1.30T -
nas1FS available 2.21T -
nas1FS referenced 298M -
nas1FS compressratio 1.33x -
nas1FS mounted yes -
nas1FS quota none default
nas1FS reservation none default
nas1FS recordsize 128K default
nas1FS mountpoint /nas1FS local
nas1FS sharenfs off default
nas1FS checksum on default
nas1FS compression off default
nas1FS atime off local
nas1FS devices on default
nas1FS exec on default
nas1FS setuid on default
nas1FS readonly off default
nas1FS zoned off default
nas1FS snapdir hidden default
nas1FS aclinherit restricted default
nas1FS canmount on default
nas1FS xattr on default
nas1FS copies 1 default
nas1FS version 5 -
nas1FS utf8only off -
nas1FS normalization none -
nas1FS casesensitivity sensitive -
nas1FS vscan off default
nas1FS nbmand off default
nas1FS sharesmb off default
nas1FS refquota none default
nas1FS refreservation none default
nas1FS primarycache all default
nas1FS secondarycache all default
nas1FS usedbysnapshots 144K -
nas1FS usedbydataset 298M -
nas1FS usedbychildren 1.30T -
nas1FS usedbyrefreservation 0 -
nas1FS logbias latency default
nas1FS dedup off default
nas1FS mlslabel none default
nas1FS sync standard default
nas1FS refcompressratio 1.00x -
nas1FS written 298M -
nas1FS logicalused 1.70T -
nas1FS logicalreferenced 298M -
nas1FS filesystem_limit none default
nas1FS snapshot_limit none default
nas1FS filesystem_count none default
nas1FS snapshot_count none default
nas1FS snapdev hidden default
nas1FS acltype off default
nas1FS context none default
nas1FS fscontext none default
nas1FS defcontext none default
nas1FS rootcontext none default
nas1FS relatime off default
nas1FS redundant_metadata all default
nas1FS overlay off default
# zpool get all nas1FS
NAME PROPERTY VALUE SOURCE
nas1FS size 5.44T -
nas1FS capacity 35% -
nas1FS altroot - default
nas1FS health ONLINE -
nas1FS guid 12764045164742778073 default
nas1FS version - default
nas1FS bootfs nas1FS/rootFS/1_jessie local
nas1FS delegation on default
nas1FS autoreplace off default
nas1FS cachefile - default
nas1FS failmode wait default
nas1FS listsnapshots off default
nas1FS autoexpand off default
nas1FS dedupditto 0 default
nas1FS dedupratio 1.00x -
nas1FS free 3.48T -
nas1FS allocated 1.95T -
nas1FS readonly off -
nas1FS ashift 12 local
nas1FS comment - default
nas1FS expandsize - -
nas1FS freeing 0 default
nas1FS fragmentation 12% -
nas1FS leaked 0 default
nas1FS feature@async_destroy enabled local
nas1FS feature@empty_bpobj active local
nas1FS feature@lz4_compress active local
nas1FS feature@spacemap_histogram active local
nas1FS feature@enabled_txg active local
nas1FS feature@hole_birth active local
nas1FS feature@extensible_dataset enabled local
nas1FS feature@embedded_data active local
nas1FS feature@bookmarks enabled local
nas1FS feature@filesystem_limits enabled local
nas1FS feature@large_blocks enabled local
Some excerpts from dmesg (these blocked-task messages are not connected to the hard lockup of the system):
[14051.194118] INFO: task txg_sync:6287 blocked for more than 120 seconds.
[14051.194125] Tainted: P O 3.16.0-4-amd64 #1
[14051.194127] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[14051.194130] txg_sync D ffff880785cb2ef8 0 6287 2 0x00000000
[14051.194136] ffff880785cb2aa0 0000000000000046 0000000000012f00 ffff880782d1ffd8
[14051.194140] 0000000000012f00 ffff880785cb2aa0 ffff88087fc537b0 ffff8807a1088850
[14051.194143] ffff8807a1088890 0000000000000001 ffff88085afed000 0000000000000000
[14051.194147] Call Trace:
[14051.194156] [<ffffffff8150e019>] ? io_schedule+0x99/0x120
[14051.194175] [<ffffffffa0202690>] ? cv_wait_common+0x90/0x100 [spl]
[14051.194189] [<ffffffff810a7a40>] ? prepare_to_wait_event+0xf0/0xf0
[14051.194210] [<ffffffffa032d73b>] ? zio_wait+0x10b/0x1e0 [zfs]
[14051.194229] [<ffffffffa02bb75a>] ? dsl_pool_sync+0xaa/0x460 [zfs]
[14051.194249] [<ffffffffa02d5646>] ? spa_sync+0x366/0xb30 [zfs]
[14051.194278] [<ffffffffa02e70c1>] ? txg_sync_thread+0x3d1/0x680 [zfs]
[14051.194298] [<ffffffffa02e6cf0>] ? txg_quiesce_thread+0x3e0/0x3e0 [zfs]
[14051.194305] [<ffffffffa01fdd3b>] ? thread_generic_wrapper+0x6b/0x80 [spl]
[14051.194311] [<ffffffffa01fdcd0>] ? __thread_exit+0x20/0x20 [spl]
[14051.194316] [<ffffffff81087f7d>] ? kthread+0xbd/0xe0
[14051.194320] [<ffffffff81087ec0>] ? kthread_create_on_node+0x180/0x180
[14051.194323] [<ffffffff815114d8>] ? ret_from_fork+0x58/0x90
[14051.194334] [<ffffffff81087ec0>] ? kthread_create_on_node+0x180/0x180
[172938.829534] INFO: task txg_sync:13384 blocked for more than 120 seconds.
[172938.829549] Tainted: P O 3.16.0-4-amd64 #1
[172938.829551] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[172938.829554] txg_sync D ffff8807876a2eb8 0 13384 2 0x00000000
[172938.829559] ffff8807876a2a60 0000000000000046 0000000000012f00 ffff880622707fd8
[172938.829563] 0000000000012f00 ffff8807876a2a60 ffff88087fd537b0 ffff88065adb1110
[172938.829567] ffff88065adb1150 0000000000000001 ffff88085afed000 0000000000000000
[172938.829570] Call Trace:
[172938.829579] [<ffffffff8150e019>] ? io_schedule+0x99/0x120
[172938.829599] [<ffffffffa0202690>] ? cv_wait_common+0x90/0x100 [spl]
[172938.829604] [<ffffffff810a7a40>] ? prepare_to_wait_event+0xf0/0xf0
[172938.829625] [<ffffffffa032d73b>] ? zio_wait+0x10b/0x1e0 [zfs]
[172938.829652] [<ffffffffa02bb75a>] ? dsl_pool_sync+0xaa/0x460 [zfs]
[172938.829673] [<ffffffffa02d5646>] ? spa_sync+0x366/0xb30 [zfs]
[172938.829693] [<ffffffffa02e70c1>] ? txg_sync_thread+0x3d1/0x680 [zfs]
[172938.829720] [<ffffffffa02e6cf0>] ? txg_quiesce_thread+0x3e0/0x3e0 [zfs]
[172938.829727] [<ffffffffa01fdd3b>] ? thread_generic_wrapper+0x6b/0x80 [spl]
[172938.829733] [<ffffffffa01fdcd0>] ? __thread_exit+0x20/0x20 [spl]
[172938.829737] [<ffffffff81087f7d>] ? kthread+0xbd/0xe0
[172938.829741] [<ffffffff81087ec0>] ? kthread_create_on_node+0x180/0x180
[172938.829745] [<ffffffff815114d8>] ? ret_from_fork+0x58/0x90
[172938.829749] [<ffffffff81087ec0>] ? kthread_create_on_node+0x180/0x180
[173539.308306] INFO: task txg_sync:13384 blocked for more than 120 seconds.
[173539.308312] Tainted: P O 3.16.0-4-amd64 #1
[173539.308314] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[173539.308317] txg_sync D ffff8807876a2eb8 0 13384 2 0x00000000
[173539.308322] ffff8807876a2a60 0000000000000046 0000000000012f00 ffff880622707fd8
[173539.308326] 0000000000012f00 ffff8807876a2a60 ffff88087fc537b0 ffff8807fd2523b0
[173539.308329] ffff8807fd2523f0 0000000000000001 ffff88085afed000 0000000000000000
[173539.308333] Call Trace:
[173539.308341] [<ffffffff8150e019>] ? io_schedule+0x99/0x120
[173539.308367] [<ffffffffa0202690>] ? cv_wait_common+0x90/0x100 [spl]
[173539.308372] [<ffffffff810a7a40>] ? prepare_to_wait_event+0xf0/0xf0
[173539.308392] [<ffffffffa032d73b>] ? zio_wait+0x10b/0x1e0 [zfs]
[173539.308411] [<ffffffffa02bb75a>] ? dsl_pool_sync+0xaa/0x460 [zfs]
[173539.308431] [<ffffffffa02d5646>] ? spa_sync+0x366/0xb30 [zfs]
[173539.308450] [<ffffffffa02e70c1>] ? txg_sync_thread+0x3d1/0x680 [zfs]
[173539.308470] [<ffffffffa02e6cf0>] ? txg_quiesce_thread+0x3e0/0x3e0 [zfs]
[173539.308484] [<ffffffffa01fdd3b>] ? thread_generic_wrapper+0x6b/0x80 [spl]
[173539.308490] [<ffffffffa01fdcd0>] ? __thread_exit+0x20/0x20 [spl]
[173539.308494] [<ffffffff81087f7d>] ? kthread+0xbd/0xe0
[173539.308498] [<ffffffff81087ec0>] ? kthread_create_on_node+0x180/0x180
[173539.308502] [<ffffffff815114d8>] ? ret_from_fork+0x58/0x90
[173539.308506] [<ffffffff81087ec0>] ? kthread_create_on_node+0x180/0x180
[173539.308512] INFO: task zfs:28953 blocked for more than 120 seconds.
[173539.308515] Tainted: P O 3.16.0-4-amd64 #1
[173539.308516] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[173539.308518] zfs D ffff8807fd460ef8 0 28953 14132 0x00000000
[173539.308522] ffff8807fd460aa0 0000000000000086 0000000000012f00 ffff88068360bfd8
[173539.308525] 0000000000012f00 ffff8807fd460aa0 ffff88085afed368 ffff88085afed220
[173539.308529] ffff88085afed370 0000000000000000 ffff88085afed320 ffff88064b160000
[173539.308532] Call Trace:
[173539.308539] [<ffffffffa02026cd>] ? cv_wait_common+0xcd/0x100 [spl]
[173539.308543] [<ffffffff810a7a40>] ? prepare_to_wait_event+0xf0/0xf0
[173539.308562] [<ffffffffa02e61eb>] ? txg_wait_open+0xbb/0x100 [zfs]
[173539.308585] [<ffffffffa02a3280>] ? dmu_tx_wait+0x370/0x380 [zfs]
[173539.308601] [<ffffffffa02a332b>] ? dmu_tx_assign+0x9b/0x510 [zfs]
[173539.308617] [<ffffffffa029ee19>] ? restore_write+0xd9/0x1c0 [zfs]
[173539.308633] [<ffffffffa02a0b39>] ? dmu_recv_stream+0x429/0xb00 [zfs]
[173539.308640] [<ffffffffa021d4e4>] ? nvlist_common.part.102+0xe4/0x200 [znvpair]
[173539.308659] [<ffffffffa030e225>] ? zfs_ioc_recv+0x1f5/0xb10 [zfs]
[173539.308671] [<ffffffffa00a817d>] ? avl_find+0x5d/0xa0 [zavl]
[173539.308677] [<ffffffffa020ea9e>] ? zprop_name_to_prop_cb+0x5e/0x70 [zcommon]
[173539.308696] [<ffffffffa02bc90a>] ? dsl_prop_get_dd+0x18a/0x240 [zfs]
[173539.308703] [<ffffffffa02033df>] ? tsd_hash_dtor+0x6f/0x80 [spl]
[173539.308709] [<ffffffffa0203ab5>] ? tsd_set+0x165/0x4d0 [spl]
[173539.308727] [<ffffffffa02bcb3c>] ? dsl_prop_get_ds+0x17c/0x230 [zfs]
[173539.308782] [<ffffffffa02cc5d8>] ? rrw_exit+0x58/0x160 [zfs]
[173539.308812] [<ffffffffa030d019>] ? zfsdev_ioctl+0x489/0x4c0 [zfs]
[173539.308818] [<ffffffff811ba4df>] ? do_vfs_ioctl+0x2cf/0x4b0
[173539.308821] [<ffffffff8150d8c1>] ? __schedule+0x2b1/0x710
[173539.308842] [<ffffffff811ba741>] ? SyS_ioctl+0x81/0xa0
[173539.308857] [<ffffffff8151158d>] ? system_call_fast_compare_end+0x10/0x15
[187471.366228] INFO: task txg_sync:13384 blocked for more than 120 seconds.
[187471.366235] Tainted: P O 3.16.0-4-amd64 #1
[187471.366237] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[187471.366239] txg_sync D ffff8807876a2eb8 0 13384 2 0x00000000
[187471.366245] ffff8807876a2a60 0000000000000046 0000000000012f00 ffff880622707fd8
[187471.366249] 0000000000012f00 ffff8807876a2a60 ffff88087fcd37b0 ffff88071be29d70
[187471.366252] ffff88071be29db0 0000000000000001 ffff88085afed000 0000000000000000
[187471.366256] Call Trace:
[187471.366264] [<ffffffff8150e019>] ? io_schedule+0x99/0x120
[187471.366291] [<ffffffffa0202690>] ? cv_wait_common+0x90/0x100 [spl]
[187471.366296] [<ffffffff810a7a40>] ? prepare_to_wait_event+0xf0/0xf0
[187471.366317] [<ffffffffa032d73b>] ? zio_wait+0x10b/0x1e0 [zfs]
[187471.366336] [<ffffffffa02bb75a>] ? dsl_pool_sync+0xaa/0x460 [zfs]
[187471.366356] [<ffffffffa02d5646>] ? spa_sync+0x366/0xb30 [zfs]
[187471.366383] [<ffffffffa02e70c1>] ? txg_sync_thread+0x3d1/0x680 [zfs]
[187471.366403] [<ffffffffa02e6cf0>] ? txg_quiesce_thread+0x3e0/0x3e0 [zfs]
[187471.366409] [<ffffffffa01fdd3b>] ? thread_generic_wrapper+0x6b/0x80 [spl]
[187471.366415] [<ffffffffa01fdcd0>] ? __thread_exit+0x20/0x20 [spl]
[187471.366420] [<ffffffff81087f7d>] ? kthread+0xbd/0xe0
[187471.366424] [<ffffffff81087ec0>] ? kthread_create_on_node+0x180/0x180
[187471.366428] [<ffffffff815114d8>] ? ret_from_fork+0x58/0x90
[187471.366431] [<ffffffff81087ec0>] ? kthread_create_on_node+0x180/0x180
[201523.040835] INFO: task txg_sync:13384 blocked for more than 120 seconds.
[201523.040842] Tainted: P O 3.16.0-4-amd64 #1
[201523.040844] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[201523.040846] txg_sync D ffff8807876a2eb8 0 13384 2 0x00000000
[201523.040852] ffff8807876a2a60 0000000000000046 0000000000012f00 ffff880622707fd8
[201523.040855] 0000000000012f00 ffff8807876a2a60 ffff88087fc137b0 ffff880727ac8330
[201523.040859] ffff880727ac8370 0000000000000001 ffff88085afed000 0000000000000000
[201523.040862] Call Trace:
[201523.040871] [<ffffffff8150e019>] ? io_schedule+0x99/0x120
[201523.040889] [<ffffffffa0202690>] ? cv_wait_common+0x90/0x100 [spl]
[201523.040894] [<ffffffff810a7a40>] ? prepare_to_wait_event+0xf0/0xf0
[201523.040915] [<ffffffffa032d73b>] ? zio_wait+0x10b/0x1e0 [zfs]
[201523.040942] [<ffffffffa02bb75a>] ? dsl_pool_sync+0xaa/0x460 [zfs]
[201523.040962] [<ffffffffa02d5646>] ? spa_sync+0x366/0xb30 [zfs]
[201523.040982] [<ffffffffa02e70c1>] ? txg_sync_thread+0x3d1/0x680 [zfs]
[201523.041002] [<ffffffffa02e6cf0>] ? txg_quiesce_thread+0x3e0/0x3e0 [zfs]
[201523.041016] [<ffffffffa01fdd3b>] ? thread_generic_wrapper+0x6b/0x80 [spl]
[201523.041022] [<ffffffffa01fdcd0>] ? __thread_exit+0x20/0x20 [spl]
[201523.041026] [<ffffffff81087f7d>] ? kthread+0xbd/0xe0
[201523.041030] [<ffffffff81087ec0>] ? kthread_create_on_node+0x180/0x180
[201523.041034] [<ffffffff815114d8>] ? ret_from_fork+0x58/0x90
[201523.041037] [<ffffffff81087ec0>] ? kthread_create_on_node+0x180/0x180
[219777.534318] INFO: task txg_sync:13384 blocked for more than 120 seconds.
[219777.534325] Tainted: P O 3.16.0-4-amd64 #1
[219777.534326] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[219777.534329] txg_sync D ffff8807876a2eb8 0 13384 2 0x00000000
[219777.534334] ffff8807876a2a60 0000000000000046 0000000000012f00 ffff880622707fd8
[219777.534338] 0000000000012f00 ffff8807876a2a60 ffff88087fcd37b0 ffff8808557c3950
[219777.534341] ffff8808557c3990 0000000000000001 ffff88085afed000 0000000000000000
[219777.534345] Call Trace:
[219777.534353] [<ffffffff8150e019>] ? io_schedule+0x99/0x120
[219777.534372] [<ffffffffa0202690>] ? cv_wait_common+0x90/0x100 [spl]
[219777.534385] [<ffffffff810a7a40>] ? prepare_to_wait_event+0xf0/0xf0
[219777.534405] [<ffffffffa032d73b>] ? zio_wait+0x10b/0x1e0 [zfs]
[219777.534425] [<ffffffffa02bb75a>] ? dsl_pool_sync+0xaa/0x460 [zfs]
[219777.534445] [<ffffffffa02d5646>] ? spa_sync+0x366/0xb30 [zfs]
[219777.534465] [<ffffffffa02e70c1>] ? txg_sync_thread+0x3d1/0x680 [zfs]
[219777.534491] [<ffffffffa02e6cf0>] ? txg_quiesce_thread+0x3e0/0x3e0 [zfs]
[219777.534498] [<ffffffffa01fdd3b>] ? thread_generic_wrapper+0x6b/0x80 [spl]
[219777.534504] [<ffffffffa01fdcd0>] ? __thread_exit+0x20/0x20 [spl]
[219777.534509] [<ffffffff81087f7d>] ? kthread+0xbd/0xe0
[219777.534512] [<ffffffff81087ec0>] ? kthread_create_on_node+0x180/0x180
[219777.534516] [<ffffffff815114d8>] ? ret_from_fork+0x58/0x90
[219777.534520] [<ffffffff81087ec0>] ? kthread_create_on_node+0x180/0x180
[220257.909086] INFO: task txg_sync:13384 blocked for more than 120 seconds.
[220257.909092] Tainted: P O 3.16.0-4-amd64 #1
[220257.909094] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[220257.909097] txg_sync D ffff8807876a2eb8 0 13384 2 0x00000000
[220257.909109] ffff8807876a2a60 0000000000000046 0000000000012f00 ffff880622707fd8
[220257.909113] 0000000000012f00 ffff8807876a2a60 ffff88087fdd37b0 ffff8807c2be5050
[220257.909117] ffff8807c2be5090 0000000000000001 ffff88085afed000 0000000000000000
[220257.909120] Call Trace:
[220257.909129] [<ffffffff8150e019>] ? io_schedule+0x99/0x120
[220257.909149] [<ffffffffa0202690>] ? cv_wait_common+0x90/0x100 [spl]
[220257.909154] [<ffffffff810a7a40>] ? prepare_to_wait_event+0xf0/0xf0
[220257.909175] [<ffffffffa032d73b>] ? zio_wait+0x10b/0x1e0 [zfs]
[220257.909202] [<ffffffffa02bb75a>] ? dsl_pool_sync+0xaa/0x460 [zfs]
[220257.909222] [<ffffffffa02d5646>] ? spa_sync+0x366/0xb30 [zfs]
[220257.909242] [<ffffffffa02e70c1>] ? txg_sync_thread+0x3d1/0x680 [zfs]
[220257.909261] [<ffffffffa02e6cf0>] ? txg_quiesce_thread+0x3e0/0x3e0 [zfs]
[220257.909274] [<ffffffffa01fdd3b>] ? thread_generic_wrapper+0x6b/0x80 [spl]
[220257.909280] [<ffffffffa01fdcd0>] ? __thread_exit+0x20/0x20 [spl]
[220257.909285] [<ffffffff81087f7d>] ? kthread+0xbd/0xe0
[220257.909289] [<ffffffff81087ec0>] ? kthread_create_on_node+0x180/0x180
[220257.909292] [<ffffffff815114d8>] ? ret_from_fork+0x58/0x90
[220257.909296] [<ffffffff81087ec0>] ? kthread_create_on_node+0x180/0x180
[220257.909303] INFO: task zfs:597 blocked for more than 120 seconds.
[220257.909306] Tainted: P O 3.16.0-4-amd64 #1
[220257.909307] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[220257.909309] zfs D ffff88077f5cd8c8 0 597 584 0x00000000
[220257.909313] ffff88077f5cd470 0000000000000082 0000000000012f00 ffff880765093fd8
[220257.909316] 0000000000012f00 ffff88077f5cd470 ffff88085afed368 ffff88085afed220
[220257.909319] ffff88085afed370 0000000000000000 ffff88085afed320 ffff88064b160000
[220257.909323] Call Trace:
[220257.909330] [<ffffffffa02026cd>] ? cv_wait_common+0xcd/0x100 [spl]
[220257.909341] [<ffffffff810a7a40>] ? prepare_to_wait_event+0xf0/0xf0
[220257.909360] [<ffffffffa02e61eb>] ? txg_wait_open+0xbb/0x100 [zfs]
[220257.909377] [<ffffffffa02a3280>] ? dmu_tx_wait+0x370/0x380 [zfs]
[220257.909393] [<ffffffffa02a332b>] ? dmu_tx_assign+0x9b/0x510 [zfs]
[220257.909409] [<ffffffffa029ee19>] ? restore_write+0xd9/0x1c0 [zfs]
[220257.909432] [<ffffffffa02a0b39>] ? dmu_recv_stream+0x429/0xb00 [zfs]
[220257.909439] [<ffffffffa021d4e4>] ? nvlist_common.part.102+0xe4/0x200 [znvpair]
[220257.909458] [<ffffffffa030e225>] ? zfs_ioc_recv+0x1f5/0xb10 [zfs]
[220257.909464] [<ffffffffa00a817d>] ? avl_find+0x5d/0xa0 [zavl]
[220257.909470] [<ffffffffa020ea9e>] ? zprop_name_to_prop_cb+0x5e/0x70 [zcommon]
[220257.909489] [<ffffffffa02bc90a>] ? dsl_prop_get_dd+0x18a/0x240 [zfs]
[220257.909502] [<ffffffffa02033df>] ? tsd_hash_dtor+0x6f/0x80 [spl]
[220257.909509] [<ffffffffa0203ab5>] ? tsd_set+0x165/0x4d0 [spl]
[220257.909527] [<ffffffffa02bcb3c>] ? dsl_prop_get_ds+0x17c/0x230 [zfs]
[220257.909548] [<ffffffffa02cc5d8>] ? rrw_exit+0x58/0x160 [zfs]
[220257.909565] [<ffffffffa030d019>] ? zfsdev_ioctl+0x489/0x4c0 [zfs]
[220257.909576] [<ffffffff811ba4df>] ? do_vfs_ioctl+0x2cf/0x4b0
[220257.909579] [<ffffffff8150d8c1>] ? __schedule+0x2b1/0x710
[220257.909583] [<ffffffff811ba741>] ? SyS_ioctl+0x81/0xa0
[220257.909587] [<ffffffff8151158d>] ? system_call_fast_compare_end+0x10/0x15
[221218.788723] INFO: task txg_sync:13384 blocked for more than 120 seconds.
[221218.788730] Tainted: P O 3.16.0-4-amd64 #1
[221218.788732] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[221218.788734] txg_sync D ffff8807876a2eb8 0 13384 2 0x00000000
[221218.788747] ffff8807876a2a60 0000000000000046 0000000000012f00 ffff880622707fd8
[221218.788750] 0000000000012f00 ffff8807876a2a60 ffff88087fc937b0 ffff8807f5abe3f0
[221218.788754] ffff8807f5abe430 0000000000000001 ffff88085afed000 0000000000000000
[221218.788757] Call Trace:
[221218.788766] [<ffffffff8150e019>] ? io_schedule+0x99/0x120
[221218.788784] [<ffffffffa0202690>] ? cv_wait_common+0x90/0x100 [spl]
[221218.788789] [<ffffffff810a7a40>] ? prepare_to_wait_event+0xf0/0xf0
[221218.788810] [<ffffffffa032d73b>] ? zio_wait+0x10b/0x1e0 [zfs]
[221218.788829] [<ffffffffa02bb75a>] ? dsl_pool_sync+0xaa/0x460 [zfs]
[221218.788856] [<ffffffffa02d5646>] ? spa_sync+0x366/0xb30 [zfs]
[221218.788877] [<ffffffffa02e70c1>] ? txg_sync_thread+0x3d1/0x680 [zfs]
[221218.788896] [<ffffffffa02e6cf0>] ? txg_quiesce_thread+0x3e0/0x3e0 [zfs]
[221218.788903] [<ffffffffa01fdd3b>] ? thread_generic_wrapper+0x6b/0x80 [spl]
[221218.788909] [<ffffffffa01fdcd0>] ? __thread_exit+0x20/0x20 [spl]
[221218.788913] [<ffffffff81087f7d>] ? kthread+0xbd/0xe0
[221218.788917] [<ffffffff81087ec0>] ? kthread_create_on_node+0x180/0x180
[221218.788921] [<ffffffff815114d8>] ? ret_from_fork+0x58/0x90
[221218.788931] [<ffffffff81087ec0>] ? kthread_create_on_node+0x180/0x180
I hope this information is helpful; feel free to let me know what other tests I can perform to diagnose this issue. I will be happy to provide any other info.
Could this be the same bug as https://github.com/zfsonlinux/zfs/issues/4050? IIRC 0.6.5.3 was affected and you seem to have the hole_birth feature enabled, plus the problem seems to manifest in the same way: chunks of data at the end of the file that should be zeros are filled instead with other data.
Anyway, all these "_ZFS send/recv corrupts data_" bug reports are really scary, considering that it's a ZFS feature primarily used for replication/backups.
Thanks for the suggestion @loli10K, it would not have occurred to me to check that issue.
I have checked the output of the commands suggested by @bprotopopov on Dec 3, 2015,
and it seems that the system is affected by bug #4050.
# zfs create -o recordsize=4k nas1FS/test_fs
# zfs set compression=on nas1FS/test_fs
# truncate -s 1G /nas1FS/test_fs/large_file
# ls /nas1FS/test_fs/large_file -als
1 -rw-r--r-- 1 root root 1073741824 Jun 28 16:25 /nas1FS/test_fs/large_file
# dd if=/dev/urandom of=/nas1FS/test_fs/large_file bs=4k count=$((3*128)) seek=$((1*128))
384+0 records in
384+0 records out
1572864 bytes (1.6 MB) copied, 0.588248 s, 2.7 MB/s
# zfs snapshot nas1FS/test_fs@20160628_1
# truncate -s $((2*128*4*1024)) /nas1FS/test_fs/large_file
# dd if=/dev/urandom of=/nas1FS/test_fs/large_file bs=4k count=128 seek=$((3*128)) conv=notrunc
128+0 records in
128+0 records out
524288 bytes (524 kB) copied, 0.224695 s, 2.3 MB/s
# dd if=/dev/urandom of=/nas1FS/test_fs/large_file bs=4k count=10 seek=$((2*128)) conv=notrunc
10+0 records in
10+0 records out
40960 bytes (41 kB) copied, 0.0144285 s, 2.8 MB/s
# zfs snapshot nas1FS/test_fs@20160628_2
# zfs send nas1FS/test_fs@20160628_1 |zfs recv backup_raidz_test/test_fs_copy
# zfs send -i nas1FS/test_fs@20160628_1 nas1FS/test_fs@20160628_2 |zfs recv backup_raidz_test/test_fs_copy
# ls -als /nas1FS/test_fs/large_file /backup_raidz_test/test_fs_copy/large_file
1593 -rw-r--r-- 1 root root 2097152 Jun 28 16:33 /backup_raidz_test/test_fs_copy/large_file
2223 -rw-r--r-- 1 root root 2097152 Jun 28 16:33 /nas1FS/test_fs/large_file
# md5sum /nas1FS/test_fs/large_file /backup_raidz_test/test_fs_copy/large_file
2b1b5792f6a45d45a5399ff977ca6fda /nas1FS/test_fs/large_file
7a8008b4cc398a0a4ad8a6c686aa0f6f /backup_raidz_test/test_fs_copy/large_file
# md5sum /nas1FS/test_fs/.zfs/snapshot/20160628_1/large_file /nas1FS/test_fs/.zfs/snapshot/20160628_2/large_file /nas1FS/test_fs/large_file /backup_raidz_test/test_fs_copy/.zfs/snapshot/20160628_1/large_file /backup_raidz_test/test_fs_copy/.zfs/snapshot/20160628_2/large_file /backup_raidz_test/test_fs_copy/large_file
0f41e9d5956532d434572b6f06d7b082 /nas1FS/test_fs/.zfs/snapshot/20160628_1/large_file
2b1b5792f6a45d45a5399ff977ca6fda /nas1FS/test_fs/.zfs/snapshot/20160628_2/large_file
2b1b5792f6a45d45a5399ff977ca6fda /nas1FS/test_fs/large_file
0f41e9d5956532d434572b6f06d7b082 /backup_raidz_test/test_fs_copy/.zfs/snapshot/20160628_1/large_file
7a8008b4cc398a0a4ad8a6c686aa0f6f /backup_raidz_test/test_fs_copy/.zfs/snapshot/20160628_2/large_file
7a8008b4cc398a0a4ad8a6c686aa0f6f /backup_raidz_test/test_fs_copy/large_file
I am not sure, but to trigger this same issue on "nas1FS" don't I need to modify the file?
If I remember correctly the file was never modified; it did not exist in the "nas1FS/backup@20151121_1" snapshot and was added to the filesystem just before the "nas1FS/backup@20151124" snapshot.
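(One way to double-check when the file first appeared would be something like the following; this is a sketch, not a command I have actually run:)
# zfs diff nas1FS/backup@20151121_1 nas1FS/backup@20151124 | grep wxmsw28ud_propgrid.pdb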
Do you think that upgrading to v0.6.5.7 of spl/zfs would resolve this, and have no negative impact on the source pool?
I have no objections to a complete upgrade to 0.6.5.7.
Unfortunately I just do not have enough knowledge about zfs internals.
I was just curious whether upgrading could make a diagnosis harder or harm the source pool in some way.
I was able to crash the system when doing an md5sum of the problematic file in a snapshot in which it did not exist on the source pool ("nas1FS/backup@20151121_1"), so I thought there could also be some kind of problem with the source pool.
Would any more testing be useful before the upgrade (I will not be able to make a backup copy of the affected disks before the upgrade)?
You should probably try to replicate the lock/crash on 0.6.5.7 and/or master and then report that as a distinct problem, if it persists.
Presuming this is bug #4050, the bug only manifests in the stream when you transmit via incremental send, so the source pool's data is not affected, and upgrading your ZoL version should be safe.
@loli10K thanks for pointing out #4050. And I completely agree, these kinds of issues can't be allowed to happen and we should work on increasing our test coverage in order to prevent them.
@pwolny I'd also recommend upgrading to 0.6.5.7 and verifying that you can no longer reproduce the issue.
Thanks, I will upgrade to 0.6.5.7 and test it tonight or tomorrow.
I have upgraded to v0.6.5.7 and the problem is getting more interesting.
The crash when accessing the non-existent file in the snapshot does not happen anymore.
A send/receive of nas1FS/backup (containing the problematic file) gives the same md5sums on the sending and receiving sides (I compared all files on "nas1FS/backup" and "backup_raidz_test/backup").
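(The comparison was done the same way as earlier; a sketch with arbitrary output file names, the exact invocation was not recorded:)
# md5deep -o f -e -r /nas1FS/backup/* > sums_src.txt
# md5deep -o f -e -r /backup_raidz_test/backup/* > sums_dst.txt
followed by comparing the hash columns of the two files as described above.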
When I repeated the commands suggested by bprotopopov on Dec 3, 2015 (with the upgraded spl/zfs), I got the same checksums on the source ("nas1FS/test_fs_A") and target filesystems ("backup_raidz_test/test_fs_copy_A"):
# cat zfs_hole_transmit_test.sh
#!/bin/bash
date_today="20160701"
echo $date_today
zfs create -o recordsize=4k nas1FS/test_fs_A
zfs set compression=on nas1FS/test_fs_A
truncate -s 1G /nas1FS/test_fs_A/large_file
ls /nas1FS/test_fs_A/large_file -als
dd if=/dev/urandom of=/nas1FS/test_fs_A/large_file bs=4k count=$((3*128)) seek=$((1*128))
zfs snapshot nas1FS/test_fs_A@$date_today"_1"
truncate -s $((2*128*4*1024)) /nas1FS/test_fs_A/large_file
dd if=/dev/urandom of=/nas1FS/test_fs_A/large_file bs=4k count=128 seek=$((3*128)) conv=notrunc
dd if=/dev/urandom of=/nas1FS/test_fs_A/large_file bs=4k count=10 seek=$((2*128)) conv=notrunc
zfs snapshot nas1FS/test_fs_A@$date_today"_2"
zfs send nas1FS/test_fs_A@$date_today"_1" |zfs recv backup_raidz_test/test_fs_copy_A
zfs send -i nas1FS/test_fs_A@$date_today"_1" nas1FS/test_fs_A@$date_today"_2" |zfs recv backup_raidz_test/test_fs_copy_A
ls -als /nas1FS/test_fs_A/large_file /backup_raidz_test/test_fs_copy_A/large_file
md5sum /nas1FS/test_fs_A/large_file /backup_raidz_test/test_fs_copy_A/large_file
md5sum /nas1FS/test_fs_A/large_file /nas1FS/test_fs_A/.zfs/snapshot/$date_today"_1"/large_file /nas1FS/test_fs_A/.zfs/snapshot/$date_today"_2"/large_file /backup_raidz_test/test_fs_copy_A/large_file /backup_raidz_test/test_fs_copy_A/.zfs/snapshot/$date_today"_1"/large_file /backup_raidz_test/test_fs_copy_A/.zfs/snapshot/$date_today"_2"/large_file
# bash zfs_hole_transmit_test.sh
20160701
1 -rw-r--r-- 1 root root 1073741824 Jul 1 16:34 /nas1FS/test_fs_A/large_file
384+0 records in
384+0 records out
1572864 bytes (1.6 MB) copied, 0.589256 s, 2.7 MB/s
128+0 records in
128+0 records out
524288 bytes (524 kB) copied, 0.206688 s, 2.5 MB/s
10+0 records in
10+0 records out
40960 bytes (41 kB) copied, 0.0151903 s, 2.7 MB/s
1113 -rw-r--r-- 1 root root 2097152 Jul 1 16:34 /backup_raidz_test/test_fs_copy_A/large_file
2223 -rw-r--r-- 1 root root 2097152 Jul 1 16:34 /nas1FS/test_fs_A/large_file
046b704054d6e84353ec61b8665dfb9a /nas1FS/test_fs_A/large_file
046b704054d6e84353ec61b8665dfb9a /backup_raidz_test/test_fs_copy_A/large_file
046b704054d6e84353ec61b8665dfb9a /nas1FS/test_fs_A/large_file
c45ee26328f1b78079a155d4b04d76d3 /nas1FS/test_fs_A/.zfs/snapshot/20160701_1/large_file
046b704054d6e84353ec61b8665dfb9a /nas1FS/test_fs_A/.zfs/snapshot/20160701_2/large_file
046b704054d6e84353ec61b8665dfb9a /backup_raidz_test/test_fs_copy_A/large_file
c45ee26328f1b78079a155d4b04d76d3 /backup_raidz_test/test_fs_copy_A/.zfs/snapshot/20160701_1/large_file
046b704054d6e84353ec61b8665dfb9a /backup_raidz_test/test_fs_copy_A/.zfs/snapshot/20160701_2/large_file
# for i in $(zfs list -o mountpoint |grep test_fs); do echo $i; for j in $(ls $i/.zfs/snapshot/ -1 );do md5sum "$i/.zfs/snapshot/$j/large_file" ; done; done;
/backup_raidz_test/test_fs_copy_A
c45ee26328f1b78079a155d4b04d76d3 /backup_raidz_test/test_fs_copy_A/.zfs/snapshot/20160701_1/large_file
046b704054d6e84353ec61b8665dfb9a /backup_raidz_test/test_fs_copy_A/.zfs/snapshot/20160701_2/large_file
/nas1FS/test_fs
0f41e9d5956532d434572b6f06d7b082 /nas1FS/test_fs/.zfs/snapshot/20160628_1/large_file
2b1b5792f6a45d45a5399ff977ca6fda /nas1FS/test_fs/.zfs/snapshot/20160628_2/large_file
2b1b5792f6a45d45a5399ff977ca6fda /nas1FS/test_fs/.zfs/snapshot/20160629/large_file
2b1b5792f6a45d45a5399ff977ca6fda /nas1FS/test_fs/.zfs/snapshot/20160629_1/large_file
2b1b5792f6a45d45a5399ff977ca6fda /nas1FS/test_fs/.zfs/snapshot/20160629_2/large_file
2b1b5792f6a45d45a5399ff977ca6fda /nas1FS/test_fs/.zfs/snapshot/20160629_3/large_file
/nas1FS/test_fs_A
c45ee26328f1b78079a155d4b04d76d3 /nas1FS/test_fs_A/.zfs/snapshot/20160701_1/large_file
046b704054d6e84353ec61b8665dfb9a /nas1FS/test_fs_A/.zfs/snapshot/20160701_2/large_file
When I tried to transfer (on v0.6.5.7) a test filesystem ("nas1FS/test_fs") created with the same commands on v0.6.5.3, I got a checksum mismatch:
# zfs send -R "nas1FS/test_fs@20160629_3" |zfs receive -F "backup_raidz_test/test_fs"
# for i in $(zfs list -o mountpoint |grep test_fs); do echo $i; for j in $(ls $i/.zfs/snapshot/ -1 ); do md5sum "$i/.zfs/snapshot/$j/large_file" ; done; done;
/backup_raidz_test/test_fs
0f41e9d5956532d434572b6f06d7b082 /backup_raidz_test/test_fs/.zfs/snapshot/20160628_1/large_file
7a8008b4cc398a0a4ad8a6c686aa0f6f /backup_raidz_test/test_fs/.zfs/snapshot/20160628_2/large_file
7a8008b4cc398a0a4ad8a6c686aa0f6f /backup_raidz_test/test_fs/.zfs/snapshot/20160629/large_file
7a8008b4cc398a0a4ad8a6c686aa0f6f /backup_raidz_test/test_fs/.zfs/snapshot/20160629_1/large_file
7a8008b4cc398a0a4ad8a6c686aa0f6f /backup_raidz_test/test_fs/.zfs/snapshot/20160629_2/large_file
7a8008b4cc398a0a4ad8a6c686aa0f6f /backup_raidz_test/test_fs/.zfs/snapshot/20160629_3/large_file
/backup_raidz_test/test_fs_copy_A
c45ee26328f1b78079a155d4b04d76d3 /backup_raidz_test/test_fs_copy_A/.zfs/snapshot/20160701_1/large_file
046b704054d6e84353ec61b8665dfb9a /backup_raidz_test/test_fs_copy_A/.zfs/snapshot/20160701_2/large_file
/nas1FS/test_fs
0f41e9d5956532d434572b6f06d7b082 /nas1FS/test_fs/.zfs/snapshot/20160628_1/large_file
2b1b5792f6a45d45a5399ff977ca6fda /nas1FS/test_fs/.zfs/snapshot/20160628_2/large_file
2b1b5792f6a45d45a5399ff977ca6fda /nas1FS/test_fs/.zfs/snapshot/20160629/large_file
2b1b5792f6a45d45a5399ff977ca6fda /nas1FS/test_fs/.zfs/snapshot/20160629_1/large_file
2b1b5792f6a45d45a5399ff977ca6fda /nas1FS/test_fs/.zfs/snapshot/20160629_2/large_file
2b1b5792f6a45d45a5399ff977ca6fda /nas1FS/test_fs/.zfs/snapshot/20160629_3/large_file
/nas1FS/test_fs_A
c45ee26328f1b78079a155d4b04d76d3 /nas1FS/test_fs_A/.zfs/snapshot/20160701_1/large_file
046b704054d6e84353ec61b8665dfb9a /nas1FS/test_fs_A/.zfs/snapshot/20160701_2/large_file
How should I proceed?
Will providing images of a pool created in v0.6.5.3 that exhibits this behavior, or any other tests, be helpful?
It seems that I had not tested enough.
The issue is still present after the upgrade to v0.6.5.7.
The crash during access to a nonexistent file in a snapshot does not happen anymore, but I can still trigger the send/receive snapshot corruption.
Previously I entered commands by hand or repeated them from the console history, so I introduced significant delays between commands.
Later I automated the testing too tightly, and because of that the bug was not triggered (even when I tried to trigger it on v0.6.5.3).
I have tested again with the following script:
#!/bin/bash
for i in $(seq 0 1 10); # sleep time loop
do
for j in $(seq 1 1 10); #pool size loop
do
pool_name="zfs_pool_v0.6.5.7_A_"$i"s_$j"
src_fs="$pool_name/test_fs_A"
target_fs="$pool_name/test_fs_B"
date_today="20160702"
echo $pool_name ; echo $src_fs ; echo $target_fs; echo $date_today
dd if=/dev/zero of=/nas1FS/tmp/$pool_name bs=4k count=$(($j*128*128)) # pool size has an impact on bug triggering
zpool create -o ashift=12 $pool_name /nas1FS/tmp/$pool_name
zfs create -o recordsize=4k $src_fs
zfs set compression=on $src_fs
truncate -s 1G /$src_fs/large_file
ls -als /$src_fs/large_file
dd if=/dev/urandom of=/$src_fs/large_file bs=4k count=$((3*128)) seek=$((1*128))
zfs snapshot $src_fs@$date_today"_1"
truncate -s $((2*128*4*1024)) /$src_fs/large_file
dd if=/dev/urandom of=/$src_fs/large_file bs=4k count=128 seek=$((3*128)) conv=notrunc
sleep $i; #10; # <=this sleep is critical for triggering the bug when pool is large enough
dd if=/dev/urandom of=/$src_fs/large_file bs=4k count=10 seek=$((2*128)) conv=notrunc
zfs snapshot $src_fs@$date_today"_2"
zfs send -R $src_fs@$date_today"_2" |zfs recv $target_fs
ls -als /$src_fs/large_file /$target_fs/large_file
md5sum /$src_fs/.zfs/snapshot/$date_today"_1"/large_file /$target_fs/.zfs/snapshot/$date_today"_1"/large_file
md5sum /$src_fs/.zfs/snapshot/$date_today"_2"/large_file /$target_fs/.zfs/snapshot/$date_today"_2"/large_file
md5sum /$src_fs/large_file /$target_fs/large_file
md5sum /$src_fs/.zfs/snapshot/$date_today"_1"/large_file /$target_fs/.zfs/snapshot/$date_today"_1"/large_file >> /nas1FS/tmp/final_test_md5sums.txt
md5sum /$src_fs/.zfs/snapshot/$date_today"_2"/large_file /$target_fs/.zfs/snapshot/$date_today"_2"/large_file >> /nas1FS/tmp/final_test_md5sums.txt
md5sum /$src_fs/large_file /$target_fs/large_file >> /nas1FS/tmp/final_test_md5sums.txt
zpool export $pool_name
done;
done;
This script repeated the same pool create / file create / snapshot / transfer cycle for different pool sizes and with different delays between file modifications.
It seems that the send/receive issue was only triggered when the delay was longer than 4 seconds or when the image file holding the pool was small (2^26 bytes).
Matrix of tested delay / pool image size values (the corrupted snapshot transfer bug was triggered at the values intersecting in the areas marked in red):
(matrix image attached in the original comment, not reproduced here)
I am only speculating, but is it possible that the issue is somehow connected to syncing data to disk (i.e., that the syncing of data from memory to disk is not consistent)?
@pwolny
what are the S.M.A.R.T. stats for those drives ? anything unusual ?
can you post those somewhere ?
smartctl -x /dev/foo
Is there a possibility you can exchange the drive of the "backup_raidz_test" pool (1TB Samsung HD103UJ, pool with no parity, for additional tests) and/or the cables for a different drive?
Are all drives connected to the same hard drive controller?
http://www.supermicro.com/products/motherboard/Atom/X10/A1SA7-2750F.cfm
LSI 2116 controller for 16x SATA3 / SAS2 ports plus 1x SATA3 SMCI SATA DOM
Are any issues known with those and ZFS?
Any firmware or driver updates?
that's at least what comes to mind right now ...
@kernelOfTruth
All disks seem to be OK; nothing unusual about them from smartctl.
The nas1FS drives all look very similar in smartctl:
smartctl 6.4 2014-10-07 r4002 [x86_64-linux-3.16.0-4-amd64] (local build)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: HGST Travelstar 7K1000
Device Model: HGST HTS721010A9E630
Firmware Version: JB0OA3J0
User Capacity: 1,000,204,886,016 bytes [1.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 7200 rpm
Form Factor: 2.5 inches
Device is: In smartctl database [for details use: -P show]
ATA Version is: ATA8-ACS T13/1699-D revision 6
SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Sun Jul 3 08:41:18 2016 UTC
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is: Unavailable
APM level is: 254 (maximum performance)
Rd look-ahead is: Enabled
Write cache is: Enabled
ATA Security is: Disabled, NOT FROZEN [SEC1]
Wt Cache Reorder: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x00) Offline data collection activity
was never started.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 45) seconds.
Offline data collection
capabilities: (0x5b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
No Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 171) minutes.
SCT capabilities: (0x003d) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAGS VALUE WORST THRESH FAIL RAW_VALUE
1 Raw_Read_Error_Rate PO-R-- 100 100 062 - 0
2 Throughput_Performance P-S--- 100 100 040 - 0
3 Spin_Up_Time POS--- 114 114 033 - 2
4 Start_Stop_Count -O--C- 100 100 000 - 16
5 Reallocated_Sector_Ct PO--CK 100 100 005 - 0
7 Seek_Error_Rate PO-R-- 100 100 067 - 0
8 Seek_Time_Performance P-S--- 100 100 040 - 0
9 Power_On_Hours -O--C- 100 100 000 - 208
10 Spin_Retry_Count PO--C- 100 100 060 - 0
12 Power_Cycle_Count -O--CK 100 100 000 - 16
191 G-Sense_Error_Rate -O-R-- 100 100 000 - 0
192 Power-Off_Retract_Count -O--CK 100 100 000 - 13
193 Load_Cycle_Count -O--C- 100 100 000 - 57
194 Temperature_Celsius -O---- 222 222 000 - 27 (Min/Max 20/35)
196 Reallocated_Event_Count -O--CK 100 100 000 - 0
197 Current_Pending_Sector -O---K 100 100 000 - 0
198 Offline_Uncorrectable ---R-- 100 100 000 - 0
199 UDMA_CRC_Error_Count -O-R-- 200 200 000 - 0
223 Load_Retry_Count -O-R-- 100 100 000 - 0
||||||_ K auto-keep
|||||__ C event count
||||___ R error rate
|||____ S speed/performance
||_____ O updated online
|______ P prefailure warning
General Purpose Log Directory Version 1
SMART Log Directory Version 1 [multi-sector log support]
Address Access R/W Size Description
0x00 GPL,SL R/O 1 Log Directory
0x01 SL R/O 1 Summary SMART error log
0x02 SL R/O 1 Comprehensive SMART error log
0x03 GPL R/O 1 Ext. Comprehensive SMART error log
0x06 SL R/O 1 SMART self-test log
0x07 GPL R/O 1 Extended self-test log
0x09 SL R/W 1 Selective self-test log
0x10 GPL R/O 1 SATA NCQ Queued Error log
0x11 GPL R/O 1 SATA Phy Event Counters log
0x80-0x9f GPL,SL R/W 16 Host vendor specific log
0xe0 GPL,SL R/W 1 SCT Command/Status
0xe1 GPL,SL R/W 1 SCT Data Transfer
SMART Extended Comprehensive Error Log Version: 1 (1 sectors)
No Errors Logged
SMART Extended Self-test Log Version: 1 (1 sectors)
No self-tests have been logged. [To run self-tests, use: smartctl -t]
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
SCT Status Version: 3
SCT Version (vendor specific): 256 (0x0100)
SCT Support Level: 1
Device State: Active (0)
Current Temperature: 27 Celsius
Power Cycle Min/Max Temperature: 26/27 Celsius
Lifetime Min/Max Temperature: 20/35 Celsius
Lifetime Average Temperature: 28 Celsius
Under/Over Temperature Limit Count: 0/0
SCT Temperature History Version: 2
Temperature Sampling Period: 1 minute
Temperature Logging Interval: 1 minute
Min/Max recommended Temperature: 0/60 Celsius
Min/Max Temperature Limit: -40/65 Celsius
Temperature History Size (Index): 128 (106)
Index Estimated Time Temperature Celsius
107 2016-07-03 06:34 28 *********
... ..(114 skipped). .. *********
94 2016-07-03 08:29 28 *********
95 2016-07-03 08:30 ? -
96 2016-07-03 08:31 26 *******
97 2016-07-03 08:32 26 *******
98 2016-07-03 08:33 27 ********
... ..( 7 skipped). .. ********
106 2016-07-03 08:41 27 ********
SCT Error Recovery Control:
Read: Disabled
Write: Disabled
Device Statistics (GP/SMART Log 0x04) not supported
SATA Phy Event Counters (GP Log 0x11)
ID Size Value Description
0x0001 2 0 Command failed due to ICRC error
0x0002 2 0 R_ERR response for data FIS
0x0003 2 0 R_ERR response for device-to-host data FIS
0x0004 2 0 R_ERR response for host-to-device data FIS
0x0005 2 0 R_ERR response for non-data FIS
0x0006 2 0 R_ERR response for device-to-host non-data FIS
0x0007 2 0 R_ERR response for host-to-device non-data FIS
0x0009 2 3 Transition from drive PhyRdy to drive PhyNRdy
0x000a 2 3 Device-to-host register FISes sent due to a COMRESET
0x000b 2 0 CRC errors within host-to-device FIS
0x000d 2 0 Non-CRC errors within host-to-device FIS
All disks are connected to the same controller (the LSI) on the A1SA7-2750F; the motherboard is as received (no firmware updates).
I have repeated the tests with the pool backed by a ramdisk file, so no bad cables or controllers are involved, and I can still trigger the bug with the script from https://github.com/zfsonlinux/zfs/issues/4809#issuecomment-230123528
I have also retested on a laptop:
Linux 4.5.0-1-amd64 #1 SMP Debian 4.5.1-1 (2016-04-14) x86_64 GNU/Linux
with spl/zfs v0.6.5.7 and the pool backed by a ramdisk file, using the above-mentioned script, and I can also trigger it there.
I have retested on another (much older) computer:
Linux 2.6.38-2-amd64 #1 SMP Sun May 8 13:51:57 UTC 2011 x86_64 GNU/Linux
with spl/zfs v0.6.0 and the pool backed by a ramdisk file, using the above-mentioned script, and I can not trigger it there.
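For anyone who wants to rule out their own disks and controllers the same way, a throwaway pool backed by a file on tmpfs can be set up roughly like this (a sketch; the mount point, sizes and pool name are arbitrary examples):
mkdir -p /mnt/zfs_ramdisk
mount -t tmpfs -o size=3G tmpfs /mnt/zfs_ramdisk
truncate -s 2G /mnt/zfs_ramdisk/poolfile          # sparse backing file for the test pool
zpool create -o ashift=12 ramtest /mnt/zfs_ramdisk/poolfile
zfs create -o recordsize=4k -o compression=on ramtest/test_fs_A
The test script above can then be pointed at such a pool; cleanup afterwards is just zpool destroy ramtest and unmounting the tmpfs.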
@pwolny I'm half-asleep so I might not be reading this right, but this looks like the same bug, but with a somewhat different behavior wrt. size and delay: https://gist.github.com/lnicola/1e069f2abaee1dbaaeafc05437b0777a
@lnicola It seems that your script output exhibits the same checksum mismatch footprint as mine. What kernel / zfs version were you using?
Sorry for not mentioning those. I'm on 4.6.3-1-ARCH with 0.6.5.7_4.6.3_1-1, i.e. the latest versions.
That's pretty concerning; I'm mostly backing up my files only with zfs send nowadays too, and I'm also running 4.6.3 (ZFS is on top of cryptsetup for me).
@pwolny are you using any encryption or partition "alteration" mechanism (cryptsetup, lvm, etc.)?
@pwolny @lnicola are your pools running with the latest ZFS feature set ?
what does the zpool upgrade command say ?
(I can't currently provide sample output of my pool since I'm in Windows; I haven't upgraded it in a while, so as far as I know 2 features are still missing on my /home pool)
_edit:_
I'm currently occupied otherwise but will see if I can reproduce it, too, here later in the day
referencing https://github.com/zfsonlinux/zfs/pull/4754 (6513 partially filled holes lose birth time) which is in master: https://github.com/zfsonlinux/zfs/commit/bc77ba73fec82d37c0b57949ec29edd9aa207677
@kernelOfTruth yes, I do have hole_birth active.
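For reference, whether the feature is merely enabled or already active on a pool can be checked with zpool get (the pool names here are the ones from this thread):
zpool get feature@hole_birth nas1FS backup_raidz_test
On pools created with default feature flags on these releases it will typically report active.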
@pwolny you were showing that issue with the Samsung drive and gzip-9 compression:
I have created a “backup_raidz” pool for backup (with compression turned on):
# zpool create -o ashift=12 -O compression=gzip-9 backup_raidz raidz1 /dev/disk/by-id/ata-SAMSUNG_HD103UJ_SerialNo1 /dev/disk/by-id/ata-SAMSUNG_HD103UJ_SerialNo2 /dev/disk/by-id/ata-HGST_HTS721010A9E630_SerialNo3 -f
does this also happen when you explicitly set lz4 compression ?
or one of the other types?
referencing: https://github.com/zfsonlinux/zfs/issues/4530 Data Corruption During ZFS send/receive
@pwolny please take a look at comment: https://github.com/zfsonlinux/zfs/issues/4530#issuecomment-211996617 especially the part related to destroying the snapshot
@kernelOfTruth
About the #4530 (comment):
@pwolny What if you replace the use of truncate with dd?
I have retested multiple versions of spl/zfs (built from source) on a virtual machine running Knoppix 7.4.2 with kernel
Linux Microknoppix 3.16.3-64 #10 SMP PREEMPT Fri Sep 26 02:00:22 CEST 2014 x86_64 GNU/Linux
and I can very reliably reproduce this issue (with the script mentioned in https://github.com/zfsonlinux/zfs/issues/4809#issuecomment-230123528).
During testing I have seen three possible footprints:
(three footprint images attached in the original comment, not reproduced here)
To summarize tested versions (different software and hardware):
v0.6.0-rc8 - is ok
v0.6.3.1-1.3 - is ok
v0.6.4 - exhibits the bug partially
v0.6.4.1 - exhibits the bug partially
v0.6.4.2 - exhibits the bug partially
v0.6.5 - exhibits the bug fully
v0.6.5.2 - exhibits the bug fully
v0.6.5.3 - exhibits the bug fully
v0.6.5.7 - exhibits the bug fully
So it seems that the issue (or possibly two separate issues) was introduced just before v0.6.4 and was possibly aggravated in v0.6.5.
@lnicola If I replace:
truncate -s 1G /$src_fs/large_file
with
dd if=/dev/zero of=/$src_fs/large_file bs=1 count=1 seek=1G
I still can trigger the bug fully.
What about the other truncate usage? My thinking is that it could prove this is not related to the hole_birth feature.
I have tested a few modifications of the testing script.
This is the relevant part of that testing:
dd if=/dev/zero of="$pool_file_path/$pool_name" bs=4k count=$(($j*128*128)) # pool size has an impact on bug triggeringl
zpool create -o ashift=12 $pool_name "$pool_file_path/$pool_name"
zfs create -o recordsize=4k $src_fs
zfs set compression=on $src_fs
#========================1st truncate/dd=====================================#
truncate -s 1G /$src_fs/large_file #possible to trigger the bug
#dd if=/dev/zero of=/$src_fs/large_file bs=1G count=1 #possible to trigger the bug
#dd if=/dev/zero of=/$src_fs/large_file bs=1G count=1 conv=notrunc #possible to trigger the bug
#========================1st truncate/dd=====================================#
ls -als /$src_fs/large_file
dd if=/dev/urandom of=/$src_fs/large_file bs=4k count=$((3*128)) seek=$((1*128))
zfs snapshot $src_fs@$date_today"_1"
#========================2nd truncate/dd=====================================#
truncate -s $((2*128*4*1024)) /$src_fs/large_file #possible to trigger bug
#truncate -s $((2*128*4*1024+1)) /$src_fs/large_file #not triggering the bug
#dd if=/dev/zero of=/$src_fs/large_file bs=1 count=1 seek=$((2*128*4*1024)) #not triggering the bug
#dd if=/dev/zero of=/$src_fs/large_file bs=1 count=1 seek=$((2*128*4*1024-1)) #possible to trigger bug
#dd if=/dev/zero of=/$src_fs/large_file bs=1 count=1 seek=$((2*128*4*1024-1)) conv=notrunc #with this the bug never triggers
#========================2nd truncate/dd=====================================#
ls -als /$src_fs/large_file
dd if=/dev/urandom of=/$src_fs/large_file bs=4k count=128 seek=$((3*128)) conv=notrunc
sleep $i; #10; # <=this sleep is critical for triggering the bug when pool is large enough
dd if=/dev/urandom of=/$src_fs/large_file bs=4k count=10 seek=$((2*128)) conv=notrunc
zfs snapshot $src_fs@$date_today"_2"
It seems that only the second truncate/dd command has any impact on triggering the issue.
Maybe someone will find this version of the script useful:
zfs_test3.sh.txt
It will try to create a "test_matrix_output.txt" file in the "/tmp/zfs" directory (the script does not create this directory itself).
If this file contains any "E" entries, there was a checksum mismatch between the source snapshot and the transferred target snapshot (a sketch of how each cell is produced follows the example output below).
If the matrix is filled only with "o", everything went OK.
An example "test_matrix_output.txt" with errors looks like this (output with spl/zfs v0.6.5.7):
EEEEEEEEEEE
oooooEEEEEE
oooooEEEEEE
oooooEEEEEE
oooooEEEEEE
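The attached script is not reproduced here, but a hypothetical sketch of how one such matrix cell could be produced (variable names borrowed from the inline script earlier in this thread) is:
src_md5=$(md5sum "/$src_fs/.zfs/snapshot/${date_today}_2/large_file" | awk '{print $1}')
dst_md5=$(md5sum "/$target_fs/.zfs/snapshot/${date_today}_2/large_file" | awk '{print $1}')
if [ "$src_md5" = "$dst_md5" ]; then
  printf 'o' >> /tmp/zfs/test_matrix_output.txt   # checksums match
else
  printf 'E' >> /tmp/zfs/test_matrix_output.txt   # mismatch between source and received snapshot
fi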
Hi, pwolny,
in the interests of clarity, can the bug be reproduced by running one version of one script with reasonable frequency, and if so, could you please publish the shortest version of such a script?
Ideally, the script would include creation and cleanup of a pool based on file vdevs. I am trying to abstract away the site specifics. To my knowledge, the original issues with partly filled holes and reused dnodes (Illumos 6370, Illumos 6513) can be reproduced on such pools.
Best regards, Boris.
Hello @bprotopopov,
Basically this script contains the commands you suggested on Dec 3, 2015 in bug report #4050. The only critical additions needed for triggering are a variable time delay and the pool backing-file size; the rest was only needed for test automation.
Does it not trigger on your system? Are there any "E" letters in the "test_matrix_output.txt" file in "/tmp/zfs" directory?
For me the script in "zfs_test3.sh.txt" always triggers on affected systems. It can also help differentiate between two types of triggered behavior (please take a look at the previous posts with images depicting the different footprints generated by this script for different zfs/spl versions).
If you need to minimize test time then just set:
the sleep time loop to a 6 second delay ("$(seq 6 1 6)")
and the pool size loop to the 1st size multiplier ("$(seq 1 1 1)");
it should still always trigger on an affected system, but that way you will lose the possibility of differentiating between the triggered effects.
Anyway, the latest script version sets up a pool backed by a file and removes it afterwards; in fact it does that about 55 times in total in the testing loop.
It even returns a non-zero exit status on error.
If you want to use it in an automatic, headless testing environment you will probably need to comment out the 3 lines with "less" at the end.
It may leave some text files in the "/tmp/zfs/" folder, but I wanted them for verification.
Best regards
I have done some regression testing and it seems that the error was introduced in this commit (zfs-0.6.3-27):
b0bc7a84d90dcbf5321d48c5b24ed771c5a128b0 Illumos 4370, 4371 (4370 avoid transmitting holes during zfs send; 4371 DMU code clean up)
It generates a following footprint:
oooooEEEEEE
oooooEEEEEE
oooooEEEEEE
oooooEEEEEE
oooooEEEEEE
While a previous commit (zfs-0.6.3-26)
fa86b5dbb6d33371df344efb2adb0aba026d097c Illumos 4171, 4172
does not generate any errors
ooooooooooo
ooooooooooo
ooooooooooo
ooooooooooo
ooooooooooo
Both were tested with spl-0.6.3-10, commit :
e3020723dc43af2bc22af0d68571a61daf9b44d0 Linux 3.16 compat: smp_mb__after_clear_bit()
Could someone take a look at those code changes? Unfortunately that is as far as I can go with debugging this.
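For anyone who wants to retrace this, the suspect range can be listed inside a zfs git checkout (the commit hashes are the ones quoted above):
git log --oneline fa86b5dbb6d33371df344efb2adb0aba026d097c..b0bc7a84d90dcbf5321d48c5b24ed771c5a128b0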
OK, great,
let me take a quick look.
Boris.
Well that seems appropriate - the commit in question seems to introduce support for the hole_birth feature.
For anyone who's worried, this script might help finding sparse files: http://unix.stackexchange.com/a/86446.
EDIT: While the script works, it will only find sparse files, not files affected by this issue. Depending on your workload, data and compression setting, you might have very few sparse files on a drive. In such cases, it's better to only worry about those specific files.
Also, if compression is off, making a copy of a sparse file will yield a non-sparse file. This can be used to fix the underlying corruption on the source pool. Once the bug/bugs are fixed, making a copy will also work when compression is enabled.
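The linked script is not reproduced here; the same idea, assuming GNU find, can be sketched as a one-liner that prints files whose allocated size is smaller than their apparent size (%S is find's sparseness ratio; /nas1FS is just an example starting point):
find /nas1FS -type f -printf '%S\t%p\n' 2>/dev/null | awk '$1 < 1.0 {print $2}'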
Hi, @pwolny,
I ran your script (for a while) with a 0.6.3-based ZFS plus the above-mentioned fixes cherry-picked:
commit c183d75b2bf163fa5feef301dfc9c2db885cc652
Author: Paul Dagnelie [email protected]
Date: Sun May 15 08:02:28 2016 -0700
OpenZFS 6513 - partially filled holes lose birth time
Reviewed by: Matthew Ahrens <[email protected]>
Reviewed by: George Wilson <[email protected]>
Reviewed by: Boris Protopopov <[email protected]>
Approved by: Richard Lowe <[email protected]>
Ported by: Boris Protopopov <[email protected]>
Signed-off-by: Boris Protopopov <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
OpenZFS-issue: https://www.illumos.org/issues/6513
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/8df0bcf0
If a ZFS object contains a hole at level one, and then a data block is
created at level 0 underneath that l1 block, l0 holes will be created.
However, these l0 holes do not have the birth time property set; as a
result, incremental sends will not send those holes.
Fix is to modify the dbuf_read code to fill in birth time data.
commit a29427227090abdaa2d63bfdb95746362a0504c1
Author: Alex Reece [email protected]
Date: Thu Apr 21 11:23:37 2016 -0700
Illumos 6844 - dnode_next_offset can detect fictional holes
6844 dnode_next_offset can detect fictional holes
Reviewed by: Matthew Ahrens <[email protected]>
Reviewed by: George Wilson <[email protected]>
Reviewed by: Brian Behlendorf <[email protected]>
dnode_next_offset is used in a variety of places to iterate over the
holes or allocated blocks in a dnode. It operates under the premise that
it can iterate over the blockpointers of a dnode in open context while
holding only the dn_struct_rwlock as reader. Unfortunately, this premise
does not hold.
When we create the zio for a dbuf, we pass in the actual block pointer
in the indirect block above that dbuf. When we later zero the bp in
zio_write_compress, we are directly modifying the bp. The state of the
bp is now inconsistent from the perspective of dnode_next_offset: the bp
will appear to be a hole until zio_dva_allocate finally finishes filling
it in. In the meantime, dnode_next_offset can detect a hole in the dnode
when none exists.
I was able to experimentally demonstrate this behavior with the
following setup:
1. Create a file with 1 million dbufs.
2. Create a thread that randomly dirties L2 blocks by writing to the
first L0 block under them.
3. Observe dnode_next_offset, waiting for it to skip over a hole in the
middle of a file.
4. Do dnode_next_offset in a loop until we skip over such a non-existent
hole.
The fix is to ensure that it is valid to iterate over the indirect
blocks in a dnode while holding the dn_struct_rwlock by passing the zio
a copy of the BP and updating the actual BP in dbuf_write_ready while
holding the lock.
References:
https://www.illumos.org/issues/6844
https://github.com/openzfs/openzfs/pull/82
DLPX-35372
Ported-by: Brian Behlendorf <[email protected]>
Closes #4548
commit f506a3da14d7ef75752df9311354e4ea4dc3354d
Author: Paul Dagnelie [email protected]
Date: Thu Feb 25 20:45:19 2016 -0500
Illumos 6370 - ZFS send fails to transmit some holes
6370 ZFS send fails to transmit some holes
Reviewed by: Matthew Ahrens <[email protected]>
Reviewed by: Chris Williamson <[email protected]>
Reviewed by: Stefan Ring <[email protected]>
Reviewed by: Steven Burgess <[email protected]>
Reviewed by: Arne Jansen <[email protected]>
Approved by: Robert Mustacchi <[email protected]>
References:
https://www.illumos.org/issues/6370
https://github.com/illumos/illumos-gate/commit/286ef71
In certain circumstances, "zfs send -i" (incremental send) can produce
a stream which will result in incorrect sparse file contents on the
target.
The problem manifests as regions of the received file that should be
sparse (and read a zero-filled) actually contain data from a file that
was deleted (and which happened to share this file's object ID).
Note: this can happen only with filesystems (not zvols, because they do
not free (and thus can not reuse) object IDs).
Note: This can happen only if, since the incremental source (FromSnap),
a file was deleted and then another file was created, and the new file
is sparse (i.e. has areas that were never written to and should be
implicitly zero-filled).
We suspect that this was introduced by 4370 (applies only if hole_birth
feature is enabled), and made worse by 5243 (applies if hole_birth
feature is disabled, and we never send any holes).
The bug is caused by the hole birth feature. When an object is deleted
and replaced, all the holes in the object have birth time zero. However,
zfs send cannot tell that the holes are new since the file was replaced,
so it doesn't send them in an incremental. As a result, you can end up
with invalid data when you receive incremental send streams. As a
short-term fix, we can always send holes with birth time 0 (unless it's
a zvol or a dataset where we can guarantee that no objects have been
reused).
Ported-by: Steven Burgess <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #4369
Closes #4050
I could not reproduce any issues. All tests passed.
I am running on a Centos-6.7 machine.
I will try the tip of the zfsonlinux master now.
Everything tested OK for ZFS 0.6.5-329_g5c27b29 and the matching SPL 0.6.5-63_g5ad98ad.
@bprotopopov is it possible this is a problem with some code miscompiling on certain platforms? I can't see anyone else in the thread running on CentOS.
edit: on further reflection, I find myself thinking it's more likely to be a kernel-specific issue than a miscompiling issue.
Well, there went that theory. Got "NOT ok" for SPL/ZFS 0.6.5.7-1 on Centos 6.6 (kernel 2.6.32-642.1.1) in a VM (using the {spl,zfs}-dkms packages on zfsonlinux.org).
@rincebrain
not sure what you mean by miscompiling. If this is a timing issue of some kind, then we need to understand it better. If you or @pwolny can publish
zdb -ddddd pool/fs object
output for the files in question, I could get a better idea of the nature of the issue.
@bprotopopov I mean, I can hand you the emitted send streams and the tiny files backing the pools, if that would be of use.
How would you suggest I get the appropriate object reference to hand zdb?
Barring that, since the entire -ddddd output of the demonstration pool's two FSes is tiny, I'll just attach both of those.
test_fs_A.txt
test_fs_B.txt
And, as a bonus, here's the sparse file containing the entire pool that the output is from, since the compressed file is only 2.5 MB (it unpacks to a 192 MB sparse file that occupies about 32 MB):
https://www.dropbox.com/s/ir8b6ot07k420mn/zfs_pool_0.6.5.6-1_A_7s_3.xz
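To poke at that image, it should be enough to unpack it and import the file-backed pool read-only; this sketch assumes the pool name matches the backing file name, as in the test script above:
xz -dk zfs_pool_0.6.5.6-1_A_7s_3.xz
zpool import -d "$PWD" -o readonly=on zfs_pool_0.6.5.6-1_A_7s_3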
Just so other people don't go down the rabbit hole when someone else already got there...
@pcd1193182 concluded this is "just" Illumos 6513.
So, let me expand on that slightly. The fix for illumos 6513 will solve this problem, but not retroactively. In other words, any pools that already have this state will not be fixed; on such pools, the hole BPs already have their birth times set to zero, and zfs has no way of telling which ones are bugged and which are right. Any new pools, created after the fix has been applied, will work.
For people who are stuck with their old pools, a workaround could be created that would basically disable hole birth. Ideally this would be a tunable so that it could be applied selectively. On illumos, the patch looks like this: https://gist.github.com/pcd1193182/2c0cd47211f3aee623958b4698836c48 . I understand that on linux, making variables tunable requires some fiddling. Whether to set the default to TRUE or FALSE, I leave to others.
On Thu, 7 Jul 2016, Laurentiu Nicola wrote:
For anyone who's worried, this script might help finding sparse files: http://unix.stackexchange.com/a/86446.
Are the holes still going to exist (albeit incorrectly sized) on the
recieved filesystem, for those of us who have already done our migrations
via 'send|recv' and discarded the original media?
Tim Connors
@pcd1193182
this is a good point to keep in mind, yes, the snapshots created with buggy code will not be cured by the fixed code because the on-disk block pointers do not contain proper (non-zero) birth epochs.
However, @pwolny has published a script that he claims re-creates the issue with entirely new pools created during the test. I ran it and could not see the issue, but there is a zdb dump above of the new pool with the problem, which I will take a look at.
Hi, @rincebrain
zdb lets you list filesystems such that you can see which file is which dnode number. That number can be used as the 'object' argument to dump just that file. For zvols this is simpler, as the zvol (data) object is always 1, as I recall. BTW, this bug affects zvols just as well.
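For example (hypothetical pool and zvol names):
zdb -ddddd tank/myvol 1    # object 1 is the zvol data object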
OK, I looked at the zdb output, and assuming that I interpreted the notation used by @rincebrain correctly (A is the source fs to be sent and B is the target, i.e. received, fs), I do see that on the source side, the L1 block that is partly filled with 10 L0 (non-hole) block pointers still has its L0 hole block pointers with 0 birth epochs. The range 10a000-180000 is not shown in the zdb output. The 'final' confirmation could be obtained by dumping the L1 block in question using zdb. I can dig up the command to do that, but I think it is not needed, as we have enough evidence.
This causes those holes not being transmitted by the send, and therefore on the receive side, the filesystem still contains non-zero data. That was the specific scenario created by the test script.
So, here is my recommendation.
Can @rincebrain and @pwolny please amend the script that was used to reproduce the issue with
'sync' commands after every dd/truncate/other updates (or, alternatively, before taking snapshots at least), such that we could be somewhat sure this is not the 'linux page cache' issue ?
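Concretely, the amendment being asked for would place a sync before each snapshot in the test script from this thread, along these lines (a sketch using the same variables as the script above):
dd if=/dev/urandom of=/$src_fs/large_file bs=4k count=10 seek=$((2*128)) conv=notrunc
sync                                   # flush dirty data before taking the snapshot
zfs snapshot $src_fs@$date_today"_2"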
@bprotopopov the script that he linked above is the one that I used to test. The bug does not appear on illumos, because we have the fix for illumos 6513. Linux does not yet have that fix integrated, so anyone running trunk or older ZoL will encounter the issue.
EDIT: Apparently I'm mistaken; the fix is in trunk, but not in any release yet.
Hi, @pcd1193182
I have already ported 6513 fix to ZoL, and it is on the master branch as of https://github.com/zfsonlinux/zfs/commit/bc77ba73fec82d37c0b57949ec29edd9aa207677
I assumed that release 0.6.5.7 mentioned above included that fix. If it did not :) then the issue is quite obvious!
So, @pcd1193182 nailed it :)
@pwolny, @rincebrain, zfs-0.6.5.7 definitely does not include the fix for 6513, and that is why you see the issue.
Another mystery solved, I guess.
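A quick way to check whether a given build's source includes the 6513 fix is to ask git which tags (if any) contain the commit referenced above, from inside a zfs checkout:
git tag --contains bc77ba73fec82d37c0b57949ec29edd9aa207677
At this point in the thread the fix is only on master, so no release tag is expected to show up.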
Some instructions for dummies on how to deal with the aftermath would be helpful.
I am quite confused about how I should take care of my system;
I don't have enough resources to pull out my entire pool and restore it,
which leaves me no way out of this until the next time I upgrade the system.
Briefly, for anyone with a pool with hole_birth enabled.
@rincebrain
I have a question:
after patch #4754, may I just rewrite in place, or make a copy of, those sparse files to make them free of the hole_birth issue?
@AndCycle you'd need to do something like cp foo foo_1 && mv foo_1 foo, and this wouldn't fix it in any of the snapshots you have. If you applied #4833 and turned on the ignore_hole_birth tunable, then did a send|recv the resulting copy should not have this particular flaw.
I'd probably wait until after illumos #7176 has a fix available+merged to bother, though.
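Assuming #4833 lands as a regular zfs module parameter named ignore_hole_birth (it has not been merged at this point in the thread), toggling it would look like any other zfs tunable:
echo 1 > /sys/module/zfs/parameters/ignore_hole_birth   # send holes regardless of recorded birth time
cat /sys/module/zfs/parameters/ignore_hole_birth        # verify the setting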
@rincebrain thanks for your detailed explanation; it looks like there are more issues waiting to be discovered.
We'll get bc77ba7 from the master branch pulled in for 0.6.5.8.
Also recommend picking up
https://github.com/zfsonlinux/zfs/commit/463a8cfe2b293934edd2ee79115b20c4598353d6
I found some time to finish testing.
It seems that with ZFS 0.6.5-329_g5c27b29 and SPL 0.6.5-63_g5ad98ad I get different behaviors when I send/receive the pool with the original problematic file ("/nas1FS/backup/.zfs/snapshot/20160618/samba_share/a/home/bak/aa/wx/wxWidgets-2.8.12/additions/lib/vc_lib/wxmsw28ud_propgrid.pdb") versus the pool generated with the test script.
So it seems that the symptom of the original problem resolved itself with ZFS 0.6.5-329_g5c27b29 and SPL 0.6.5-63_g5ad98ad. That is somewhat baffling.
Maybe this is superfluous information now, but I have located the zfs versions at which the test script changes behavior and gives slightly different outputs.
with "zfs 0.6.4-53 - 3df293404a102398445fc013b67250073db9004e - Fix type mismatch on 32-bit systems" I get "test_matrix_output.txt" :
oooooEEEEEE
oooooEEEEEE
oooooEEEEEE
oooooEEEEEE
with "zfs 0.6.4-54 – f1512ee61e2f22186ac16481a09d86112b2d6788 - Illumos 5027 - zfs large block support" I get "test_matrix_output.txt":
oooooEEEEEE
oooooEEEEEE
oooooEEEEEE
oooooEEEEEE
(In the above matrix sleep time increases from left to right and pool size increases from top to bottom)
@bprotopopov I have also tested with sync commands added to the test script and I did not see any change in behavior (the same error footprints, or the same lack of errors, for the respective versions of zfs/spl).
By the way, is there an easier method to obtain the ID of a single file for zdb than doing something like this:
zdb -dddd nas1FS/backup |grep -e "ZFS plain file" -e "path" | grep "home/bak/aa/wx/wxWidgets-2.8.12/additions/lib/vc_lib/wxmsw28ud_propgrid.pdb" -B 5
Sorry to ask that here, but maybe I am missing something obvious?
Thanks for the help, I really appreciate all the work that was put into the zfsonlinux.
Hi, @pwolny,
sorry, I don't know of an easier way to find the dnode number other than grepping for the name in the zdb output. I have noticed, however, that the first dnode number allocated for regular files in ZFS filesystems is 7. So, if you have one file in a filesystem, it's #7. Also, if you know the order of file creation, the subsequent dnode numbers are going to be 8, 9, ...
I admit I am not quite following the description of the test results in your message. Do I understand it correctly that the datasets created by the code with the fix for the metadata holes (Illumos 6513) are OK when transmitted with zfs send (with or without the fix, it does not matter), but datasets that were created by the code without that fix show errors (again, when transmitted using either version of the code)? That is expected, as the bug reflects itself in the on-disk data, not in the zfs send code; once the data has been written out incorrectly, then unless you turn off the hole_birth optimization (the patch with the tunable discussed earlier in this thread), zfs send will not transmit the holes properly.
Is that the third case you are describing ? That you were able to transmit broken on-disk data correctly with the patch that disabled the hole_birth optimization ? If so, this is as expected.
Boris.
@pwolny at least in random sampling to confirm, object id <=> what is reported as "inode number" in stat() and friends, so you could just parse the output of "stat [path to file]" or call one of the stat family of functions.
Or did you mean something else with your question?
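Putting the two comments together, a minimal sketch (hypothetical pool, filesystem and path names) would be:
obj=$(stat -c %i /tank/somefs/path/to/file)   # object id == inode number reported by stat
zdb -ddddd tank/somefs "$obj"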
Well that's a good point - should be easy to see in the code if it is by design.
Yes! Thanks for the stat command insight. It was exactly what I was looking for.
@bprotopopov
Sorry, I will try to clarify.
Till now I have assumed that the file "wxmsw28ud_propgrid.pdb" had the same underlying problem (the hole_birth bug) as the files generated by the test script.
Recently I have used the same version of zfs (ZFS 0.6.5-329_g5c27b29 and SPL 0.6.5-63_g5ad98ad) to transfer three pools:
I have not applied any patches disabling the hole_birth optimization, at least to my knowledge.
The test pool generated with an older (broken) version of zfs transferred incorrectly (using zfs 0.6.5-329_g5c27b29), so I have assumed that this optimization was not present.
Am I mistaken in that assumption?
I see. There were two bugs, Illumos 6513 and 6370, and the zfs version you are using to send/recv has the latter bug fixed. That is likely the reason for the positive test case you are asking about.
Boris.
My upgrade path for zfs was v0.6.5.3 -> 0.6.5.7 -> 0.6.5-329_g5c27b29.
I created the test pool (which gives a broken checksum) and the original pool (containing the "wxmsw28ud_propgrid.pdb" file) on v0.6.5.3. I kept them both through the upgrades on the original hard drive set and was trying to transfer them to a new disk. Creation of those pools and files happened on v0.6.5.3, so I assumed that the data on disk was corrupted.
If I am not mistaken, the Illumos 6513 and 6370 patches appeared sometime after zfs v0.6.5.5, so wouldn't I still expect to get a checksum error on the "wxmsw28ud_propgrid.pdb" file when transferring using zfs 0.6.5-329_g5c27b29?
329 contains the fix for 6370
Ok, so that means that the 6370 bug can be repaired after the fact and is not related to corrupted data on disk?
If so, that would certainly explain the observed behavior and it would resolve this issue for me.
Thanks for all the explanations; that really cleared things up for me.
Hi, I am testing this issue with the following patches based on 0.6.5.7, and there are still md5 mismatch issues.
I'm not sure if I missed any patches; if so, please let me know.
I have modified the above-mentioned test script as follows, and it seems to hit the mismatch issue more easily.
#!/bin/bash
# cat zfs_hole_transmit_test.sh, modified by neil for testing
function zfs_snap_send_recv()
{
zfs create -o recordsize=4k p1/hb
zfs set compression=lz4 p1/hb
truncate -s 1G /p1/hb/large_file
#ls /p1/hb/large_file -als
dd if=/dev/urandom of=/p1/hb/large_file bs=4k count=$((($RANDOM%100+3)*128)) seek=$((($RANDOM%100+1)*128)) 2> /dev/null
zfs snapshot p1/hb@$test_count"_1"
truncate -s $((($RANDOM%10+2)*128*4*1024)) /p1/hb/large_file
dd if=/dev/urandom of=/p1/hb/large_file bs=4k count=$(($RANDOM%100+128)) seek=$((3*128)) conv=notrunc 2> /dev/null
dd if=/dev/urandom of=/p1/hb/large_file bs=4k count=10 seek=$((($RANDOM%10+2)*128)) conv=notrunc 2> /dev/null
zfs snapshot p1/hb@$test_count"_2"
zfs send p1/hb@$test_count"_1" | zfs recv p1/backup
zfs send -i p1/hb@$test_count"_1" p1/hb@$test_count"_2" | zfs recv p1/backup
}
function checkmd5()
{
origin_md5=$(md5sum /p1/hb/large_file | awk '{print $1}')
backup_md5=$(md5sum /p1/backup/large_file | awk '{print $1}')
#echo $origin_md5 $backup_md5
if [ "$origin_md5" = "$backup_md5" ]; then
echo "MD5 match, $origin_md5, $backup_md5"
else
echo "MD5 mismatch, hole birth found, exit"
echo "$origin_md5 /p1/hb/large_file"
echo "$backup_md5 /p1/backup/large_file"
exit 1
fi
}
function zfs_clean()
{
zfs destroy p1/backup -r
zfs destroy p1/hb -r
}
zfs_clean
test_count=1
while true
do
echo "Test count:$test_count"
zfs_snap_send_recv
checkmd5
zfs_clean
test_count=$(($test_count+1))
#sleep 1
done
If you need any more information, please let me know.
Hi Neil,
please double-check your test procedure, the module versions, etc.
Hi, Boris,
this test procedure repeats the same pool create / file create / snapshot / transfer cycle in the script.
Besides the dataset settings in the script, the pool is created with:
zpool create p1 sdb -f -o ashift=12 -o autoexpand=on
zpool set listsnapshots=on p1
And the testing environment & module versions:
I failed to upload the corrupted files.
The file is corrupted at 0030 0000 - 003F FFFF; the data in hb/large_file is all zeros there, while backup/large_file contains corrupted data.
In my testing, at least 2 mismatches have been found, and more show up with further testing.
If I have not missed any patches, this issue still exists and is not completely fixed yet.
Besides ignoring hole_birth (https://github.com/zfsonlinux/zfs/pull/4833), I have no other ideas for this issue right now.
I am glad to help fix this issue if any help is needed.
Neil, the reason I asked about the procedure is that I have definitely run my original script, as well as the scripts in the earlier related thread, and I have not observed any issues if the build included the fix for 6513.
Please attach the git log of the repo where you built ZFS and the output of the commands
on your system where the tests have failed (I am assuming, of course, that it is still running the same version of ZFS/SPL).
Boris.
Boris,
here are the git logs of spl and zfs: gitlog.zip
and the versions:
root@i-mruehtn9:~# cat /sys/module/zfs/version
0.6.5.7-1
root@i-mruehtn9:~# cat /sys/module/spl/version
0.6.5.7-1
Confirmed, with a 7-second delay between two writes:
https://gist.github.com/Georgiy-Tugai/b7ae09767527d111ffb9f0a91b3e3bc4
Arch Linux x86-64, linux-lts 4.4.15-1, zfs-dkms 0.6.5.7-1, tested on a pool in a 3G file on tmpfs.
Strangely enough, sparse files in my production data, which were sent/received once (from an almost-virgin pool to another virgin pool, both with all features enabled), seem intact as far as I can tell.
I'd appreciate recommendations for mitigating the effect of this bug; should I recreate the pool with hole_birth disabled?
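(The only mitigation I am aware of so far is the tunable from the pull request referenced above, which can apparently be flipped at runtime without recreating the pool. A minimal sketch, assuming a build that carries that patch and exposes the parameter under the usual sysfs path; the modprobe.d file name is only an example:)
# check whether the workaround tunable is present, then enable it
cat /sys/module/zfs/parameters/ignore_hole_birth
echo 1 > /sys/module/zfs/parameters/ignore_hole_birth
# make it persistent across module reloads (illustrative path)
echo "options zfs ignore_hole_birth=1" >> /etc/modprobe.d/zfs.conf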
I also tested with the latest master branch, and the issue can still be reproduced within 0-6 hours.
The master branch has already merged the patches mentioned above,
so we have to keep working on this issue beyond just disabling hole_birth.
In order to help you guys,
I need you to follow the build/test procedure I have described earlier in this thread. I can walk you through the steps I have outlined if you need help, but I do need the info I have requested.
I need to be sure that you are running what you think you are running.
Boris.
Neil,
please provide the info I have requested earlier, for the test environment that uses the master branch.
Best regards,
Boris.
Hi, @pwolny,
could you please let me know if you still see any issues with missed holes on 0.6.5.7 amended with the fixes for illumos-6513 and illumos-6844?
I see @loyou and @Georgiy-Tugai suggest that there are still concerns, but so far I have not been able to confirm that they are actually running the right code.
@bprotopopov , sorry for missing the latest information.
and versions
root@i-5mbc9uue:~/zfs_sf# cat /sys/module/zfs/version
0.6.5-329_g5c27b29
root@i-5mbc9uue:~/zfs_sf# cat /sys/module/spl/version
0.6.5-63_g5ad98ad
About the question:
I need you to follow the build/test procedure I have described earlier in this thread
I build spl and zfs with ./autogen.sh && ./configure && make && make install
I test with the script mentioned above, modified to create a filesystem, make a snapshot, create a file, truncate it to a random size, modify it with dd, make another snapshot, and then send/receive, looping over this.
@loyou
Can you please do the following: in your script, log the random offsets and counts that you generate. Then, when you see the failure, take those offsets and counts and plug them back into your script.
My original script, which was modified several times, created a hole that covered an L1 region using a combination of writes and truncates. Then that L1-sized hole was partly overwritten, and the bug (zfs send does not know that the remaining L0 holes in the L1 ranges are _new_ and need to be transmitted) was triggered.
This bug was totally deterministic, and no randomness was needed. Once you see the failure with certain offsets and counts, you will be able to reproduce it 100% of the time, including on the first try.
If you can reproduce it that way, then please publish the non-random script. If you can't, I guess you might be dealing with something else.
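For example, one way to capture the values (a sketch against the randomized script above; the log path and variable names are only illustrative):
# capture the generated values before using them, so a failure can be replayed later
trunc_blocks=$(($RANDOM%10+2))
count1=$(($RANDOM%100+128))
seek2=$((($RANDOM%10+2)*128))
echo "iter=$test_count trunc_blocks=$trunc_blocks count1=$count1 seek2=$seek2" >> /root/hb_offsets.log
truncate -s $(($trunc_blocks*128*4*1024)) /p1/hb/large_file
dd if=/dev/urandom of=/p1/hb/large_file bs=4k count=$count1 seek=$((3*128)) conv=notrunc 2> /dev/null
dd if=/dev/urandom of=/p1/hb/large_file bs=4k count=10 seek=$seek2 conv=notrunc 2> /dev/null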
Hi, @loyou
no, that's good info; your test procedure is OK.
Here's what would be useful: log the offsets, sizes, and truncate boundaries in your script, and change the script to stop and not clean up on failure (which it looks like it already does).
Then you could use
# zdb -ddddd pool/fs object
as discussed above to dump the block pointers.
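(To pick the right object number for a regular file, the inode number reported by stat should, as far as I know, match the zdb object id; for example, using the dataset and file names from the script above:)
obj=$(stat -c %i /p1/hb/large_file)     # object id should equal the inode number
zdb -ddddd p1/hb $obj
obj=$(stat -c %i /p1/backup/large_file)
zdb -ddddd p1/backup $obj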
Also, for completeness, can you run the unmodified original script that I used (shown above as well) to see if you can reproduce the 6513 issue in a very simple way?
zfs create -o recordsize=4k tpool/test_fs
zfs set compression=on tpool/test_fs
truncate -s 1G /tpool/test_fs/large_file
dd if=/dev/urandom of=/tpool/test_fs/large_file bs=4k count=$((3*128)) seek=$((1*128)) oflag=direct
zfs snapshot tpool/test_fs@s1
truncate -s $((2*128*4*1024)) /tpool/test_fs/large_file
dd if=/dev/urandom of=/tpool/test_fs/large_file bs=4k count=128 seek=$((3*128)) conv=notrunc
dd if=/dev/urandom of=/tpool/test_fs/large_file bs=4k count=10 seek=$((2*128)) conv=notrunc
zfs snapshot tpool/test_fs@s2
zfs send tpool/test_fs@s1 | zfs recv tpool/test_fs_copy
zfs send -i tpool/test_fs@s1 tpool/test_fs@s2 | zfs recv tpool/test_fs_copy
md5sum /tpool/test_fs/large_file
md5sum /tpool/test_fs_copy/large_file
that would be very helpful.
@loyou
I ran your script on my CentOS 6.7 VM for 7 hours; it went through 1110 iterations or so, with no failures.
I had the zfs/spl master:
[root@centos-6 zfs-github]# cat /sys/module/{zfs,spl}/version
0.6.5-330_g590c9a0
0.6.5-64_gd2f97b2
just one commit ahead of yours in each repo (seemingly unrelated).
@bprotopopov
I have run your script and it does not reproduce the 6513 issue.
I followed your suggestion, logged the offsets and sizes in the script, and then tested with the non-random script. It does not fail 100% of the time; sometimes the checksums do match.
During the test I found that with the default build ("./autogen.sh && ./configure && make && make install"), which does not create /etc/zfs/zpool.cache, the mismatch can be reproduced at a high rate. However, if /etc/zfs/zpool.cache is created, it is hard to reproduce (but still possible). I am not sure if that makes sense.
Because there is no zpool.cache, zdb -ddddd cannot be run.
I have set up a VM server that still shows the current issue; I will send a separate email with the IP, user, and password for online checking.
I will also keep investigating. Thank you!
@loyou, well, I guess I am not sure then that it is 100% the same issue, given that it is not deterministic.
If you import the pool, you should be able to run zdb regardless, I believe.
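For instance, a minimal sketch of two options (the object number is a placeholder):
# regenerate the cache file for the imported pool via the standard cachefile property
zpool set cachefile=/etc/zfs/zpool.cache p1
zdb -ddddd p1/hb <object>
# or, if I remember correctly, let zdb scan the devices directly and bypass the cache file
zdb -e -ddddd p1/hb <object>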
OK, you can contact me offline at [email protected] regarding the test VM, and I will 'script' the terminal session for future reference. I will try to get to this over the weekend, but I cannot guarantee it.
@bprotopopov , thanks.
this is one of the reproducing cases:
zfs create -o recordsize=4k p1/hb
truncate -s 1G /p1/hb/large_file
dd if=/dev/urandom of=/p1/hb/large_file bs=4k count=11264 seek=1152
zfs snapshot p1/hb@2_1
truncate -s 4194304 /p1/hb/large_file
dd if=/dev/urandom of=/p1/hb/large_file bs=4k count=152 seek=384 conv=notrunc
dd if=/dev/urandom of=/p1/hb/large_file bs=4k count=10 seek=1408 conv=notrunc
zfs snapshot p1/hb@2_2
zfs send p1/hb@2_1 | zfs recv p1/backup
zfs send -i p1/hb@2_1 p1/hb@2_2 | zfs recv p1/backup
MD5 mismatch, hole birth found, exit
d0226974fcab038c843250210b96d401 /p1/hb/large_file
f808cdf1979f0eabf19835ea3e09e8e3 /p1/backup/large_file
also I attached the bp dump of p1/hb&p1/backup
p1_hb.zdb.ddddd.txt
p1_backup.zdb.ddddd.txt
Hi, @loyou,
I just ran your script; it works fine on my system. The issue you are seeing on your system, based on the attached zdb output files, does look like a 6513 problem:
the two L1 ranges at offsets 480000 and 500000 are supposed to be holes with a non-zero birth epoch in your source and destination filesystems, yet they are filled with data in your backup, and it looks like they are treated as 0-birth-epoch holes in your source filesystem.
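To spell out the arithmetic behind those two offsets (a side note based on the reproduce case above; with a 4k recordsize each 16K L1 indirect block holds 128 block pointers, so one L1 range covers 128*4096 = 0x80000 bytes):
printf '0x%x\n' $((1152 * 4096))   # 0x480000 - the data written before @2_1 starts here
printf '0x%x\n' 4194304            # 0x400000 - the truncate before @2_2 cuts the file back to here
printf '0x%x\n' $((1408 * 4096))   # 0x580000 - the second write re-extends the file from here
# so the L1 ranges 0x480000-0x4fffff and 0x500000-0x57ffff held data in @2_1 but are holes
# in @2_2, and the incremental send must transmit them as holes with a non-zero birth txg.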
You can use
zdb -R p1 0:237e9c00:200:di
to dump the L2 metadata block of your source dataset and you will see for sure what the block pointers are.
For the correct L2 on my system, it looks like:
HOLE [L0 unallocated] size=200L birth=0L
HOLE [L0 unallocated] size=200L birth=0L
HOLE [L0 unallocated] size=200L birth=0L
DVA[0]=<3:2f9ada00:1600> DVA[1]=<0:f1f3e00:1600> [L1 ZFS plain file] fletcher4 lz4 LE contiguous unique double size=4000L/1600P birth=111432L/111432P fill=128 cksum=1e14b0d63f2:54c26d5ac1e69:9ba0888976a4e85:59c1e2dcbc65f21a
DVA[0]=<2:e13e200:600> DVA[1]=<3:2fc86200:600> [L1 ZFS plain file] fletcher4 lz4 LE contiguous unique double size=4000L/600P birth=111432L/111432P fill=24 cksum=69551c757e:5c6ed46d7d58:2e7671432ae939:11145422d7e7f630
HOLE [L0 unallocated] size=200L birth=0L
HOLE [L0 unallocated] size=200L birth=0L
HOLE [L0 unallocated] size=200L birth=0L
HOLE [L0 unallocated] size=200L birth=0L
HOLE [L1 ZFS plain file] size=4000L birth=111432L
HOLE [L1 ZFS plain file] size=4000L birth=111432L
DVA[0]=<0:f1d3a00:400> DVA[1]=<1:7f7e600:400> [L1 ZFS plain file] fletcher4 lz4 LE contiguous unique double size=4000L/400P birth=111432L/111432P fill=10 cksum=34e9e21632:22aef29e5e10:c2175404e0c4b:2ff61fbf03444b6
...
with the 480000 and 500000 ranges showing
...
HOLE [L1 ZFS plain file] size=4000L birth=111432L
HOLE [L1 ZFS plain file] size=4000L birth=111432L
...
showing holes with non-zero birth epochs (each line covers an L1 range of size 80000 hex).
@bprotopopov
Looking at the metadata of p1/{hb,backup}/large_file, p1/backup/large_file has two more allocated L1 indirect blocks than p1/hb/large_file, and they correspond to the corrupted region we see.
L2 metadata of p1/hb/large_file, zdb -R p1 0:237e9c00:200:di
p1_hb_L2_0:237e9c00:200:di.txt
HOLE [L0 unallocated] size=200L birth=0L
HOLE [L0 unallocated] size=200L birth=0L
HOLE [L0 unallocated] size=200L birth=0L
DVA[0]=<0:237c5c00:1600> DVA[1]=<0:4e0c6fc00:1600> [L1 ZFS plain file] fletcher4 lz4 LE contiguous unique double size=4000L/1600P birth=111640L/111640P fill=128 cksum=1f43e582eff:5d758297d1932:ac7dbe127589265:cc40479b7a6c1a02
DVA[0]=<0:237df200:600> DVA[1]=<0:4e0c71200:600> [L1 ZFS plain file] fletcher4 lz4 LE contiguous unique double size=4000L/600P birth=111640L/111640P fill=24 cksum=65f68de822:617218e5a156:33dad79632620d:13e30014e5ff9c77
HOLE [L0 unallocated] size=200L birth=0L
HOLE [L0 unallocated] size=200L birth=0L
HOLE [L0 unallocated] size=200L birth=0L
HOLE [L0 unallocated] size=200L birth=0L
HOLE [L0 unallocated] size=200L birth=0L
HOLE [L0 unallocated] size=200L birth=0L
DVA[0]=<0:237e9800:400> DVA[1]=<0:4e0c71800:400> [L1 ZFS plain file] fletcher4 lz4 LE contiguous unique double size=4000L/400P birth=111640L/111640P fill=10 cksum=3c70471c1a:2ab5b049d594:fc66313f88a2b:40d11c9085f6614
HOLE [L0 unallocated] size=200L birth=0L
HOLE [L0 unallocated] size=200L birth=0L
HOLE [L0 unallocated] size=200L birth=0L
...
and L2 metadata of p1/backup/large_file, zdb -R p1 0:2655ec00:200:di
p1_backup_L2_0:2655ec00:200:di.txt
HOLE [L0 unallocated] size=200L birth=0L
HOLE [L0 unallocated] size=200L birth=0L
HOLE [L0 unallocated] size=200L birth=0L
DVA[0]=<0:26545000:1600> DVA[1]=<0:4e0d28000:1600> [L1 ZFS plain file] fletcher4 lz4 LE contiguous unique double size=4000L/1600P birth=111651L/111651P fill=128 cksum=1ea63edcd43:5a811c96f8444:a6539e50ad7e670:4bb8a78beaab38b4
DVA[0]=<0:2655e600:600> DVA[1]=<0:4e0d29600:600> [L1 ZFS plain file] fletcher4 lz4 LE contiguous unique double size=4000L/600P birth=111651L/111651P fill=24 cksum=65446fb211:6076ba7f1844:3324c8e0e621bb:1389c7c570b9cccb
HOLE [L0 unallocated] size=200L birth=0L
HOLE [L0 unallocated] size=200L birth=0L
HOLE [L0 unallocated] size=200L birth=0L
HOLE [L0 unallocated] size=200L birth=0L
DVA[0]=<0:2388e400:1600> DVA[1]=<0:4e0c8f400:1600> [L1 ZFS plain file] fletcher4 lz4 LE contiguous unique double size=4000L/1600P birth=111646L/111646P fill=128 cksum=1db560423dd:59cd5dadfa628:a7c689bdf8d691d:8db5c5a36c3f22e9
DVA[0]=<0:2390fa00:1600> DVA[1]=<0:4e0c90a00:1600> [L1 ZFS plain file] fletcher4 lz4 LE contiguous unique double size=4000L/1600P birth=111646L/111646P fill=128 cksum=1fece6089a8:5dbded8753784:aae5c4b54dd159f:8b6929104e0662ea
DVA[0]=<0:264c4c00:400> DVA[1]=<0:4e0d27c00:400> [L1 ZFS plain file] fletcher4 lz4 LE contiguous unique double size=4000L/400P birth=111651L/111651P fill=10 cksum=3a6257cc44:288124e7f014:ec3d8e9dd5e64:3c21fed6a53e702
HOLE [L1 ZFS plain file] size=4000L birth=111651L
HOLE [L1 ZFS plain file] size=4000L birth=111651L
HOLE [L1 ZFS plain file] size=4000L birth=111651L
HOLE [L1 ZFS plain file] size=4000L birth=111651L
...
We have a zpool which I think is affected by this issue and I'm trying to work out the correct commits to build to resolve this. I am working on top of the zfs-0.6.5.7 tag:
Illumos 6844 - dnode_next_offset can detect fictional holes - 463a8cfe2b293934edd2ee79115b20c4598353d6
OpenZFS 6513 - partially filled holes lose birth time - bc77ba73fec82d37c0b57949ec29edd9aa207677
Skip ctldir znode in zfs_rezget to fix snapdir issues - cbecb4fb228bbfedec02eb0a172c681b4338d08f
Want tunable to ignore hole_birth - 31b86019fd1c2a75548d0a3a8cfd0a29023d55b2
root# cat /sys/module/zfs/parameters/ignore_hole_birth
1
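For reference, one way to assemble that stack on top of the tag (a sketch only; the cherry-picks may need minor conflict resolution depending on the exact tree):
git checkout -b 0.6.5.7-holebirth zfs-0.6.5.7
git cherry-pick 463a8cfe2b293934edd2ee79115b20c4598353d6   # Illumos 6844
git cherry-pick bc77ba73fec82d37c0b57949ec29edd9aa207677   # OpenZFS 6513
git cherry-pick cbecb4fb228bbfedec02eb0a172c681b4338d08f   # snapdir zfs_rezget fix
git cherry-pick 31b86019fd1c2a75548d0a3a8cfd0a29023d55b2   # ignore_hole_birth tunable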
Originally we saw this issue because the incrementally received stream showed the content diverging between the zvol and the file. Doing some testing now, though, it seems that the file gets corrupted as soon as we take the snapshot on the source pool. The WriteDiffs.py tool is something we developed to write only the changed blocks of a file, so that we get copy-on-write benefits.
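As a rough illustration of the idea only, not the actual WriteDiffs.py, a block-by-block rewrite could look something like the sketch below (the 4 KiB block size is an assumption):
#!/bin/bash
# usage: writediffs-sketch.sh SRC DST
SRC=$1; DST=$2; BS=4096
size=$(stat -c %s "$SRC")
blocks=$(( (size + BS - 1) / BS ))
for ((i = 0; i < blocks; i++)); do
    a=$(dd if="$SRC" bs=$BS skip=$i count=1 2>/dev/null | md5sum)
    b=$(dd if="$DST" bs=$BS skip=$i count=1 2>/dev/null | md5sum)
    # rewrite only blocks whose content differs, leaving identical blocks untouched
    if [ "$a" != "$b" ]; then
        dd if="$SRC" of="$DST" bs=$BS skip=$i seek=$i count=1 conv=notrunc 2>/dev/null
    fi
done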
root# WriteDiffs.py -c -s /dev/zvol/ztank/reference/xvda2@ref4000 /ztank/reference/config.ext4
Source: /dev/zvol/ztank/reference/xvda2@ref4000; sha256: b1c4d74fe8f18fbdb0f58074d043f9ba818e26a67ec64a835057e763ac099f13
Destination: /ztank/reference/config.ext4; sha256: b1c4d74fe8f18fbdb0f58074d043f9ba818e26a67ec64a835057e763ac099f13
Start Time: 1470734255.66
End Time: 1470734256.41
Run Time: 0.75s
Total Compared Blocks: 31232
Matched Blocks: 31232; Unmatched Blocks: 0
Compared 41720.54 blocks/s; 162.97 Mbytes/s
# This execution of WriteDiffs.py did not make any change to the copy of the file at /ztank/reference/config.ext4
root# sha256sum /dev/zvol/ztank/reference/xvda2@ref4000 /ztank/reference/config.ext4
b1c4d74fe8f18fbdb0f58074d043f9ba818e26a67ec64a835057e763ac099f13 /dev/zvol/ztank/reference/xvda2@ref4000
b1c4d74fe8f18fbdb0f58074d043f9ba818e26a67ec64a835057e763ac099f13 /ztank/reference/config.ext4
root# zfs snapshot -r hapseed/reference/filesystems/imageslv@james
root# sha256sum /dev/zvol/ztank/reference/xvda2@ref4000 /ztank/reference/config.ext4 /ztank/reference/.zfs/snapshot/ref4000/config.ext4 /ztank/reference/.zfs/snapshot/james/config.ext4
b1c4d74fe8f18fbdb0f58074d043f9ba818e26a67ec64a835057e763ac099f13 /dev/zvol/ztank/reference/xvda2@ref4000
b1c4d74fe8f18fbdb0f58074d043f9ba818e26a67ec64a835057e763ac099f13 /ztank/reference/config.ext4
4b0906ecf1d4b4356acc7b20a05ed4744f38eede72fc1f96fe138932e37b3422 /ztank/reference/.zfs/snapshot/ref4000/config.ext4
4b0906ecf1d4b4356acc7b20a05ed4744f38eede72fc1f96fe138932e37b3422 /ztank/reference/.zfs/snapshot/james/config.ext4
When we send/receive the snapshot, both the @ref4000 snapshot and the file in the filesystem are corrupted to the snapshot checksum:
root@other-host# sha256sum /dev/zvol/ztank/reference/xvda2@ref4000 /ztank/reference/config.ext4 /ztank/reference/.zfs/snapshot/ref4000/config.ext4
b1c4d74fe8f18fbdb0f58074d043f9ba818e26a67ec64a835057e763ac099f13 /dev/zvol/ztank/reference/xvda2@ref4000
4b0906ecf1d4b4356acc7b20a05ed4744f38eede72fc1f96fe138932e37b3422 /ztank/reference/config.ext4
4b0906ecf1d4b4356acc7b20a05ed4744f38eede72fc1f96fe138932e37b3422 /ztank/reference/.zfs/snapshot/ref4000/config.ext4
@JKDingwall Fascinating.
Unless I'm misrecalling how this works, you shouldn't be able to see this bug without using differential send-recv or parsing ZDB output, because the only thing in ZFS that cares about the incorrect data is the calculation of whether to send holes for differential send (and any data corruption only "happens" on the receiver of the snapshot, because it didn't send some holes).
So, if you're seeing corruption happening before you do any send|recv, I don't _think_ this is the bug in question?
I guess it depends on what WriteDiffs.py is doing.
This zpool is on a master system for our CI, which produces an incremental patch stream each night for that build against our previous release. We started seeing problems in the incremental stream on receive last week and, because of how we update the config.ext4 image, assumed it was related to this bug. It was only today that I looked at the checksums on the master zpool and saw that the file is corrupted in the snapshot before sending. What would be the interesting tests or details about the system to confirm whether this is a different issue?
@JKDingwall The biggest thing is that we need to know how WriteDiffs.py works. This bug is in a very specific set of code, and if you're encountering this issue by running userland scripts, there's only a couple ways that could be done.
referencing https://github.com/openzfs/openzfs/pull/173 7176 Yet another hole birth issue [OpenZFS]
https://github.com/zfsonlinux/zfs/pull/4950 OpenZFS 7176 Yet another hole birth issue [PR4950]
@kernelOfTruth, I fixed the build error of your pull request, but I can still reproduce with my script.
@pcd1193182 I have created a gist with the WriteDiffs.py code: https://gist.github.com/JKDingwall/7f7200a99f52fd3c61b734fbde31edee/49b86a26cb3b9cfd1819d7958e7c8808f6fd991e
On our system, after a reboot, the checksum of the file in the filesystem also changed to match the snapshot. I believe, however, that there was a bug in the code where the mmap for the output file was not flushed before it was closed. The HEAD of the gist contains the latest version, which does flush it, and so far we have not seen the problem again. It would be useful to know if you think this could have been the cause. If so, we have been lucky not to have been caught out before.
@JKDingwall So you are no longer able to reproduce the problem? Can you confirm whether the patch "Want tunable to ignore hole_birth - 31b8601", with the tunable set to 1, was the one that finally stopped the issue from happening?
@clopez - I can still reproduce the problem. Patches as given above, ignore_hole_birth=1.
Last night I got:
zvol sha256sum: db71ec877acc4f2abc6a9e01781df46e3d1e1416cc2561ccea51d25f348b65ba
file after WriteDiffs.py: db71ec877acc4f2abc6a9e01781df46e3d1e1416cc2561ccea51d25f348b65ba
snapshot of file after WriteDiffs.py: ebc8d6bd78578640839b0cc4f78695bd40eebce2582d28eb32979a712fe19047
I think if I reboot I will see the file corrupt to the snapshot checksum...
zvol: db71ec877acc4f2abc6a9e01781df46e3d1e1416cc2561ccea51d25f348b65ba
file: ebc8d6bd78578640839b0cc4f78695bd40eebce2582d28eb32979a712fe19047
file snapshot: ebc8d6bd78578640839b0cc4f78695bd40eebce2582d28eb32979a712fe19047
Could it be a cache consistency problem with holes?
@JKDingwall
you wrote: "...It was only today I looked at the checksums on the master zpool and saw that the file is corrupted in the snapshot before sending. What would the interesting tests or details about the system be to confirm if this is a different issue?"
If you modify the pool on your master system using zfs send/recv (I mean if you _receive_ into that master pool), then this might be an issue with missed holes. If you do not _receive_ into the master pool and still see some corruption, then it is a different issue.
There are ways of examining the metadata (block pointers) of the suspect files using zdb, as discussed in this thread, that let you see whether holes were missed. But the prerequisite for this type of bug to occur is that the dataset is _modified_ with zfs receive.
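For the datasets above, that check might look roughly like this, assuming, as I recall, that a regular file's zdb object number equals its inode number and that a zvol stores its data in object 1 (please double-check before relying on it):
zdb -ddddd ztank/reference $(stat -c %i /ztank/reference/config.ext4)
zdb -ddddd ztank/reference/xvda2 1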
@bprotopopov - there is no zfs receive involved so I will try to gather further information with a reproducible test case and then open a new issue.
I do not know if this belongs here. I used the test script from above and find the result very strange.
ZFSPOOL='DS1'
zfs destroy $ZFSPOOL/backup -r
zfs destroy $ZFSPOOL/hb -r
zfs create -o recordsize=4k $ZFSPOOL/hb
truncate -s 1G /$ZFSPOOL/hb/large_file
dd if=/dev/urandom of=/$ZFSPOOL/hb/large_file bs=4k count=11264 seek=1152
zfs snapshot $ZFSPOOL/hb@2_1
truncate -s 4194304 /$ZFSPOOL/hb/large_file
dd if=/dev/urandom of=/$ZFSPOOL/hb/large_file bs=4k count=152 seek=384 conv=notrunc
sleep 5 # with a sleep >= 5 seconds the checksums differ
dd if=/dev/urandom of=/$ZFSPOOL/hb/large_file bs=4k count=10 seek=1408 conv=notrunc
zfs snapshot $ZFSPOOL/hb@2_2
zfs send $ZFSPOOL/hb@2_1 | zfs recv $ZFSPOOL/backup
zfs send -i $ZFSPOOL/hb@2_1 $ZFSPOOL/hb@2_2 | zfs recv $ZFSPOOL/backup
md5sum /$ZFSPOOL/hb/large_file /$ZFSPOOL/backup/large_file
I have run it on two different systems, more than 50 times. With sleep < 5 seconds everything is OK; with sleep >= 5 seconds I get the error.
System 1: Debian 8, ZFS 0.6.5.7-8-jessie
System 2: Ubuntu 16.04, ZFS 0.6.5.6-0ubuntu10
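(A guess about the 5-second threshold: it matches the default transaction group sync interval, so the sleep probably just pushes the second write into a later txg; the current value can be checked via the module parameter below:)
cat /sys/module/zfs/parameters/zfs_txg_timeout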
@datiscum Unless this is going to become an overarching bug for all hole_birth bugs, this isn't a case of the bug in illumos 6513, but instead of illumos 7176, which has an open PR for ZoL of #4981.
So in short, I think your problem isn't the problem being discussed in this bug, but is a known issue.
Hey @datiscum, while writing test scripts for all the hole_birth bugs found, I found that indeed, your reported bug isn't any of the ones currently in play! So I filed #5030 about it. You might want to subscribe to that to see when it's fixed.
@JKDingwall Presuming your issue hasn't been magically fixed in the interim, could you open a new issue about it, since I don't think your bug is the same bug this thread is about?
Closing. The hole_birth optimization has been disabled by default in the 0.6.5.8 release, 3a8e13688b4ee8c1799edb6484943eae5ea90dd2. Work to resolve the various underlying issues is ongoing in master.
@behlendorf can we also disable it for 0.7.0 and master?
Even though I have been following the discussion, I just noticed that I had forgotten to disable that setting at module load time.
I haven't used zfs send for the last few weeks, but I intend to do so again in the near future.
For Gentoo or other bleeding-edge users who aren't aware of the details, this could cause some headaches if there are still issues dormant in the code.
Thanks
Yes, that's a good idea. We'll definitely disable this in the final 0.7.0 when tagging if the various issues aren't all wrapped up by then. Disabling it in master also makes sense now that we've started putting out release candidates for wider testing. I opened PR #5099 to change the default.
Can someone watching this bug please review patch in #5099 which changes the default value.