ZFS: 100% CPU load from arc_prune

Created on 7 Feb 2020 · 35 comments · Source: openzfs/zfs

System information


Type | Version/Name
--- | ---
Distribution Name | CentOS
Distribution Version | CentOS release 6.10 (Final)
Linux Kernel | Linux hereland.nmrfam.wisc.edu 2.6.32-754.25.1.el6.x86_64 #1 SMP Mon Dec 23 15:19:53 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Architecture | AMD 64
ZFS Version | 0.8.3-1
SPL Version | 0.8.3-1

Describe the problem you're observing

While moving 14 TB of external data into a ZFS dataset, after a couple of days the CPU is at 99% system load. Clients time out trying to mount the dataset (exported via sharenfs).

Describe how to reproduce the problem

This is the first time it has happened for me; no idea if it's reproducible.

Include any warning/errors/backtraces from the system logs

top - 09:49:18 up 3 days, 21:38,  3 users,  load average: 48.27, 47.56, 47.73
Tasks: 653 total,  31 running, 622 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.0%us, 99.0%sy,  0.0%ni,  1.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  32993392k total, 28381572k used,  4611820k free,    66360k buffers
Swap: 65535996k total,    57400k used, 65478596k free,  1500656k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
20905 root      20   0     0    0    0 R 98.7  0.0 167:51.80 arc_prune
 7821 root      20   0     0    0    0 R 98.4  0.0 212:08.44 arc_prune
 8292 root      20   0     0    0    0 R 98.4  0.0  70:27.30 arc_prune
20589 root      20   0     0    0    0 R 98.4  0.0 168:11.97 arc_prune
22506 root      20   0     0    0    0 R 98.4  0.0 166:39.25 arc_prune
 1772 root      20   0     0    0    0 R 98.1  0.0 475:47.11 arc_prune
 8502 root      20   0     0    0    0 R 98.1  0.0 212:09.17 arc_prune
18799 root      20   0     0    0    0 R 98.1  0.0 169:55.35 arc_prune
19397 root      20   0     0    0    0 R 98.1  0.0 169:08.63 arc_prune
20312 root      20   0     0    0    0 R 98.1  0.0 168:14.17 arc_prune
23623 root      20   0     0    0    0 R 98.1  0.0 164:48.99 arc_prune
28861 root      20   0     0    0    0 R 98.1  0.0 137:44.49 arc_prune
21294 root      20   0     0    0    0 R 97.8  0.0 167:55.44 arc_prune
19192 root      20   0     0    0    0 R 96.5  0.0   4:46.02 arc_prune
23087 root      20   0     0    0    0 R 95.8  0.0 165:28.73 arc_prune
19802 root      20   0     0    0    0 R 94.9  0.0 168:58.81 arc_prune
 1773 root      39  19     0    0    0 R  6.1  0.0  84:38.77 zthr_procedure
19929 root      20   0 15432 1724  944 R  1.9  0.0   0:00.50 top
Label: Memory Management Defect

All 35 comments

More info:

# zpool status
  pool: backups
 state: ONLINE
  scan: scrub repaired 0B in 0 days 00:10:52 with 0 errors on Sat Feb  1 00:10:54 2020
config:

        NAME                        STATE     READ WRITE CKSUM
        backups                     ONLINE       0     0     0
          raidz1-0                  ONLINE       0     0     0
            scsi-35000c500860d0c4b  ONLINE       0     0     0
            scsi-35000c500860d1543  ONLINE       0     0     0
            scsi-35000c500860f9437  ONLINE       0     0     0

errors: No known data errors

  pool: tank
 state: ONLINE
status: Some supported features are not enabled on the pool. The pool can
        still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
        the pool may no longer be accessible by software that does not support
        the features. See zpool-features(5) for details.
  scan: scrub repaired 0B in 0 days 21:22:47 with 0 errors on Sat Feb  1 21:22:49 2020
config:

        NAME                              STATE     READ WRITE CKSUM
        tank                              ONLINE       0     0     0
          raidz1-0                        ONLINE       0     0     0
            scsi-35000c500860ba403        ONLINE       0     0     0
            scsi-35000c500860bbb4f        ONLINE       0     0     0
            scsi-35000c500860bbb5f        ONLINE       0     0     0
          raidz1-2                        ONLINE       0     0     0
            scsi-35000c500860bcb77        ONLINE       0     0     0
            scsi-35000c500860cf9d7        ONLINE       0     0     0
            scsi-35000c500860d0b67        ONLINE       0     0     0
          raidz1-3                        ONLINE       0     0     0
            scsi-35000c500860f96db        ONLINE       0     0     0
            scsi-35000c500860fc85f        ONLINE       0     0     0
            scsi-35000c500860fd93b        ONLINE       0     0     0
        logs
          mirror-1                        ONLINE       0     0     0
            scsi-35000c500301bf47f-part1  ONLINE       0     0     0
            scsi-35000c500301bf69f-part1  ONLINE       0     0     0
        cache
          scsi-35000c500301bf6c3-part1    ONLINE       0     0     0

errors: No known data errors

# zpool iostat
              capacity     operations     bandwidth
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
backups      402G  21.4T      1      0  6.71K   204K
tank        31.9T  33.5T    783    217  33.8M  26.7M
----------  -----  -----  -----  -----  -----  -----

ARC_MAX is not set; it's left at the default. The cache SSD is 160 GB. All disks are dual-ported SAS drives in an external shelf.

Even more info:

...
Feb  7 04:55:40 hereland zed[6190]: Invoking "all-syslog.sh" eid=180 pid=16962
Feb  7 04:55:40 hereland zed: eid=180 class=history_event pool_guid=0x68C093597CA7900F
Feb  7 04:55:40 hereland zed[6190]: Finished "all-syslog.sh" eid=180 pid=16962 exit=0
Feb  7 04:55:41 hereland zed[6190]: Invoking "all-syslog.sh" eid=181 pid=17099
Feb  7 04:55:41 hereland zed: eid=181 class=history_event pool_guid=0x68C093597CA7900F
Feb  7 04:55:41 hereland zed[6190]: Finished "all-syslog.sh" eid=181 pid=17099 exit=0
Feb  7 06:14:42 hereland kernel: INFO: task nfsd:6109 blocked for more than 120 seconds.
...

There's nothing out of the ordinary in syslog until the "hung task" nfsd.

Does it look like I'm getting hit by the "don't make the cache disk bigger than 5 x 1/2 RAM" problem? I thought that was no longer a thing.

@dmaziuk that's a possibility. Can you post the contents of the /proc/spl/kstat/zfs/arcstats file? It should give us a better idea of why the system is so aggressively trying to reclaim from the ARC.

# cat /proc/spl/kstat/zfs/arcstats
12 1 0x01 98 26656 14203091063 342692306280336
name                            type data
hits                            4    1961861302
misses                          4    214595515
demand_data_hits                4    535849124
demand_data_misses              4    21425564
demand_metadata_hits            4    1310503883
demand_metadata_misses          4    29241451
prefetch_data_hits              4    1327
prefetch_data_misses            4    146730344
prefetch_metadata_hits          4    115506968
prefetch_metadata_misses        4    17198156
mru_hits                        4    760224963
mru_ghost_hits                  4    1685841
mfu_hits                        4    1094334124
mfu_ghost_hits                  4    1919548
deleted                         4    294724177
mutex_miss                      4    24519247
access_skip                     4    701
evict_skip                      4    554617006
evict_not_enough                4    75691673
evict_l2_cached                 4    8277499373056
evict_l2_eligible               4    14072033056256
evict_l2_ineligible             4    9966615764992
evict_l2_skip                   4    4031567
hash_elements                   4    2149096
hash_elements_max               4    6808116
hash_collisions                 4    222109741
hash_chains                     4    394313
hash_chain_max                  4    11
p                               4    15524174769
c                               4    16823934976
c_min                           4    1051495936
c_max                           4    16823934976
size                            4    16951797592
compressed_size                 4    1377493504
uncompressed_size               4    4542341632
overhead_size                   4    4540340224
hdr_size                        4    101469040
data_size                       4    742869504
metadata_size                   4    5174964224
dbuf_size                       4    2256648576
dnode_size                      4    6258464128
bonus_size                      4    2234942720
anon_size                       4    705024
anon_evictable_data             4    0
anon_evictable_metadata         4    0
mru_size                        4    1051861504
mru_evictable_data              4    0
mru_evictable_metadata          4    8192
mru_ghost_size                  4    0
mru_ghost_evictable_data        4    0
mru_ghost_evictable_metadata    4    0
mfu_size                        4    4865267200
mfu_evictable_data              4    0
mfu_evictable_metadata          4    0
mfu_ghost_size                  4    0
mfu_ghost_evictable_data        4    0
mfu_ghost_evictable_metadata    4    0
l2_hits                         4    3173912
l2_misses                       4    211421480
l2_feeds                        4    786569
l2_rw_clash                     4    16
l2_read_bytes                   4    11567827968
l2_write_bytes                  4    4970878013440
l2_writes_sent                  4    689715
l2_writes_done                  4    689715
l2_writes_error                 4    0
l2_writes_lock_retry            4    1425
l2_evict_lock_retry             4    1161
l2_evict_reading                4    37
l2_evict_l1cached               4    1921025
l2_free_on_write                4    385458
l2_abort_lowmem                 4    119
l2_cksum_bad                    4    0
l2_io_error                     4    0
l2_size                         4    182019350528
l2_asize                        4    140228637696
l2_hdr_size                     4    182439072
memory_throttle_count           4    0
memory_direct_count             4    143
memory_indirect_count           4    1174580
memory_all_bytes                4    33785233408
memory_free_bytes               4    7054409728
memory_available_bytes          3    6528663552
arc_no_grow                     4    0
arc_tempreserve                 4    9216
arc_loaned_bytes                4    0
arc_prune                       4    13876726
arc_meta_used                   4    16208928088
arc_meta_limit                  4    12617951232
arc_dnode_limit                 4    1261795123
arc_meta_max                    4    16893738168
arc_meta_min                    4    16777216
async_upgrade_sync              4    1740388
demand_hit_predictive_prefetch  4    67124685
demand_hit_prescient_prefetch   4    83590118
arc_need_free                   4    0
arc_sys_free                    4    525747968
arc_raw_size                    4    0

System load is down to 95-ish% now BTW (but nfsd is still hosed)

According to the arcstats, the ARC is entirely filled with metadata and is over the target (arc_meta_used > arc_meta_limit). The arc_prune threads are trying to evict some of this metadata to get down to the 75% target value.

arc_meta_used                   4    16208928088
arc_meta_limit                  4    12617951232

They appear to be unable to make any progress on this, which is why they're spinning. If I recall correctly, this was caused by the file handles cached by the nfsd holding a reference which prevents the ARC from freeing the buffers. I believe the nfsd behavior here is a little different in newer kernels and better behaved.
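For reference, the numbers line up with that target: 75% of c_max (16,823,934,976 bytes) is exactly the 12,617,951,232-byte arc_meta_limit shown above, while arc_meta_used (~16.2 GB) is roughly 96% of c_max, so the prune threads are being asked to free about 3.6 GB of metadata that those cached references appear to keep pinned.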

You should be able to stop the spinning by setting the limit to 100%. You can make this the default with the zfs_arc_meta_limit_percent=100 module option, but you'll need to reload the ZFS modules for that to take effect. To change the limit immediately, you can set zfs_arc_meta_limit to the arc_meta_max value:

echo 16893738168 >/sys/module/zfs/parameters/zfs_arc_meta_limit

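(To make that survive module reloads and reboots -- a minimal sketch, assuming the module options live in /etc/modprobe.d/zfs.conf as mentioned further down in the thread -- the percent form of the limit can be set as a module option:)

# Persist the 100% metadata limit; path and syntax assume a standard modprobe.d setup.
echo "options zfs zfs_arc_meta_limit_percent=100" >> /etc/modprobe.d/zfs.conf
# The option is only read at module load, so reload the ZFS modules (or reboot) for it to apply.

Setting the percentage rather than an absolute zfs_arc_meta_limit keeps the limit tied to whatever the ARC maximum works out to on that machine.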

Done, arc_meta_used is going down now, but very slowly, with occasional dips in system load down to 80%-ish.

@behlendorf if you need anything else from this system, say so, because my users are crying and I have to bounce it.

Would it be possible to make it not use all available cores? It's a 16-core system that has 16 instances of arc_prune. I wonder if it should be limited to, e.g., half the cores, the same way the ARC is limited to half the RAM by default.

Feel free to bounce the system; the arcstats are enough to confirm the issue. This definitely can, and should, be improved, and we have some ideas about how to do that.

Is this from the RAM ARC or the cache SSD ARC? I.e., if it's the latter, can I get around it by disconnecting the cache device -- temporarily, while moving a zillion files around?

It's the RAM ARC; disconnecting the cache device won't help. Setting zfs_arc_meta_limit_percent=100 should avoid the spinning.

OK, thank you. I did set it in /etc/modprobe.d/zfs.conf, though I really don't expect to hit this again -- unless another group moves in with 8 years' worth of data that they think they want on our "big" file server.

Thanks again

To change the limit immediately, you can set zfs_arc_meta_limit to the arc_meta_max

@behlendorf thanks for providing the hint, it made the spinning go away for me as well!
Below are the parameters/statistics from my system when the spinning occurred (it happened during send/receive to/from local pools).

How do you plan to solve this by default in the future, without needing user intervention?

````
root@zfs:~# systool -vm spl
Module = "spl"

Attributes:
coresize = "106496"
initsize = "0"
initstate = "live"
refcnt = "5"
srcversion = "09B6CA3EE9E5F5E76F82D5C"
taint = "OE"
uevent =
version = "0.8.3-1~rlaager1.18.04.1"

Parameters:
spl_hostid = "0"
spl_hostid_path = "/etc/hostid"
spl_kmem_alloc_max = "1048576"
spl_kmem_alloc_warn = "65536"
spl_kmem_cache_expire= "2"
spl_kmem_cache_kmem_limit= "2048"
spl_kmem_cache_kmem_threads= "4"
spl_kmem_cache_magazine_size= "0"
spl_kmem_cache_max_size= "32"
spl_kmem_cache_obj_per_slab_min= "1"
spl_kmem_cache_obj_per_slab= "8"
spl_kmem_cache_reclaim= "0"
spl_kmem_cache_slab_limit= "16384"
spl_max_show_tasks = "512"
spl_panic_halt = "0"
spl_schedule_hrtimeout_slack_us= "0"
spl_taskq_kick = "0"
spl_taskq_thread_bind= "0"
spl_taskq_thread_dynamic= "1"
spl_taskq_thread_priority= "1"
spl_taskq_thread_sequential= "4"

Sections:
.altinstr_replacement= "0xffffffffc049b795"
.altinstructions = "0xffffffffc049f082"
.bss = "0xffffffffc04a0ec0"
.data = "0xffffffffc04a0000"
.exit.text = "0xffffffffc049b762"
.gnu.linkonce.this_module= "0xffffffffc04a0b80"
.init.text = "0xffffffffc04ac000"
.note.gnu.build-id = "0xffffffffc049c000"
.parainstructions = "0xffffffffc049d358"
.rodata = "0xffffffffc049c820"
.rodata.str1.1 = "0xffffffffc049cdc6"
.rodata.str1.8 = "0xffffffffc049e918"
.smp_locks = "0xffffffffc049d2c4"
.strtab = "0xffffffffc04b2160"
.symtab = "0xffffffffc04ad000"
.text = "0xffffffffc0491000"
__bug_table = "0xffffffffc04a0b60"
__ksymtab = "0xffffffffc049c030"
__ksymtab_strings = "0xffffffffc049d7d4"
__mcount_loc = "0xffffffffc049e238"
__param = "0xffffffffc049def0"

root@zfs:~# systool -vm zfs
Module = "zfs"

Attributes:
coresize = "3854336"
initsize = "0"
initstate = "live"
refcnt = "38"
srcversion = "C6177AA5049CC30B672B1CA"
taint = "POE"
uevent =
version = "0.8.3-1~rlaager1.18.04.1"

Parameters:
dbuf_cache_hiwater_pct= "10"
dbuf_cache_lowater_pct= "10"
dbuf_cache_max_bytes= "33554432"
dbuf_cache_shift = "5"
dbuf_metadata_cache_max_bytes= "16777216"
dbuf_metadata_cache_shift= "6"
dmu_object_alloc_chunk_shift= "7"
dmu_prefetch_max = "134217728"
ignore_hole_birth = "1"
l2arc_feed_again = "1"
l2arc_feed_min_ms = "200"
l2arc_feed_secs = "1"
l2arc_headroom = "2"
l2arc_headroom_boost= "200"
l2arc_noprefetch = "1"
l2arc_norw = "0"
l2arc_write_boost = "8388608"
l2arc_write_max = "8388608"
metaslab_aliquot = "524288"
metaslab_bias_enabled= "1"
metaslab_debug_load = "0"
metaslab_debug_unload= "0"
metaslab_df_max_search= "16777216"
metaslab_df_use_largest_segment= "0"
metaslab_force_ganging= "16777217"
metaslab_fragmentation_factor_enabled= "1"
metaslab_lba_weighting_enabled= "1"
metaslab_preload_enabled= "1"
send_holes_without_birth_time= "1"
spa_asize_inflation = "24"
spa_config_path = "/etc/zfs/zpool.cache"
spa_load_print_vdev_tree= "0"
spa_load_verify_data= "1"
spa_load_verify_metadata= "1"
spa_load_verify_shift= "4"
spa_slop_shift = "5"
vdev_removal_max_span= "32768"
vdev_validate_skip = "0"
zap_iterate_prefetch= "1"
zfetch_array_rd_sz = "1048576"
zfetch_max_distance = "8388608"
zfetch_max_streams = "8"
zfetch_min_sec_reap = "2"
zfs_abd_scatter_enabled= "1"
zfs_abd_scatter_max_order= "10"
zfs_abd_scatter_min_size= "1536"
zfs_admin_snapshot = "0"
zfs_arc_average_blocksize= "8192"
zfs_arc_dnode_limit = "0"
zfs_arc_dnode_limit_percent= "10"
zfs_arc_dnode_reduce_percent= "10"
zfs_arc_grow_retry = "0"
zfs_arc_lotsfree_percent= "10"
zfs_arc_max = "1073741824"
zfs_arc_meta_adjust_restarts= "4096"
zfs_arc_meta_limit_percent= "75"
zfs_arc_meta_limit = "0"
zfs_arc_meta_min = "0"
zfs_arc_meta_prune = "10000"
zfs_arc_meta_strategy= "1"
zfs_arc_min_prefetch_ms= "0"
zfs_arc_min_prescient_prefetch_ms= "0"
zfs_arc_min = "0"
zfs_arc_p_dampener_disable= "1"
zfs_arc_p_min_shift = "0"
zfs_arc_pc_percent = "0"
zfs_arc_shrink_shift= "0"
zfs_arc_sys_free = "0"
zfs_async_block_max_blocks= "100000"
zfs_autoimport_disable= "1"
zfs_checksum_events_per_second= "20"
zfs_commit_timeout_pct= "5"
zfs_compressed_arc_enabled= "1"
zfs_condense_indirect_commit_entry_delay_ms= "0"
zfs_condense_indirect_vdevs_enable= "1"
zfs_condense_max_obsolete_bytes= "1073741824"
zfs_condense_min_mapping_bytes= "131072"
zfs_dbgmsg_enable = "1"
zfs_dbgmsg_maxsize = "4194304"
zfs_dbuf_state_index= "0"
zfs_ddt_data_is_special= "1"
zfs_deadman_checktime_ms= "60000"
zfs_deadman_enabled = "1"
zfs_deadman_failmode= "wait"
zfs_deadman_synctime_ms= "600000"
zfs_deadman_ziotime_ms= "300000"
zfs_dedup_prefetch = "0"
zfs_delay_min_dirty_percent= "60"
zfs_delay_scale = "500000"
zfs_delete_blocks = "20480"
zfs_dirty_data_max_max= "2020201472"
zfs_dirty_data_max_max_percent= "25"
zfs_dirty_data_max_percent= "10"
zfs_dirty_data_max = "808080588"
zfs_dirty_data_sync_percent= "20"
zfs_disable_ivset_guid_check= "0"
zfs_dmu_offset_next_sync= "0"
zfs_expire_snapshot = "300"
zfs_flags = "0"
zfs_free_bpobj_enabled= "1"
zfs_free_leak_on_eio= "0"
zfs_free_min_time_ms= "1000"
zfs_immediate_write_sz= "32768"
zfs_initialize_value= "16045690984833335022"
zfs_key_max_salt_uses= "400000000"
zfs_lua_max_instrlimit= "100000000"
zfs_lua_max_memlimit= "104857600"
zfs_max_missing_tvds= "0"
zfs_max_recordsize = "1048576"
zfs_metaslab_fragmentation_threshold= "70"
zfs_metaslab_segment_weight_enabled= "1"
zfs_metaslab_switch_threshold= "2"
zfs_mg_fragmentation_threshold= "95"
zfs_mg_noalloc_threshold= "0"
zfs_multihost_fail_intervals= "10"
zfs_multihost_history= "0"
zfs_multihost_import_intervals= "20"
zfs_multihost_interval= "1000"
zfs_multilist_num_sublists= "0"
zfs_no_scrub_io = "0"
zfs_no_scrub_prefetch= "0"
zfs_nocacheflush = "0"
zfs_nopwrite_enabled= "1"
zfs_object_mutex_size= "64"
zfs_obsolete_min_time_ms= "500"
zfs_override_estimate_recordsize= "0"
zfs_pd_bytes_max = "52428800"
zfs_per_txg_dirty_frees_percent= "5"
zfs_prefetch_disable= "0"
zfs_read_chunk_size = "1048576"
zfs_read_history = "0"
zfs_read_history_hits= "0"
zfs_reconstruct_indirect_combinations_max= "4096"
zfs_recover = "0"
zfs_recv_queue_length= "16777216"
zfs_removal_ignore_errors= "0"
zfs_removal_suspend_progress= "0"
zfs_remove_max_segment= "16777216"
zfs_resilver_disable_defer= "0"
zfs_resilver_min_time_ms= "3000"
zfs_scan_checkpoint_intval= "7200"
zfs_scan_fill_weight= "3"
zfs_scan_ignore_errors= "0"
zfs_scan_issue_strategy= "0"
zfs_scan_legacy = "0"
zfs_scan_max_ext_gap= "2097152"
zfs_scan_mem_lim_fact= "20"
zfs_scan_mem_lim_soft_fact= "20"
zfs_scan_strict_mem_lim= "0"
zfs_scan_suspend_progress= "0"
zfs_scan_vdev_limit = "4194304"
zfs_scrub_min_time_ms= "1000"
zfs_send_corrupt_data= "0"
zfs_send_queue_length= "16777216"
zfs_send_unmodified_spill_blocks= "1"
zfs_slow_io_events_per_second= "20"
zfs_spa_discard_memory_limit= "16777216"
zfs_special_class_metadata_reserve_pct= "25"
zfs_sync_pass_deferred_free= "2"
zfs_sync_pass_dont_compress= "8"
zfs_sync_pass_rewrite= "2"
zfs_sync_taskq_batch_pct= "75"
zfs_trim_extent_bytes_max= "134217728"
zfs_trim_extent_bytes_min= "32768"
zfs_trim_metaslab_skip= "0"
zfs_trim_queue_limit= "10"
zfs_trim_txg_batch = "32"
zfs_txg_history = "100"
zfs_txg_timeout = "5"
zfs_unlink_suspend_progress= "0"
zfs_user_indirect_is_special= "1"
zfs_vdev_aggregate_trim= "0"
zfs_vdev_aggregation_limit_non_rotating= "131072"
zfs_vdev_aggregation_limit= "1048576"
zfs_vdev_async_read_max_active= "3"
zfs_vdev_async_read_min_active= "1"
zfs_vdev_async_write_active_max_dirty_percent= "60"
zfs_vdev_async_write_active_min_dirty_percent= "30"
zfs_vdev_async_write_max_active= "10"
zfs_vdev_async_write_min_active= "2"
zfs_vdev_cache_bshift= "16"
zfs_vdev_cache_max = "16384"
zfs_vdev_cache_size = "0"
zfs_vdev_default_ms_count= "200"
zfs_vdev_initializing_max_active= "1"
zfs_vdev_initializing_min_active= "1"
zfs_vdev_max_active = "1000"
zfs_vdev_min_ms_count= "16"
zfs_vdev_mirror_non_rotating_inc= "0"
zfs_vdev_mirror_non_rotating_seek_inc= "1"
zfs_vdev_mirror_rotating_inc= "0"
zfs_vdev_mirror_rotating_seek_inc= "5"
zfs_vdev_mirror_rotating_seek_offset= "1048576"
zfs_vdev_ms_count_limit= "131072"
zfs_vdev_queue_depth_pct= "1000"
zfs_vdev_raidz_impl = "cycle [fastest] original scalar sse2 ssse3 "
zfs_vdev_read_gap_limit= "32768"
zfs_vdev_removal_max_active= "2"
zfs_vdev_removal_min_active= "1"
zfs_vdev_scheduler = "unused"
zfs_vdev_scrub_max_active= "2"
zfs_vdev_scrub_min_active= "1"
zfs_vdev_sync_read_max_active= "10"
zfs_vdev_sync_read_min_active= "10"
zfs_vdev_sync_write_max_active= "10"
zfs_vdev_sync_write_min_active= "10"
zfs_vdev_trim_max_active= "2"
zfs_vdev_trim_min_active= "1"
zfs_vdev_write_gap_limit= "4096"
zfs_zevent_cols = "80"
zfs_zevent_console = "0"
zfs_zevent_len_max = "512"
zfs_zil_clean_taskq_maxalloc= "1048576"
zfs_zil_clean_taskq_minalloc= "1024"
zfs_zil_clean_taskq_nthr_pct= "100"
zil_maxblocksize = "131072"
zil_nocacheflush = "0"
zil_replay_disable = "0"
zil_slog_bulk = "786432"
zio_deadman_log_all = "0"
zio_dva_throttle_enabled= "1"
zio_requeue_io_start_cut_in_line= "1"
zio_slow_io_ms = "30000"
zio_taskq_batch_pct = "75"
zvol_inhibit_dev = "0"
zvol_major = "230"
zvol_max_discard_blocks= "16384"
zvol_prefetch_bytes = "131072"
zvol_request_sync = "0"
zvol_threads = "32"
zvol_volmode = "1"

Sections:
.bss = "0xffffffffc07d2100"
.data = "0xffffffffc07c0060"
.exit.text = "0xffffffffc07738c9"
.gnu.linkonce.this_module= "0xffffffffc07d1dc0"
.init.text = "0xffffffffc09a7000"
.note.gnu.build-id = "0xffffffffc0774000"
.parainstructions = "0xffffffffc07aec00"
.rodata = "0xffffffffc0776800"
.rodata.str1.1 = "0xffffffffc078de1c"
.rodata.str1.8 = "0xffffffffc0792cb0"
.smp_locks = "0xffffffffc078d684"
.strtab = "0xffffffffc09e1178"
.symtab = "0xffffffffc09a9000"
.text = "0xffffffffc05f9000"
.text.unlikely = "0xffffffffc07731d8"
__bug_table = "0xffffffffc07d1da4"
__jump_table = "0xffffffffc07c0000"
__ksymtab = "0xffffffffc0774030"
__ksymtab_strings = "0xffffffffc07bd148"
__mcount_loc = "0xffffffffc07b4a28"
__param = "0xffffffffc07b2610"

root@zfs:~# cat /proc/spl/kstat/zfs/arcstats
12 1 0x01 98 26656 14093184634 1224910481851349
name type data
hits 4 704041195
misses 4 2744949
demand_data_hits 4 260501
demand_data_misses 4 476953
demand_metadata_hits 4 703209429
demand_metadata_misses 4 969970
prefetch_data_hits 4 11386
prefetch_data_misses 4 239621
prefetch_metadata_hits 4 559879
prefetch_metadata_misses 4 1058405
mru_hits 4 13617271
mru_ghost_hits 4 17737
mfu_hits 4 690001738
mfu_ghost_hits 4 1765
deleted 4 3101597
mutex_miss 4 1677
access_skip 4 5
evict_skip 4 39643211
evict_not_enough 4 7411892
evict_l2_cached 4 0
evict_l2_eligible 4 21536072704
evict_l2_ineligible 4 40198401024
evict_l2_skip 4 0
hash_elements 4 13708
hash_elements_max 4 432738
hash_collisions 4 364283
hash_chains 4 94
hash_chain_max 4 5
p 4 689080832
c 4 1073741824
c_min 4 252525184
c_max 4 1073741824
size 4 1222968480
compressed_size 4 295622656
uncompressed_size 4 332941312
overhead_size 4 273108480
hdr_size 4 6912992
data_size 4 33457664
metadata_size 4 535273472
dbuf_size 4 133838016
dnode_size 4 380898624
bonus_size 4 132587712
anon_size 4 50364416
anon_evictable_data 4 0
anon_evictable_metadata 4 0
mru_size 4 310185984
mru_evictable_data 4 0
mru_evictable_metadata 4 0
mru_ghost_size 4 0
mru_ghost_evictable_data 4 0
mru_ghost_evictable_metadata 4 0
mfu_size 4 208180736
mfu_evictable_data 4 0
mfu_evictable_metadata 4 0
mfu_ghost_size 4 0
mfu_ghost_evictable_data 4 0
mfu_ghost_evictable_metadata 4 0
l2_hits 4 0
l2_misses 4 0
l2_feeds 4 0
l2_rw_clash 4 0
l2_read_bytes 4 0
l2_write_bytes 4 0
l2_writes_sent 4 0
l2_writes_done 4 0
l2_writes_error 4 0
l2_writes_lock_retry 4 0
l2_evict_lock_retry 4 0
l2_evict_reading 4 0
l2_evict_l1cached 4 0
l2_free_on_write 4 0
l2_abort_lowmem 4 0
l2_cksum_bad 4 0
l2_io_error 4 0
l2_size 4 0
l2_asize 4 0
l2_hdr_size 4 0
memory_throttle_count 4 0
memory_direct_count 4 0
memory_indirect_count 4 0
memory_all_bytes 4 8080805888
memory_free_bytes 4 3879395328
memory_available_bytes 3 3753136128
arc_no_grow 4 0
arc_tempreserve 4 0
arc_loaned_bytes 4 15495680
arc_prune 4 11107355
arc_meta_used 4 1189510816
arc_meta_limit 4 805306368
arc_dnode_limit 4 80530636
arc_meta_max 4 1189630648
arc_meta_min 4 16777216
async_upgrade_sync 4 1796
demand_hit_predictive_prefetch 4 77527
demand_hit_prescient_prefetch 4 1448691
arc_need_free 4 0
arc_sys_free 4 126262592
arc_raw_size 4 195579904

root@zfs:~# arc_summary


ZFS Subsystem Report Wed Feb 12 16:33:48 2020
Linux 4.15.0-76-generic 0.8.3-1~rlaager1.18.04.1
Machine: zfs (x86_64) 0.8.3-1~rlaager1.18.04.1

ARC status: HEALTHY
Memory throttle count: 0

ARC size (current): 114.5 % 1.1 GiB
Target size (adaptive): 100.0 % 1.0 GiB
Min size (hard limit): 23.5 % 240.8 MiB
Max size (high water): 4:1 1.0 GiB
Most Frequently Used (MFU) cache size: 39.7 % 199.4 MiB
Most Recently Used (MRU) cache size: 60.3 % 302.6 MiB
Metadata cache size (hard limit): 75.0 % 768.0 MiB
Metadata cache size (current): 148.5 % 1.1 GiB
Dnode cache size (hard limit): 10.0 % 76.8 MiB
Dnode cache size (current): 474.1 % 364.1 MiB

ARC hash breakdown:
Elements max: 432.7k
Elements current: 3.2 % 13.9k
Collisions: 364.3k
Chain max: 5
Chains: 98

ARC misc:
Deleted: 3.1M
Mutex misses: 1.7k
Eviction skips: 40.1M

ARC total accesses (hits + misses): 706.8M
Cache hit ratio: 99.6 % 704.0M
Cache miss ratio: 0.4 % 2.7M
Actual hit ratio (MFU + MRU hits): 99.6 % 703.6M
Data demand efficiency: 35.3 % 737.5k
Data prefetch efficiency: 4.5 % 251.0k

Cache hits by cache type:
Most frequently used (MFU): 98.0 % 690.0M
Most recently used (MRU): 1.9 % 13.6M
Most frequently used (MFU) ghost: < 0.1 % 1.8k
Most recently used (MRU) ghost: < 0.1 % 17.7k
Anonymously used: 0.1 % 402.7k

Cache hits by data type:
Demand data: < 0.1 % 260.5k
Demand prefetch data: < 0.1 % 11.4k
Demand metadata: 99.9 % 703.2M
Demand prefetch metadata: 0.1 % 559.9k

Cache misses by data type:
Demand data: 17.4 % 477.0k
Demand prefetch data: 8.7 % 239.6k
Demand metadata: 35.3 % 970.0k
Demand prefetch metadata: 38.6 % 1.1M

DMU prefetch efficiency: 439.6M
Hit ratio: 0.1 % 235.6k
Miss ratio: 99.9 % 439.3M

L2ARC not detected, skipping section

Solaris Porting Layer (SPL):
spl_hostid 0
spl_hostid_path /etc/hostid
spl_kmem_alloc_max 1048576
spl_kmem_alloc_warn 65536
spl_kmem_cache_expire 2
spl_kmem_cache_kmem_limit 2048
spl_kmem_cache_kmem_threads 4
spl_kmem_cache_magazine_size 0
spl_kmem_cache_max_size 32
spl_kmem_cache_obj_per_slab 8
spl_kmem_cache_obj_per_slab_min 1
spl_kmem_cache_reclaim 0
spl_kmem_cache_slab_limit 16384
spl_max_show_tasks 512
spl_panic_halt 0
spl_schedule_hrtimeout_slack_us 0
spl_taskq_kick 0
spl_taskq_thread_bind 0
spl_taskq_thread_dynamic 1
spl_taskq_thread_priority 1
spl_taskq_thread_sequential 4

Tunables:
dbuf_cache_hiwater_pct 10
dbuf_cache_lowater_pct 10
dbuf_cache_max_bytes 33554432
dbuf_cache_shift 5
dbuf_metadata_cache_max_bytes 16777216
dbuf_metadata_cache_shift 6
dmu_object_alloc_chunk_shift 7
dmu_prefetch_max 134217728
ignore_hole_birth 1
l2arc_feed_again 1
l2arc_feed_min_ms 200
l2arc_feed_secs 1
l2arc_headroom 2
l2arc_headroom_boost 200
l2arc_noprefetch 1
l2arc_norw 0
l2arc_write_boost 8388608
l2arc_write_max 8388608
metaslab_aliquot 524288
metaslab_bias_enabled 1
metaslab_debug_load 0
metaslab_debug_unload 0
metaslab_df_max_search 16777216
metaslab_df_use_largest_segment 0
metaslab_force_ganging 16777217
metaslab_fragmentation_factor_enabled 1
metaslab_lba_weighting_enabled 1
metaslab_preload_enabled 1
send_holes_without_birth_time 1
spa_asize_inflation 24
spa_config_path /etc/zfs/zpool.cache
spa_load_print_vdev_tree 0
spa_load_verify_data 1
spa_load_verify_metadata 1
spa_load_verify_shift 4
spa_slop_shift 5
vdev_removal_max_span 32768
vdev_validate_skip 0
zap_iterate_prefetch 1
zfetch_array_rd_sz 1048576
zfetch_max_distance 8388608
zfetch_max_streams 8
zfetch_min_sec_reap 2
zfs_abd_scatter_enabled 1
zfs_abd_scatter_max_order 10
zfs_abd_scatter_min_size 1536
zfs_admin_snapshot 0
zfs_arc_average_blocksize 8192
zfs_arc_dnode_limit 0
zfs_arc_dnode_limit_percent 10
zfs_arc_dnode_reduce_percent 10
zfs_arc_grow_retry 0
zfs_arc_lotsfree_percent 10
zfs_arc_max 1073741824
zfs_arc_meta_adjust_restarts 4096
zfs_arc_meta_limit 0
zfs_arc_meta_limit_percent 75
zfs_arc_meta_min 0
zfs_arc_meta_prune 10000
zfs_arc_meta_strategy 1
zfs_arc_min 0
zfs_arc_min_prefetch_ms 0
zfs_arc_min_prescient_prefetch_ms 0
zfs_arc_p_dampener_disable 1
zfs_arc_p_min_shift 0
zfs_arc_pc_percent 0
zfs_arc_shrink_shift 0
zfs_arc_sys_free 0
zfs_async_block_max_blocks 100000
zfs_autoimport_disable 1
zfs_checksum_events_per_second 20
zfs_commit_timeout_pct 5
zfs_compressed_arc_enabled 1
zfs_condense_indirect_commit_entry_delay_ms 0
zfs_condense_indirect_vdevs_enable 1
zfs_condense_max_obsolete_bytes 1073741824
zfs_condense_min_mapping_bytes 131072
zfs_dbgmsg_enable 1
zfs_dbgmsg_maxsize 4194304
zfs_dbuf_state_index 0
zfs_ddt_data_is_special 1
zfs_deadman_checktime_ms 60000
zfs_deadman_enabled 1
zfs_deadman_failmode wait
zfs_deadman_synctime_ms 600000
zfs_deadman_ziotime_ms 300000
zfs_dedup_prefetch 0
zfs_delay_min_dirty_percent 60
zfs_delay_scale 500000
zfs_delete_blocks 20480
zfs_dirty_data_max 808080588
zfs_dirty_data_max_max 2020201472
zfs_dirty_data_max_max_percent 25
zfs_dirty_data_max_percent 10
zfs_dirty_data_sync_percent 20
zfs_disable_ivset_guid_check 0
zfs_dmu_offset_next_sync 0
zfs_expire_snapshot 300
zfs_flags 0
zfs_free_bpobj_enabled 1
zfs_free_leak_on_eio 0
zfs_free_min_time_ms 1000
zfs_immediate_write_sz 32768
zfs_initialize_value 16045690984833335022
zfs_key_max_salt_uses 400000000
zfs_lua_max_instrlimit 100000000
zfs_lua_max_memlimit 104857600
zfs_max_missing_tvds 0
zfs_max_recordsize 1048576
zfs_metaslab_fragmentation_threshold 70
zfs_metaslab_segment_weight_enabled 1
zfs_metaslab_switch_threshold 2
zfs_mg_fragmentation_threshold 95
zfs_mg_noalloc_threshold 0
zfs_multihost_fail_intervals 10
zfs_multihost_history 0
zfs_multihost_import_intervals 20
zfs_multihost_interval 1000
zfs_multilist_num_sublists 0
zfs_no_scrub_io 0
zfs_no_scrub_prefetch 0
zfs_nocacheflush 0
zfs_nopwrite_enabled 1
zfs_object_mutex_size 64
zfs_obsolete_min_time_ms 500
zfs_override_estimate_recordsize 0
zfs_pd_bytes_max 52428800
zfs_per_txg_dirty_frees_percent 5
zfs_prefetch_disable 0
zfs_read_chunk_size 1048576
zfs_read_history 0
zfs_read_history_hits 0
zfs_reconstruct_indirect_combinations_max 4096
zfs_recover 0
zfs_recv_queue_length 16777216
zfs_removal_ignore_errors 0
zfs_removal_suspend_progress 0
zfs_remove_max_segment 16777216
zfs_resilver_disable_defer 0
zfs_resilver_min_time_ms 3000
zfs_scan_checkpoint_intval 7200
zfs_scan_fill_weight 3
zfs_scan_ignore_errors 0
zfs_scan_issue_strategy 0
zfs_scan_legacy 0
zfs_scan_max_ext_gap 2097152
zfs_scan_mem_lim_fact 20
zfs_scan_mem_lim_soft_fact 20
zfs_scan_strict_mem_lim 0
zfs_scan_suspend_progress 0
zfs_scan_vdev_limit 4194304
zfs_scrub_min_time_ms 1000
zfs_send_corrupt_data 0
zfs_send_queue_length 16777216
zfs_send_unmodified_spill_blocks 1
zfs_slow_io_events_per_second 20
zfs_spa_discard_memory_limit 16777216
zfs_special_class_metadata_reserve_pct 25
zfs_sync_pass_deferred_free 2
zfs_sync_pass_dont_compress 8
zfs_sync_pass_rewrite 2
zfs_sync_taskq_batch_pct 75
zfs_trim_extent_bytes_max 134217728
zfs_trim_extent_bytes_min 32768
zfs_trim_metaslab_skip 0
zfs_trim_queue_limit 10
zfs_trim_txg_batch 32
zfs_txg_history 100
zfs_txg_timeout 5
zfs_unlink_suspend_progress 0
zfs_user_indirect_is_special 1
zfs_vdev_aggregate_trim 0
zfs_vdev_aggregation_limit 1048576
zfs_vdev_aggregation_limit_non_rotating 131072
zfs_vdev_async_read_max_active 3
zfs_vdev_async_read_min_active 1
zfs_vdev_async_write_active_max_dirty_percent 60
zfs_vdev_async_write_active_min_dirty_percent 30
zfs_vdev_async_write_max_active 10
zfs_vdev_async_write_min_active 2
zfs_vdev_cache_bshift 16
zfs_vdev_cache_max 16384
zfs_vdev_cache_size 0
zfs_vdev_default_ms_count 200
zfs_vdev_initializing_max_active 1
zfs_vdev_initializing_min_active 1
zfs_vdev_max_active 1000
zfs_vdev_min_ms_count 16
zfs_vdev_mirror_non_rotating_inc 0
zfs_vdev_mirror_non_rotating_seek_inc 1
zfs_vdev_mirror_rotating_inc 0
zfs_vdev_mirror_rotating_seek_inc 5
zfs_vdev_mirror_rotating_seek_offset 1048576
zfs_vdev_ms_count_limit 131072
zfs_vdev_queue_depth_pct 1000
zfs_vdev_raidz_impl cycle [fastest] original scalar sse2 ssse3
zfs_vdev_read_gap_limit 32768
zfs_vdev_removal_max_active 2
zfs_vdev_removal_min_active 1
zfs_vdev_scheduler unused
zfs_vdev_scrub_max_active 2
zfs_vdev_scrub_min_active 1
zfs_vdev_sync_read_max_active 10
zfs_vdev_sync_read_min_active 10
zfs_vdev_sync_write_max_active 10
zfs_vdev_sync_write_min_active 10
zfs_vdev_trim_max_active 2
zfs_vdev_trim_min_active 1
zfs_vdev_write_gap_limit 4096
zfs_zevent_cols 80
zfs_zevent_console 0
zfs_zevent_len_max 512
zfs_zil_clean_taskq_maxalloc 1048576
zfs_zil_clean_taskq_minalloc 1024
zfs_zil_clean_taskq_nthr_pct 100
zil_maxblocksize 131072
zil_nocacheflush 0
zil_replay_disable 0
zil_slog_bulk 786432
zio_deadman_log_all 0
zio_dva_throttle_enabled 1
zio_requeue_io_start_cut_in_line 1
zio_slow_io_ms 30000
zio_taskq_batch_pct 75
zvol_inhibit_dev 0
zvol_major 230
zvol_max_discard_blocks 16384
zvol_prefetch_bytes 131072
zvol_request_sync 0
zvol_threads 32
zvol_volmode 1

VDEV cache disabled, skipping section

ZIL committed transactions: 16
Commit requests: 62
Flushes to stable storage: 62
Transactions to SLOG storage pool: 0 Bytes 0
Transactions to non-SLOG storage pool: 3.7 KiB 4

grep . /sys/module/zfs/parameters/zfs_arc_meta*
/sys/module/zfs/parameters/zfs_arc_meta_adjust_restarts:4096
/sys/module/zfs/parameters/zfs_arc_meta_limit:0
/sys/module/zfs/parameters/zfs_arc_meta_limit_percent:75
/sys/module/zfs/parameters/zfs_arc_meta_min:0
/sys/module/zfs/parameters/zfs_arc_meta_prune:10000
/sys/module/zfs/parameters/zfs_arc_meta_strategy:1
````

The workaround had only a temporary effect -- arc_prune is spinning again.

Update: I had to increase zfs_arc_max to stop the spinning.

Eventually it started spinning again despite the increased zfs_arc_max.

Hasn't happened here -- knock on wood -- but I'm done copying lots of files. Mine has 64 GB RAM and a 160 GB cache SSD; the ARC size is left at the default. It's a 2x8-core Opteron, and it didn't lock up until it had arc_prune spinning on every core. (I don't see a single one in top now.)

I saw this arc_prune behavior last year on one server with 32 GB of ECC RAM. I had to disable the weekly scrubs because of it. I think there is another thread where I commented on that. Since I have just moved to a new building and the network topology is completely different, I can't test it now.

I'm seeing the issue again after a reboot, on a simple zfs umount of the newly created pool.

I thought I'd add a note that I have also seen this on Debian buster with the Debian backports version on amd64. It did clear itself out after some time (hours, if I remember right). If I see it again I'll try to get more information.

A couple of data points worth noting: there is no cache device, it only receives snapshots from a different system, there is little if any user I/O, and no network shares are exported from it.

I'm seeing a similar problem (arc_reclaim and arc_prune chewing up CPU). Bumping zfs_arc_meta_limit_percent only provided a temporary reprieve while arc_meta filled up. Bumping it all the way to 100% doesn't fix the issue.

Why is there so much metadata in the ARC in the first place? If relevant, the machine in question is a dedicated backup machine, currently doing nothing other than being the receiving end of a lot of zfs send -i "backups".

System info:

russ@pth-hs1:~$ cat /sys/module/zfs/version
0.7.5-1ubuntu16.6
russ@pth-hs1:~$ uname -a
Linux pth-hs1 4.15.0-74-generic #84-Ubuntu SMP Thu Dec 19 08:06:28 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
russ@pth-hs1:~$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 18.04.4 LTS
Release:        18.04
Codename:       bionic

Here is the current state of the ARC (allowing 100% arc_meta, but still having prune issues):

russ@pth-hs1:~$ grep dnode /proc/spl/kstat/zfs/arcstats
dnode_size                      4    8716864
arc_dnode_limit                 4    1684769382
russ@pth-hs1:~$ egrep "(arc_meta_used|arc_meta_limit)" /proc/spl/kstat/zfs/arcstats                        
arc_meta_used                   4    16847797336
arc_meta_limit                  4    16847693824
russ@pth-hs1:~$ grep -H . /sys/module/zfs/parameters/zfs_arc_{meta_limit{,_percent},dnode_limit{,_percent}}
/sys/module/zfs/parameters/zfs_arc_meta_limit:0
/sys/module/zfs/parameters/zfs_arc_meta_limit_percent:100
/sys/module/zfs/parameters/zfs_arc_dnode_limit:0
/sys/module/zfs/parameters/zfs_arc_dnode_limit_percent:10
russ@pth-hs1:~$ cat /proc/spl/kstat/zfs/arcstats
13 1 0x01 96 4608 41705217416 3820892427321077
name                            type data
hits                            4    7413219043
misses                          4    5467855407
demand_data_hits                4    0
demand_data_misses              4    0
demand_metadata_hits            4    7362177240
demand_metadata_misses          4    5453086223
prefetch_data_hits              4    0
prefetch_data_misses            4    0
prefetch_metadata_hits          4    51041803
prefetch_metadata_misses        4    14769184
mru_hits                        4    165815182
mru_ghost_hits                  4    1446761
mfu_hits                        4    7216088077
mfu_ghost_hits                  4    4321466
deleted                         4    112910070
mutex_miss                      4    40132
access_skip                     4    3387
evict_skip                      4    15984980671
evict_not_enough                4    22139723
evict_l2_cached                 4    0
evict_l2_eligible               4    948597889024
evict_l2_ineligible             4    1871813890048
evict_l2_skip                   4    0
hash_elements                   4    428907
hash_elements_max               4    2104109
hash_collisions                 4    43345413
hash_chains                     4    20493
hash_chain_max                  4    7
p                               4    16200329216
c                               4    16847693824
c_min                           4    1052980864
c_max                           4    16847693824
size                            4    16847639536
compressed_size                 4    16577940480
uncompressed_size               4    53259091456
overhead_size                   4    114052608
hdr_size                        4    141165920
data_size                       4    0
metadata_size                   4    16691993088
dbuf_size                       4    3742544
dnode_size                      4    8716864
bonus_size                      4    2021120
anon_size                       4    296960
anon_evictable_data             4    0
anon_evictable_metadata         4    0
mru_size                        4    31092736
mru_evictable_data              4    0
mru_evictable_metadata          4    27729920
mru_ghost_size                  4    66322432
mru_ghost_evictable_data        4    0
mru_ghost_evictable_metadata    4    66322432
mfu_size                        4    16660603392
mfu_evictable_data              4    0
mfu_evictable_metadata          4    16499196928
mfu_ghost_size                  4    21757952
mfu_ghost_evictable_data        4    0
mfu_ghost_evictable_metadata    4    21757952
l2_hits                         4    0
l2_misses                       4    0
l2_feeds                        4    0
l2_rw_clash                     4    0
l2_read_bytes                   4    0
l2_write_bytes                  4    0
l2_writes_sent                  4    0
l2_writes_done                  4    0
l2_writes_error                 4    0
l2_writes_lock_retry            4    0
l2_evict_lock_retry             4    0
l2_evict_reading                4    0
l2_evict_l1cached               4    0
l2_free_on_write                4    0
l2_abort_lowmem                 4    0
l2_cksum_bad                    4    0
l2_io_error                     4    0
l2_size                         4    0
l2_asize                        4    0
l2_hdr_size                     4    0
memory_throttle_count           4    0
memory_direct_count             4    0
memory_indirect_count           4    0
memory_all_bytes                4    33695387648
memory_free_bytes               4    13351628800
memory_available_bytes          3    12825141248
arc_no_grow                     4    0
arc_tempreserve                 4    0
arc_loaned_bytes                4    0
arc_prune                       4    67235209
arc_meta_used                   4    16847639536
arc_meta_limit                  4    16847693824
arc_dnode_limit                 4    1684769382
arc_meta_max                    4    16913555592
arc_meta_min                    4    16777216
sync_wait_for_async             4    57
demand_hit_predictive_prefetch  4    2989
arc_need_free                   4    0
arc_sys_free                    4    526490432

arc_prune settled down after a few hours, and CPU usage is now back to normal levels. Given the timing, I'm unclear whether bumping zfs_arc_meta_limit_percent to 100% made the difference, or whether I just needed to wait a few more hours.

Below is a crude view of the incident from a CPU load perspective across all CPUs, where the middle marker is when I was bumping zfs_arc_meta_limit_percent up, ultimately leaving it at 100%:

[image: CPU load across all CPUs during the incident]

I don't know enough about ARC performance to know why metadata would be such a significant fraction, nor whether it is a long-term problem to move zfs_arc_meta_limit_percent from the default value of 75% to 100%.

For what it's worth, whenever I run a workload on my system that I know will peg the arc_prune threads, I run this in another terminal:

while true ; do zpool iostat 10 12 ; echo 2 > /proc/sys/vm/drop_caches ; done

It's blunt and brutal, but it stops those arc_prune threads from spending 50% of the CPUs trying to acquire spinlocks. It might save others from reboots, anyway.
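(A lighter-weight way to see whether the prune threads are making any progress before reaching for drop_caches -- a sketch using the standard arcstats location shown earlier in this thread -- is to watch the arc_prune counter alongside the metadata numbers:)

watch -n 5 'grep -E "^(arc_prune|arc_meta_used|arc_meta_limit) " /proc/spl/kstat/zfs/arcstats'

If arc_prune keeps climbing while arc_meta_used stays above arc_meta_limit, the threads are spinning without actually freeing anything.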

Thanks

Not for me; I already tried that, and eventually the system still ends up hanging.

We recently experienced the very same issue.
arc_prune and arc_reclaim consume a lot of CPU across all cores.

We see this issue on only a single machine. The biggest difference is that it is an AMD EPYC with a lot of cores:
an AMD EPYC 7452 32-Core Processor.
It's also worth mentioning that this is an SSD-only pool with no log or cache device:

  pool: tank
 state: ONLINE
  scan: scrub repaired 0B in 214h25m with 0 errors on Mon Aug 10 20:25:45 2020
config:

    NAME                                            STATE     READ WRITE CKSUM
    tank                                            ONLINE       0     0     0
      mirror-0                                      ONLINE       0     0     0
        ata-INTEL_SSDSC2KB038T8_PHYF947405A33P8EGN  ONLINE       0     0     0
        ata-INTEL_SSDSC2KB038T8_PHYF947403PT3P8EGN  ONLINE       0     0     0

errors: No known data errors

What is also very interesting is that, according to the monitoring system, this started to happen only a couple of weeks ago; the machine was working properly for weeks before.

I also tried to increase arc_meta_limit with no success:

# cat /proc/spl/kstat/zfs/arcstats|grep arc_meta
arc_meta_used                   4    21485281368
arc_meta_limit                  4    16106127360
arc_meta_max                    4    21558749936
arc_meta_min                    4    16777216
# echo 21558749936 > /sys/module/zfs/parameters/zfs_arc_meta_limit
# cat /proc/spl/kstat/zfs/arcstats|grep arc_meta
arc_meta_used                   4    21530789416
arc_meta_limit                  4    16106127360
arc_meta_max                    4    21558749936
arc_meta_min                    4    16777216
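(Note that in the output above arc_meta_limit still reports the old value after the echo; it may be worth confirming that the module parameter itself accepted the new value -- a quick check, using the same sysfs path as earlier in the thread:)

cat /sys/module/zfs/parameters/zfs_arc_meta_limit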

BTW echo 3 > /proc/sys/vm/drop_caches also heals the machine for a short time.

Why is that?

In addition, it's unclear to me whether zfs_arc_meta_limit_percent=100 will fix it, or whether that is also just a temporary solution.

@behlendorf sorry to ping you, but I read all the referenced issues without fully understanding some important things:

  • why does arc_meta grow so much?
  • why can't arc_prune actually reclaim sufficient memory (i.e., why does it make no forward progress)?
  • what are pinned buffers, and why can they not be reclaimed by arc_reclaim?
  • what kind of workload can trigger the described issue?

I have yet to see a 100% arc_prune and/or arc_reclaim thread on my systems, but I would like to be prepared for the worst...

Thanks.

On Sun, Aug 30, 2020 at 05:01:23AM -0700, shodanshok wrote:

  • what kind of workload can trigger the described issue?

In my case, I've got roughly one terabyte of unpacked source code on a
system with only 128 gigabytes of memory and I want to use ripgrep to
search through all the files.

I realize this is a silly workload.

Because it runs faster with a loop like this running at the same time, I
have some hope that the situation can improve with the clever application
of programming:

while true ; do sleep 300 ; echo 2 > /proc/sys/vm/drop_caches ; done

Thanks

Possibly related:

10331 - fix dnode eviction typo in arc_evict_state()

10508 - Data throughput causing apparent (directory) metadata eviction with metadata_size << arc_meta_min

10563 - dbuf cache size is 1/32nd what was intended

10600 - Revise ARC shrinker algorithm

10610 - Limit dbuf cache sizes based only on ARC target size by default

10618 - Restore ARC MFU/MRU pressure

I have a PowerEdge R730 that has been experiencing this problem since migrating to Ubuntu 20.04. The problem didn't occur with the very same pool under CentOS 8. It's a Veeam Backup Repository, and the problem occurs daily, causing connections to time out and the respective backups to fail. My current workaround is daily reboots.

This data was gathered after today's reboot. I'll gather again tomorrow while the problem exists.
$ cat /sys/module/zfs/version
0.8.3-1ubuntu12.4
$ grep -H . /sys/module/zfs/parameters/zfs_arc_{meta_limit{,_percent},dnode_limit{,_percent}}
/sys/module/zfs/parameters/zfs_arc_meta_limit:0
/sys/module/zfs/parameters/zfs_arc_meta_limit_percent:75
/sys/module/zfs/parameters/zfs_arc_dnode_limit:0
/sys/module/zfs/parameters/zfs_arc_dnode_limit_percent:10
$ grep dnode /proc/spl/kstat/zfs/arcstats
dnode_size 4 14965008
arc_dnode_limit 4 2529106329

zpool status.txt
arcstats.txt
arc_summary.txt
lshw.zip

I still suffer from this issue; in fact, it's getting worse over time.

zfs: 0.7.5-1ubuntu16.9
# grep -H . /sys/module/zfs/parameters/zfs_arc_{meta_limit{,_percent},dnode_limit{,_percent}}
/sys/module/zfs/parameters/zfs_arc_meta_limit:0
/sys/module/zfs/parameters/zfs_arc_meta_limit_percent:100
/sys/module/zfs/parameters/zfs_arc_dnode_limit:0
/sys/module/zfs/parameters/zfs_arc_dnode_limit_percent:10
# cat /proc/spl/kstat/zfs/arcstats
13 1 0x01 96 4608 31700079457 2907324135965112
name                            type data
hits                            4    199453570301
misses                          4    9810320885
demand_data_hits                4    7803253271
demand_data_misses              4    871638623
demand_metadata_hits            4    188167540383
demand_metadata_misses          4    7420326730
prefetch_data_hits              4    4463401
prefetch_data_misses            4    177408833
prefetch_metadata_hits          4    3478313246
prefetch_metadata_misses        4    1340946699
mru_hits                        4    67486382216
mru_ghost_hits                  4    377163667
mfu_hits                        4    128560552058
mfu_ghost_hits                  4    171807604
deleted                         4    4191299437
mutex_miss                      4    207959998
access_skip                     4    3114
evict_skip                      4    1197985456718
evict_not_enough                4    13186935886
evict_l2_cached                 4    0
evict_l2_eligible               4    91339086637056
evict_l2_ineligible             4    31315599089664
evict_l2_skip                   4    0
hash_elements                   4    505208
hash_elements_max               4    3178536
hash_collisions                 4    32674152
hash_chains                     4    1708
hash_chain_max                  4    4
p                               4    11059855423
c                               4    31474836480
c_min                           4    16106127360
c_max                           4    31474836480
size                            4    31594430688
compressed_size                 4    2208061952
uncompressed_size               4    8709196288
overhead_size                   4    8706044928
hdr_size                        4    206378624
data_size                       4    45338624
metadata_size                   4    10868768256
dbuf_size                       4    4339239760
dnode_size                      4    11728919504
bonus_size                      4    4405785920
anon_size                       4    124821504
anon_evictable_data             4    0
anon_evictable_metadata         4    0
mru_size                        4    6348248576
mru_evictable_data              4    0
mru_evictable_metadata          4    12288
mru_ghost_size                  4    0
mru_ghost_evictable_data        4    0
mru_ghost_evictable_metadata    4    0
mfu_size                        4    4441036800
mfu_evictable_data              4    0
mfu_evictable_metadata          4    4096
mfu_ghost_size                  4    0
mfu_ghost_evictable_data        4    0
mfu_ghost_evictable_metadata    4    0
l2_hits                         4    0
l2_misses                       4    0
l2_feeds                        4    0
l2_rw_clash                     4    0
l2_read_bytes                   4    0
l2_write_bytes                  4    0
l2_writes_sent                  4    0
l2_writes_done                  4    0
l2_writes_error                 4    0
l2_writes_lock_retry            4    0
l2_evict_lock_retry             4    0
l2_evict_reading                4    0
l2_evict_l1cached               4    0
l2_free_on_write                4    0
l2_abort_lowmem                 4    0
l2_cksum_bad                    4    0
l2_io_error                     4    0
l2_size                         4    0
l2_asize                        4    0
l2_hdr_size                     4    0
memory_throttle_count           4    0
memory_direct_count             4    2569
memory_indirect_count           4    0
memory_all_bytes                4    405654396928
memory_free_bytes               4    189280755712
memory_available_bytes          3    182942408704
arc_no_grow                     4    0
arc_tempreserve                 4    0
arc_loaned_bytes                4    0
arc_prune                       4    66878734346
arc_meta_used                   4    31549092064
arc_meta_limit                  4    31474836480
arc_dnode_limit                 4    3147483648
arc_meta_max                    4    31596750464
arc_meta_min                    4    16777216
sync_wait_for_async             4    1974462
demand_hit_predictive_prefetch  4    171247226
arc_need_free                   4    0
arc_sys_free                    4    6338349952






# cat /proc/meminfo 
MemTotal:       396146872 kB
MemFree:        166195880 kB
MemAvailable:   185639888 kB
Buffers:         8803876 kB
Cached:          6643616 kB
SwapCached:            0 kB
Active:         137620084 kB
Inactive:       11846208 kB
Active(anon):   134078720 kB
Inactive(anon):    68992 kB
Active(file):    3541364 kB
Inactive(file): 11777216 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:      47152124 kB
SwapFree:       47152124 kB
Dirty:              2992 kB
Writeback:          2244 kB
AnonPages:      134018136 kB
Mapped:          3125796 kB
Shmem:            224616 kB
Slab:           68624284 kB
SReclaimable:    6680184 kB
SUnreclaim:     61944100 kB
KernelStack:      187264 kB
PageTables:       969752 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    245225560 kB
Committed_AS:   187841364 kB
VmallocTotal:   34359738367 kB
VmallocUsed:           0 kB
VmallocChunk:          0 kB
HardwareCorrupted:     0 kB
AnonHugePages:  33300480 kB
ShmemHugePages:        0 kB
ShmemPmdMapped:        0 kB
CmaTotal:              0 kB
CmaFree:               0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:    38309648 kB
DirectMap2M:    263536640 kB
DirectMap1G:    101711872 kB






top - 09:52:41 up 33 days, 16:04,  1 user,  load average: 26.44, 13.48, 13.92
Tasks: 4808 total,  26 running, 4173 sleeping,   0 stopped,   4 zombie
%Cpu(s):  2.7 us, 32.2 sy,  0.0 ni, 64.3 id,  0.8 wa,  0.0 hi,  0.1 si,  0.0 st
KiB Mem : 39614688+total, 16529334+free, 20869318+used, 22160336 buff/cache
KiB Swap: 47152124 total, 47152124 free,        0 used. 18475769+avail Mem 

   PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                                                                                                                                                                                                                                  
137636 110108    20   0 37.741g 0.010t  20384 S 197.2  2.8   2755:07 java                                                                                                                                                                                                                                                     
  1923 root      20   0       0      0      0 R  99.7  0.0  13381:06 arc_reclaim                                                                                                                                                                                                                                              
 16194 libvirt+  20   0 9150740 4.057g  20232 S  78.5  1.1   7399:15 qemu-system-x86                                                                                                                                                                                                                                          
 13688 libvirt+  20   0 12.739g 7.967g  20276 S  57.6  2.1  11956:36 qemu-system-x86                                                                                                                                                                                                                                          
 13606 libvirt+  20   0 13.577g 4.240g  20768 S  32.9  1.1  14539:43 qemu-system-x86                                                                                                                                                                                                                                          
212665 110110    20   0 37.105g 5.524g  19364 S  23.1  1.5 432:10.85 java                                                                                                                                                                                                                                                     
 56757 root      20   0       0      0      0 S  16.8  0.0   0:06.75 arc_prune                                                                                                                                                                                                                                                
 10076 root      20   0       0      0      0 S  16.1  0.0   0:08.16 arc_prune                                                                                                                                                                                                                                                
 20427 root      20   0       0      0      0 S  16.1  0.0   0:07.93 arc_prune                                                                                                                                                                                                                                                
  8251 root      20   0       0      0      0 S  15.8  0.0   0:13.09 arc_prune                                                                                                                                                                                                                                                
 10093 root      20   0       0      0      0 S  15.8  0.0   0:08.14 arc_prune                                                                                                                                                                                                                                                
 32372 root      20   0       0      0      0 S  15.8  0.0   0:07.83 arc_prune                                                                                                                                                                                                                                                
 95841 root      20   0       0      0      0 S  15.8  0.0   0:03.82 arc_prune                                                                                                                                                                                                                                                
173177 root      20   0       0      0      0 S  15.8  0.0   0:11.89 arc_prune                                                                                                                                                                                                                                                
191173 root      20   0       0      0      0 S  15.8  0.0   0:12.21 arc_prune                                                                                                                                                                                                                                                
247601 root      20   0       0      0      0 S  15.8  0.0   0:12.65 arc_prune                                                                                                                                                                                                                                                
247604 root      20   0       0      0      0 R  15.8  0.0   0:12.65 arc_prune                                                                                                                                                                                                                                                
252944 root      20   0       0      0      0 R  15.8  0.0   0:08.35 arc_prune                                                                                                                                                                                                                                                
  1210 root      20   0       0      0      0 S  15.5  0.0   0:08.29 arc_prune                                                                                                                                                                                                                                                
  1315 root      20   0       0      0      0 S  15.5  0.0   0:08.26 arc_prune                                                                                                                                                                                                                                                
  1333 root      20   0       0      0      0 S  15.5  0.0   0:08.16 arc_prune                                                                                                                                                                                                                                                
  1340 root      20   0       0      0      0 S  15.5  0.0   0:08.29 arc_prune                                                                                                                                                                                                                                                
  8311 root      20   0       0      0      0 S  15.5  0.0   0:14.03 arc_prune                                                                                                                                                                                                                                                
 32368 root      20   0       0      0      0 S  15.5  0.0   0:08.10 arc_prune                                                                                                                                                                                                                                                
 60711 root      20   0       0      0      0 S  15.5  0.0   0:06.13 arc_prune                                                                                                                                                                                                                                                
 89389 root      20   0       0      0      0 S  15.5  0.0   0:03.95 arc_prune                                                                                                                                                                                                                                                
 95103 root      20   0       0      0      0 S  15.5  0.0   0:03.89 arc_prune                                                                                                                                                                                                                                                
 95184 root      20   0       0      0      0 S  15.5  0.0   0:03.94 arc_prune                                                                                                                                                                                                                                                
 95793 root      20   0       0      0      0 S  15.5  0.0   0:03.95 arc_prune                                                                                                                                                                                                                                                
 95804 root      20   0       0      0      0 S  15.5  0.0   0:03.86 arc_prune                                                                                                                                                                                                                                                
 95847 root      20   0       0      0      0 S  15.5  0.0   0:03.87 arc_prune                                                                                                                                                                                                                                                
106985 root      20   0       0      0      0 S  15.5  0.0   0:03.12 arc_prune                                                                                                                                                                                                                                                
130233 root      20   0       0      0      0 R  15.5  0.0   0:00.64 arc_prune                                                                                                                                                                                                                                                
152777 root      20   0       0      0      0 S  15.5  0.0   0:13.30 arc_prune                                                                                                                                                                                                                                                
214525 root      20   0       0      0      0 S  15.5  0.0   0:11.15 arc_prune                                                                                                                                                                                                                                                
216125 root      20   0       0      0      0 S  15.5  0.0   0:11.04 arc_prune                                                                                                                                                                                                                                                
258884 root      20   0       0      0      0 S  15.5  0.0   0:08.31 arc_prune                                                                                                                                                                                                                                                
  1323 root      20   0       0      0      0 S  15.2  0.0   0:08.17 arc_prune                                                                                                                                                                                                                                                
  9765 root      20   0       0      0      0 R  15.2  0.0   0:08.52 arc_prune                                                                                                                                                                                                                                                
 10080 root      20   0       0      0      0 S  15.2  0.0   0:08.12 arc_prune                                                                                                                                                                                                                                                
 10749 root      20   0       0      0      0 S  15.2  0.0   0:08.03 arc_prune                                                                                                                                                                                                                                                
 13331 root      20   0       0      0      0 S  15.2  0.0   0:07.91 arc_prune                                                                                                                                                                                                                                                
 37764 root      20   0       0      0      0 S  15.2  0.0   0:07.63 arc_prune                                                                                                                                                                                                                                                
 40158 root      20   0       0      0      0 S  15.2  0.0   0:07.34 arc_prune                                                                                                                                                                                                                                                
 42200 root      20   0       0      0      0 S  15.2  0.0   0:07.28 arc_prune                                                                                                                                                                                                                                                
 42206 root      20   0       0      0      0 S  15.2  0.0   0:07.42 arc_prune                                                                                                                                                                                                                                                
 94550 root      20   0       0      0      0 S  15.2  0.0   0:03.89 arc_prune                                                                                                                                                                                                                                                
 95180 root      20   0       0      0      0 S  15.2  0.0   0:03.73 arc_prune                                                                                                                                                                                                                                                
 95543 root      20   0       0      0      0 S  15.2  0.0   0:03.77 arc_prune                                                                                                                                                                                                                                                
 95892 root      20   0       0      0      0 R  15.2  0.0   0:03.87 arc_prune                                                                                                                                                                                                                                                
103126 root      20   0       0      0      0 S  15.2  0.0   0:03.63 arc_prune                                                                                                                                                                                                                                                
104136 root      20   0       0      0      0 S  15.2  0.0   0:03.43 arc_prune                                                                                                                                                                                                                                                
123748 root      20   0       0      0      0 S  15.2  0.0   0:12.80 arc_prune                                                                                                                                                                                                                                                
146167 root      20   0       0      0      0 S  15.2  0.0   0:12.45 arc_prune                                                                                                                                                                                                                                                
146181 root      20   0       0      0      0 S  15.2  0.0   0:12.41 arc_prune                                                                                                                                                                                                                                                
146202 root      20   0       0      0      0 S  15.2  0.0   0:12.52 arc_prune                                                                                                                                                                                                                                                
146855 root      20   0       0      0      0 R  15.2  0.0   0:12.41 arc_prune                                                                                                                                                                                                                                                
183429 root      20   0       0      0      0 S  15.2  0.0   0:12.12 arc_prune                                                                                                                                                                                                                                                
183529 root      20   0       0      0      0 S  15.2  0.0   0:12.10 arc_prune                                                                                                                                                                                                                                                
187041 root      20   0       0      0      0 S  15.2  0.0   0:13.51 arc_prune                                                                                                                                                                                                                                                
191200 root      20   0       0      0      0 S  15.2  0.0   0:12.12 arc_prune                                                                                                                                                                                                                                                
210251 root      20   0       0      0      0 S  15.2  0.0   0:11.45 arc_prune                                                                                                                                                                                                                                                
213730 root      20   0       0      0      0 S  15.2  0.0   0:11.15 arc_prune                                                                                                                                                                                                                                                
213981 root      20   0       0      0      0 S  15.2  0.0   0:11.11 arc_prune                                                                                                                                                                                                                                                
214495 root      20   0       0      0      0 S  15.2  0.0   0:10.92 arc_prune                                                                                                                                                                                                                                                
217222 root      20   0       0      0      0 S  15.2  0.0   0:12.80 arc_prune                                                                                                                                                                                                                                                
228681 root      20   0       0      0      0 S  15.2  0.0   0:12.58 arc_prune                                                                                                                                                                                                                                                
229544 root      20   0       0      0      0 S  15.2  0.0   0:09.89 arc_prune                                                                                                                                                                                                                                                
243760 root      20   0       0      0      0 S  15.2  0.0   0:08.86 arc_prune                                                                                                                                                                                                                                                
247608 root      20   0       0      0      0 S  15.2  0.0   0:12.71 arc_prune                                                                                                                                                                                                                                                
256265 root      20   0       0      0      0 S  15.2  0.0   0:08.37 arc_prune                                                                                                                                                                                                                                                
  1204 root      20   0       0      0      0 S  14.9  0.0   0:08.08 arc_prune                                                                                                                                                                                                                                                
  1211 root      20   0       0      0      0 S  14.9  0.0   0:08.09 arc_prune                                                                                                                                                                                                                                                
  1273 root      20   0       0      0      0 R  14.9  0.0   0:08.56 arc_prune                                                                                                                                                                                                                                                
  1283 root      20   0       0      0      0 S  14.9  0.0   0:08.03 arc_prune                                                                                                                                                                                                                                                
  1305 root      20   0       0      0      0 S  14.9  0.0   0:08.24 arc_prune                                                                                                                                                                                                                                                
  1310 root      20   0       0      0      0 S  14.9  0.0   0:08.40 arc_prune                                                                                                                                                                                                                                                
  1314 root      20   0       0      0      0 S  14.9  0.0   0:08.19 arc_prune                                                   



# systool -vm zfs
Module = "zfs"

  Attributes:
    coresize            = "3411968"
    initsize            = "0"
    initstate           = "live"
    refcnt              = "53"
    srcversion          = "8FA13059DE17672A1E4861D"
    taint               = "PO"
    uevent              = <store method only>
    version             = "0.7.5-1ubuntu16.9"

  Parameters:
    dbuf_cache_hiwater_pct= "10"
    dbuf_cache_lowater_pct= "10"
    dbuf_cache_max_bytes= "104857600"
    dbuf_cache_max_shift= "5"
    dmu_object_alloc_chunk_shift= "7"
    ignore_hole_birth   = "1"
    l2arc_feed_again    = "1"
    l2arc_feed_min_ms   = "200"
    l2arc_feed_secs     = "1"
    l2arc_headroom      = "2"
    l2arc_headroom_boost= "200"
    l2arc_noprefetch    = "1"
    l2arc_norw          = "0"
    l2arc_write_boost   = "8388608"
    l2arc_write_max     = "8388608"
    metaslab_aliquot    = "524288"
    metaslab_bias_enabled= "1"
    metaslab_debug_load = "0"
    metaslab_debug_unload= "0"
    metaslab_fragmentation_factor_enabled= "1"
    metaslab_lba_weighting_enabled= "1"
    metaslab_preload_enabled= "1"
    metaslabs_per_vdev  = "200"
    send_holes_without_birth_time= "1"
    spa_asize_inflation = "24"
    spa_config_path     = "/etc/zfs/zpool.cache"
    spa_load_verify_data= "1"
    spa_load_verify_maxinflight= "10000"
    spa_load_verify_metadata= "1"
    spa_slop_shift      = "5"
    zfetch_array_rd_sz  = "1048576"
    zfetch_max_distance = "8388608"
    zfetch_max_streams  = "8"
    zfetch_min_sec_reap = "2"
    zfs_abd_scatter_enabled= "1"
    zfs_abd_scatter_max_order= "10"
    zfs_admin_snapshot  = "1"
    zfs_arc_average_blocksize= "8192"
    zfs_arc_dnode_limit = "0"
    zfs_arc_dnode_limit_percent= "10"
    zfs_arc_dnode_reduce_percent= "10"
    zfs_arc_grow_retry  = "0"
    zfs_arc_lotsfree_percent= "10"
    zfs_arc_max         = "31474836480"
    zfs_arc_meta_adjust_restarts= "4096"
    zfs_arc_meta_limit_percent= "100"
    zfs_arc_meta_limit  = "0"
    zfs_arc_meta_min    = "0"
    zfs_arc_meta_prune  = "10000"
    zfs_arc_meta_strategy= "1"
    zfs_arc_min         = "16106127360"
    zfs_arc_min_prefetch_lifespan= "0"
    zfs_arc_p_aggressive_disable= "1"
    zfs_arc_p_dampener_disable= "1"
    zfs_arc_p_min_shift = "0"
    zfs_arc_pc_percent  = "0"
    zfs_arc_shrink_shift= "0"
    zfs_arc_sys_free    = "0"
    zfs_autoimport_disable= "1"
    zfs_compressed_arc_enabled= "1"
    zfs_dbgmsg_enable   = "0"
    zfs_dbgmsg_maxsize  = "4194304"
    zfs_dbuf_state_index= "0"
    zfs_deadman_checktime_ms= "5000"
    zfs_deadman_enabled = "1"
    zfs_deadman_synctime_ms= "1000000"
    zfs_dedup_prefetch  = "0"
    zfs_delay_min_dirty_percent= "60"
    zfs_delay_scale     = "500000"
    zfs_delete_blocks   = "20480"
    zfs_dirty_data_max_max= "4294967296"
    zfs_dirty_data_max_max_percent= "25"
    zfs_dirty_data_max_percent= "10"
    zfs_dirty_data_max  = "4294967296"
    zfs_dirty_data_sync = "67108864"
    zfs_dmu_offset_next_sync= "0"
    zfs_expire_snapshot = "300"
    zfs_flags           = "0"
    zfs_free_bpobj_enabled= "1"
    zfs_free_leak_on_eio= "0"
    zfs_free_max_blocks = "100000"
    zfs_free_min_time_ms= "1000"
    zfs_immediate_write_sz= "32768"
    zfs_max_recordsize  = "1048576"
    zfs_mdcomp_disable  = "0"
    zfs_metaslab_fragmentation_threshold= "70"
    zfs_metaslab_segment_weight_enabled= "1"
    zfs_metaslab_switch_threshold= "2"
    zfs_mg_fragmentation_threshold= "85"
    zfs_mg_noalloc_threshold= "0"
    zfs_multihost_fail_intervals= "5"
    zfs_multihost_history= "0"
    zfs_multihost_import_intervals= "10"
    zfs_multihost_interval= "1000"
    zfs_multilist_num_sublists= "0"
    zfs_no_scrub_io     = "0"
    zfs_no_scrub_prefetch= "0"
    zfs_nocacheflush    = "0"
    zfs_nopwrite_enabled= "1"
    zfs_object_mutex_size= "64"
    zfs_pd_bytes_max    = "52428800"
    zfs_per_txg_dirty_frees_percent= "30"
    zfs_prefetch_disable= "0"
    zfs_read_chunk_size = "1048576"
    zfs_read_history    = "0"
    zfs_read_history_hits= "0"
    zfs_recover         = "0"
    zfs_resilver_delay  = "2"
    zfs_resilver_min_time_ms= "3000"
    zfs_scan_idle       = "50"
    zfs_scan_min_time_ms= "1000"
    zfs_scrub_delay     = "4"
    zfs_send_corrupt_data= "0"
    zfs_sync_pass_deferred_free= "2"
    zfs_sync_pass_dont_compress= "5"
    zfs_sync_pass_rewrite= "2"
    zfs_sync_taskq_batch_pct= "75"
    zfs_top_maxinflight = "32"
    zfs_txg_history     = "0"
    zfs_txg_timeout     = "5"
    zfs_vdev_aggregation_limit= "131072"
    zfs_vdev_async_read_max_active= "3"
    zfs_vdev_async_read_min_active= "1"
    zfs_vdev_async_write_active_max_dirty_percent= "60"
    zfs_vdev_async_write_active_min_dirty_percent= "30"
    zfs_vdev_async_write_max_active= "10"
    zfs_vdev_async_write_min_active= "2"
    zfs_vdev_cache_bshift= "16"
    zfs_vdev_cache_max  = "16384"
    zfs_vdev_cache_size = "0"
    zfs_vdev_max_active = "1000"
    zfs_vdev_mirror_non_rotating_inc= "0"
    zfs_vdev_mirror_non_rotating_seek_inc= "1"
    zfs_vdev_mirror_rotating_inc= "0"
    zfs_vdev_mirror_rotating_seek_inc= "5"
    zfs_vdev_mirror_rotating_seek_offset= "1048576"
    zfs_vdev_queue_depth_pct= "1000"
    zfs_vdev_raidz_impl = "[fastest] original scalar sse2 ssse3 avx2 "
    zfs_vdev_read_gap_limit= "32768"
    zfs_vdev_scheduler  = "noop"
    zfs_vdev_scrub_max_active= "2"
    zfs_vdev_scrub_min_active= "1"
    zfs_vdev_sync_read_max_active= "10"
    zfs_vdev_sync_read_min_active= "10"
    zfs_vdev_sync_write_max_active= "10"
    zfs_vdev_sync_write_min_active= "10"
    zfs_vdev_write_gap_limit= "4096"
    zfs_zevent_cols     = "80"
    zfs_zevent_console  = "0"
    zfs_zevent_len_max  = "4096"
    zfs_zil_clean_taskq_maxalloc= "1048576"
    zfs_zil_clean_taskq_minalloc= "1024"
    zfs_zil_clean_taskq_nthr_pct= "100"
    zil_replay_disable  = "0"
    zil_slog_bulk       = "786432"
    zio_delay_max       = "30000"
    zio_dva_throttle_enabled= "1"
    zio_requeue_io_start_cut_in_line= "1"
    zio_taskq_batch_pct = "75"
    zvol_inhibit_dev    = "0"
    zvol_major          = "230"
    zvol_max_discard_blocks= "16384"
    zvol_prefetch_bytes = "131072"
    zvol_request_sync   = "0"
    zvol_threads        = "32"
    zvol_volmode        = "1"

  Sections:
    .bss                = "0xffffffffc0e76400"
    .data               = "0xffffffffc0e73060"
    .exit.text          = "0xffffffffc0e337c4"
    .gnu.linkonce.this_module= "0xffffffffc0e760c0"
    .init.text          = "0xffffffffc1037000"
    .note.gnu.build-id  = "0xffffffffc0e34000"
    .parainstructions   = "0xffffffffc0e64870"
    .rodata             = "0xffffffffc0e36200"
    .rodata.str1.1      = "0xffffffffc0e48481"
    .rodata.str1.8      = "0xffffffffc0e4bf60"
    .smp_locks          = "0xffffffffc0e64210"
    .strtab             = "0xffffffffc10672d8"
    .symtab             = "0xffffffffc1039000"
    .text               = "0xffffffffc0cf5000"
    .text.unlikely      = "0xffffffffc0e336a8"
    __bug_table         = "0xffffffffc0e7608c"
    __jump_table        = "0xffffffffc0e73000"
    __ksymtab           = "0xffffffffc0e34030"
    __ksymtab_strings   = "0xffffffffc0e702e0"
    __mcount_loc        = "0xffffffffc0e68fe0"
    __param             = "0xffffffffc0e675f0"

@behlendorf I think this issue is quite serious. What do you think: is it a bug in upstream ZFS, or something specific to Ubuntu?

Following up on my earlier post, this is what top shows me this morning. I'm about to do a memory upgrade to see if that helps overall.

top - 13:28:11 up 1 day,  1:40,  1 user,  load average: 69.20, 69.92, 70.11
Tasks: 804 total,   7 running, 797 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.0 us, 28.3 sy,  0.0 ni, 68.6 id,  2.4 wa,  0.0 hi,  0.7 si,  0.0 st
MiB Mem :  64318.5 total,   6364.4 free,  39344.2 used,  18609.9 buff/cache
MiB Swap:   8192.0 total,   8066.5 free,    125.5 used.  24313.7 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
   3008 root      39  19       0      0      0 R  90.4   0.0 731:55.04 zthr_procedure
3810665 root      20   0       0      0      0 S  22.8   0.0   0:03.09 arc_prune
3808172 root      20   0       0      0      0 S  22.5   0.0   0:04.55 arc_prune
3813510 root      20   0       0      0      0 R  22.5   0.0   0:02.25 arc_prune
3817503 root      20   0       0      0      0 S  22.5   0.0   0:01.08 arc_prune
3815612 root      20   0       0      0      0 S  22.2   0.0   0:01.66 arc_prune
3815717 root      20   0       0      0      0 S  22.2   0.0   0:01.59 arc_prune
3816671 root      20   0       0      0      0 S  22.2   0.0   0:01.31 arc_prune
3817443 root      20   0       0      0      0 S  22.2   0.0   0:01.13 arc_prune
3817444 root      20   0       0      0      0 R  22.2   0.0   0:01.12 arc_prune
3817448 root      20   0       0      0      0 S  22.2   0.0   0:01.11 arc_prune
3818175 root      20   0       0      0      0 S  22.2   0.0   0:00.85 arc_prune
   3007 root      20   0       0      0      0 S  21.9   0.0 395:31.78 arc_prune
3814354 root      20   0       0      0      0 R  21.9   0.0   0:02.04 arc_prune
3815635 root      20   0       0      0      0 S  21.9   0.0   0:01.60 arc_prune
3815644 root      20   0       0      0      0 S  21.9   0.0   0:01.59 arc_prune
3815699 root      20   0       0      0      0 S  21.9   0.0   0:01.57 arc_prune
3817330 root      20   0       0      0      0 R  21.9   0.0   0:01.16 arc_prune
3818860 root      20   0       0      0      0 S  21.9   0.0   0:00.69 arc_prune
3818058 root      20   0       0      0      0 S  21.5   0.0   0:00.88 arc_prune
3818200 root      20   0       0      0      0 S  21.5   0.0   0:00.82 arc_prune
3818216 root      20   0       0      0      0 S  21.5   0.0   0:00.81 arc_prune
3808170 root      20   0       0      0      0 S  21.2   0.0   0:04.62 arc_prune
3819075 root      20   0       0      0      0 S  20.9   0.0   0:00.63 arc_prune
3819239 root      20   0       0      0      0 S  19.5   0.0   0:00.59 arc_prune
3819548 root      20   0       0      0      0 S  17.9   0.0   0:00.54 arc_prune
3819557 root      20   0       0      0      0 R  17.2   0.0   0:00.52 arc_prune
3820006 root      20   0       0      0      0 S  15.9   0.0   0:00.48 arc_prune
3820004 root      20   0       0      0      0 S  15.2   0.0   0:00.46 arc_prune
3820007 root      20   0       0      0      0 S  15.2   0.0   0:00.46 arc_prune
3820008 root      20   0       0      0      0 S  15.2   0.0   0:00.46 arc_prune
3819996 root      20   0       0      0      0 S  14.9   0.0   0:00.45 arc_prune

arc_summary.txt
arcstats.txt
zpool_status.txt

/sys/module/zfs/parameters/zfs_arc_dnode_limit_percent:10

From the stats you provided, it seems your dnode_size (>8G) is far larger than the allowed dnode_limit (<3G).
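
For reference, a rough sketch of where a limit of that order comes from, assuming the usual default where the dnode limit is zfs_arc_dnode_limit_percent of the ARC metadata limit (here effectively arc_max). The arc_max value below is taken from the module parameters quoted earlier in this thread and is only illustrative:

# Assumption: effective dnode limit = zfs_arc_dnode_limit_percent of the ARC
# metadata limit (which with zfs_arc_meta_limit_percent=100 is arc_max).
ARC_MAX=31474836480    # zfs_arc_max as reported above (illustrative)
DNODE_PCT=10           # zfs_arc_dnode_limit_percent
echo "$(( ARC_MAX * DNODE_PCT / 100 / 1024 / 1024 )) MiB effective dnode limit"   # ~3000 MiB, i.e. just under 3G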

You can set zfs_arc_dnode_limit_percent to 50, but that only takes effect at module load time.

Instead, please try setting a reasonable absolute upper limit with the zfs_arc_dnode_limit tunable, which can be changed at runtime. For example, to set a 12G limit: echo $((12*1024*1024*1024)) > /sys/module/zfs/parameters/zfs_arc_dnode_limit
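
A minimal sketch for making the change persistent and checking the counters afterwards; this assumes the usual /etc/modprobe.d convention, and the exact arcstats field names can differ between ZFS versions:

# runtime change (same as above), then persist it across module reloads
echo $((12*1024*1024*1024)) > /sys/module/zfs/parameters/zfs_arc_dnode_limit
echo "options zfs zfs_arc_dnode_limit=$((12*1024*1024*1024))" >> /etc/modprobe.d/zfs.conf
# compare dnode_size against the new limit
grep dnode /proc/spl/kstat/zfs/arcstats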

Does it help?

arc_summary.txt
arcstats.txt
zpool_status.txt

In your case, it seems that arc_meta_limit is constantly exceeded due to very large L2ARC headers, which are counted against the metadata limit. After all, you are addressing ~1 TB of L2ARC with a much smaller (32G) ARC. Are you using a 4K/8K recordsize? If so, I suggest limiting your L2ARC to 512 GB at most, or increasing your ARC (either by adding memory, or by letting the ARC reserve more memory for itself).

Side note: large L2ARC caches can be successfully addressed by a small ARC, but you need to use a reasonably large recordsize (the default value of 128K is very good at limiting the L2ARC header size). For example (taken from one of my virtualization hosts):

L2 ARC Size: (Adaptive)                         269.36  GiB
        Compressed:                     60.51%  163.00  GiB
        Header Size:                    0.05%   148.89  MiB

As you can see, by virtue of using the default 128K recordsize, a ~280 GB L2ARC can be addressed with less than 150 MB of headers. Moreover, a larger recordsize can significantly improve your compression ratio (as shown above).
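
To make the header math concrete, here is a minimal back-of-the-envelope sketch; the ~70 bytes-per-header figure is only an approximation inferred from the numbers above, and the exact size depends on the ZFS version:

# Rough L2ARC header RAM estimate (assumption: ~70 bytes of ARC header
# per cached L2ARC block, consistent with the example output above).
L2_SIZE=$(( 1024 * 1024 * 1024 * 1024 ))   # ~1 TiB of L2ARC
for RS in 8192 131072; do                  # 8K vs. the default 128K recordsize
    echo "recordsize=$RS -> ~$(( L2_SIZE / RS * 70 / 1024 / 1024 )) MiB of headers"
done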

In a similar scenario on FreeBSD (though with no L2ARC), my 64-way EPYC hit a load of >700 and became essentially unusable for about an hour during an rsync (many small mail files) between two datasets in the same pool.

Even after adjusting the limits and staying well within them, arc_prune still spikes regularly for short moments.
