Linux: Writing to disk (USB or sdcard) stalls other tasks

Created on 7 Dec 2016 · 19 Comments · Source: raspberrypi/linux

Have spent quite some time narrowing this down now, but don't know where to look further.

Background: I'm using a system that receives a 500 kbyte/s H.264 video stream via wifi cards in monitor mode. This video stream is then displayed on the Pi and should also be written to a USB memory stick or the internal SD card.

However, the problem is, it's just not possible to record this videostream (about 500kbyte/s) without the video sometimes stuttering. This always happens when there is data being written to a disk (USB stick or internal sdcard doesn't matter).

Have tried now:

  • different Raspbian images, kernel versions, raspberry firmware versions
  • deadline i/o scheduler, cfq, noop, bfq, 1000Hz timer, different preemption options
  • niced/ioniced the tasks
  • different sdcards and usb sticks
  • different filesystems
  • verified that usb and sdcard write speed is good
  • made sure no other background tasks like log rotate or whatever are running
  • tried sync and async mounting options
  • tried all kinds of different vm.dirty_* values (vm.dirty_background_ratio etc.) to smooth out the writing
  • Some things I probably forgot right now
  • The system is not loaded much, cpu usage is around 10%, no high iowait values etc.

What works fine is: Displaying and writing the same data stream to:

  • A filesystem in RAM (/tmp tmpfs for example)
  • A smartphone connected via USB tethering (usb0 "virtual" network interface)
  • Another computer connected via ethernet (eth0 internal Pi ethernet interface)
  • Another Raspberry via another Ralink wifi card in monitor mode

This looks to me like pdflush (I think that's what it was called some years ago, sorry, my Linux knowledge may not be up-to-date) somehow has too high priority when writing and stalls things out? Or some other disk-access related stuff in the kernel? Or maybe something in the underlying Raspberry firmware?

How could I narrow this down further?

Waiting for external input

Most helpful comment

What's happening, in short, is that the kernel is being very stupid in how it handles write-back buffering. This isn't unique to the Raspberry Pi kernel, although the slower storage hardware tends to make problems like this much worse. It's a known issue upstream, but there really hasn't been much work on a fix. (Linus himself proposed something to solve the issues this causes for machines with lots (16GB+) of RAM, but it won't help at all on systems like the Pi, which have very little RAM and insanely slow storage, and I don't even know if it got committed.)

The best solution short term is going to be either getting faster storage, or reducing the amount of data you need to write out, and then setting:

vm.dirty_background_ratio = 1
vm.dirty_ratio = 5
vm.dirty_writeback_centisecs = 25

(you might want to adjust the values a bit, but try to keep them low, especially the first one).
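Before changing anything it can help to note what the kernel is currently using. The following is only a minimal sketch, assuming the usual /proc/sys/vm files are present; it reads the write-back knobs explained in this comment (vm.dirty_background_ratio, vm.dirty_ratio, vm.dirty_writeback_centisecs). The helper name read_vm_settings and its base parameter are made up for illustration.

```python
from pathlib import Path

# The write-back knobs discussed in this comment; each one is a file under /proc/sys/vm.
KNOBS = ("dirty_background_ratio", "dirty_ratio", "dirty_writeback_centisecs")


def read_vm_settings(base="/proc/sys/vm"):
    """Return {knob: int} for every knob file that exists under base.

    read_vm_settings is a hypothetical helper, not part of any library.
    """
    settings = {}
    for name in KNOBS:
        path = Path(base) / name
        if path.is_file():
            settings[name] = int(path.read_text().strip())
    return settings


if __name__ == "__main__":
    for name, value in read_vm_settings().items():
        print(f"vm.{name} = {value}")
```

Running it before and after applying new values gives a quick check that the settings actually took effect.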

More detailed explanation:
Linux, like most every other modern operating system, buffers data in RAM prior to writing out to disk. In most cases, this improves performance because it reduces the number of requests to the disk (One big request to the disk will almost always take less time to complete than a number of smaller ones totaling up to the same amount of data) and makes it easier for the disk controller to schedule writes sanely. This is all well and good, except for a few specific issues Linux has:

  1. Windows, OS X, and pretty much everything that isn't Linux-based do the eventual I/O to the disk with background priority, so it has less impact on the rest of the system. Linux, however, does it with regular foreground priority, so it causes contention for other applications trying to use the disk.
  2. Linux uses static default values for how much to cache in memory, which result in particularly bad behavior when you have large amounts of RAM or particularly slow storage. These are the sysctl values you were talking about, and explaining why they're bad defaults requires explaining what they do:
    • vm.dirty_background_ratio: This one controls how much data (as a percentage of system RAM) Linux will cache before trying to do write-back in the 'background'. 10% means about 100MB on a Pi, which is huge given that you maybe get 40MB/s write rates, if you're lucky, for most storage devices you would likely be using with it.
    • vm.dirty_ratio: Sets the max before calls to write() stall waiting for write-back to finish. This needs to be greater than or equal to vm.dirty_background_ratio, otherwise things will behave almost as if there's no write-back buffering, but many things will be _much_ slower.
    • vm.dirty_writeback_centisecs: This controls how frequently the 'background' write-back will happen once vm.dirty_background_ratio is exceeded. It's measured in hundredths of a second, which means the default is 5 seconds, which is also insane for pretty much any modern system. The value I suggested above sets it to a quarter of a second, which I find works well on most systems (including the Pi 2 I have).

In general, I think what's happening here is a combination of both issues: the kernel is buffering too much data, so when the writeback hits, it swamps the disk/SD card and nothing can read or write in any reasonable amount of time. By setting the background ratio to a low value, you'll ensure that the background writeback triggers early, and by setting the writeback timer to a low value, you ensure it drains things fast.
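To put rough numbers on that, here is a back-of-the-envelope sketch using the figures from this comment (10% of RAM, about 40MB/s). The 1GB of RAM is an assumption inferred from the "about 100MB" figure, and the function name is made up. The kernel actually applies the ratio to dirtyable memory rather than total RAM, so the real threshold will be somewhat lower.

```python
MB = 1000 * 1000
GB = 1000 * MB


def writeback_backlog(ram_bytes, background_ratio_pct, write_rate_bytes_per_s):
    """Return (backlog_bytes, drain_seconds) for a given background ratio.

    Simplification: uses total RAM instead of the kernel's "dirtyable" memory.
    """
    backlog = ram_bytes * background_ratio_pct / 100
    return backlog, backlog / write_rate_bytes_per_s


if __name__ == "__main__":
    # Assumed figures: 1GB RAM, 40MB/s sustained write rate.
    for pct in (10, 1):
        backlog, seconds = writeback_backlog(1 * GB, pct, 40 * MB)
        print(f"{pct:>2}% -> {backlog / MB:.0f} MB backlog, {seconds:.2f} s to drain")
```

With these assumed figures the default ratio allows a backlog that takes seconds to flush, while a ratio of 1 keeps each flush to a fraction of a second.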

All 19 comments

Could you say a little about the data flow within your application, paying attention to the threading and whether I/O is asynchronous or non-blocking?

The application puts out the h264 stream on stdout. Then it's piped into tee, which saves the file to disk and pipes the stream further to the display application (hello_video.bin, modified to accept data from stdin instead of reading from disk).

I also tried an alternate tee version ("ftee") that can write to named pipes and doesn't block when there is nothing connected at the other side of the pipe. Same problem.

Don't think it's some blocking problem with the pipes or tee, as for example everything continues to run when the disk that the file is being written to becomes full. Just the video stuttering disappears because nothing is being written anymore.

Also, I can choose between small video stutters very often or very big stutters less often by using different values for those parameters:

vm.dirty_background_ratio = 10
vm.dirty_ratio = 20
vm.dirty_writeback_centisecs = 500

Then I can watch the dirty pages climbing in /proc/vmstat, and just after the stutter has occurred the dirty pages are low again.
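For anyone who wants to log this rather than eyeball it, here is a minimal sketch that pulls the dirty and write-back page counts out of /proc/vmstat. The function names are made up, and the 4096-byte page size is an assumption (check with getconf PAGESIZE).

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages


def parse_dirty_counters(vmstat_text):
    """Extract nr_dirty and nr_writeback (in pages) from /proc/vmstat text."""
    wanted = {"nr_dirty", "nr_writeback"}
    counters = {}
    for line in vmstat_text.splitlines():
        parts = line.split()
        # Each line of /proc/vmstat is "name value".
        if len(parts) == 2 and parts[0] in wanted:
            counters[parts[0]] = int(parts[1])
    return counters


def read_dirty_counters(path="/proc/vmstat"):
    """Read the file at path and return the parsed counters."""
    with open(path) as f:
        return parse_dirty_counters(f.read())


def pages_to_mib(pages, page_size=PAGE_SIZE):
    """Convert a page count to MiB."""
    return pages * page_size / (1024 * 1024)
```

Calling read_dirty_counters() in a loop with a short time.sleep() and a timestamp makes it possible to line up the moment nr_dirty drops with the moment the video stutters.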

The question is, why does it stall other tasks that actually do not access the disk at all? I know the effects you are describing from linux, you get a laggy shell for example, because the shell cannot read in the current directory from the disk until everything is flushed from the cache.

But I'm receiving a video stream from wifi cards and decoding/displaying it, no disk involved there.

Fiddling with the dirty ratios etc. and nicing/ionicing the tasks gave a small improvement, but doesn't really solve the problem.

My guess would be something related to the SD and USB controller drivers. I've seen cases like this before where writing to storage causes stalls for stuff that's just doing processing or operating on memory on a Pi.

What can I do to narrow it down so that it can be fixed?

Which MMC host driver did you try?

Are there different alternatives for the Pi? But considering that this also happens when writing to a USB memory stick, the problem is probably not related to drivers/hardware access (?)

This is what dmesg says about mmc:

[    1.304894] Waiting for root device /dev/mmcblk0p2...
[    1.333736] mmc0: host does not support reading read-only switch, assuming write-enable
[    1.335662] mmc0: new high speed SDHC card at address 59b4
[    1.336203] mmcblk0: mmc0:59b4 USD   7.48 GiB
[    1.337125]  mmcblk0: p1 p2 p3
[    1.356331] mmc1: queuing unknown CIS tuple 0x80 (2 bytes)
[    1.357837] mmc1: queuing unknown CIS tuple 0x80 (3 bytes)
[    1.359339] mmc1: queuing unknown CIS tuple 0x80 (3 bytes)
[    1.362039] mmc1: queuing unknown CIS tuple 0x80 (7 bytes)
[    1.407621] EXT4-fs (mmcblk0p2): INFO: recovery required on readonly filesystem
[    1.407630] EXT4-fs (mmcblk0p2): write access will be enabled during recovery
[    1.432801] Indeed it is in host mode hprt0 = 00021501
[    1.448301] mmc1: new high speed SDIO card at address 0001
[    1.556271] EXT4-fs (mmcblk0p2): recovery complete
[    1.566196] EXT4-fs (mmcblk0p2): mounted filesystem with ordered data mode. Opts: (null)

Depending on the kernel version, the following host drivers are available:
bcm2835-sdhost.c
bcm2835-mmc.c
sdhci-bcm2835.c
sdhci-iproc.c

Is it correct that you're using a Raspberry Pi 3 with onboard WiFi via SDIO?

Ah, overlooked something.

It's this one:

[    1.212364] mmc0: sdhost-bcm2835 loaded - DMA enabled (>1)
[    1.214391] mmc-bcm2835 3f300000.mmc: mmc_debug:0 mmc_debug2:0
[    1.214396] mmc-bcm2835 3f300000.mmc: DMA channel allocated

Yes, it's a Pi3 with onboard Wifi via SDIO. Kernel and firmware are the ones from the latest Raspbian release image (kernel 4.4.32). But the problem also occurs with a Pi2, or with the brcmfmac module not loaded on the Pi3.

If you want to eliminate driver differences and can manage without WiFi, add dtoverlay=mmc to config.txt and reboot.

Tried the dtoverlay=mmc parameter, made no difference.

Also verified it's not some pipe blocking problem. Let the video reception and display run (without saving the stream to disk), then opened another terminal and typed "cat /dev/zero > /root/testfile". Video freezes completely for a few seconds periodically.
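The same test could also be run without any video involved. The following is only a rough sketch (the helper names and the 10 ms interval are made up): a process that does nothing but sleep in short steps and record when it woke up. If it shows long gaps while the cat command above is running, then tasks that never touch the disk are being delayed too.

```python
import time


def find_stalls(timestamps, threshold_s):
    """Return (start_time, gap) for every gap between samples above threshold_s."""
    stalls = []
    for earlier, later in zip(timestamps, timestamps[1:]):
        gap = later - earlier
        if gap > threshold_s:
            stalls.append((earlier, gap))
    return stalls


def sample_clock(duration_s, interval_s=0.01):
    """Sleep in short steps for duration_s and record each wake-up time."""
    stamps = [time.monotonic()]
    while stamps[-1] - stamps[0] < duration_s:
        time.sleep(interval_s)
        stamps.append(time.monotonic())
    return stamps
```

For example, run find_stalls(sample_clock(60), 0.1) in one terminal while the write runs in another, and compare against a run with no writing going on.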

I'm not sure if this is the same problem, but I noticed a sort of disk stalling when backing up root via rsync onto USB drives. Rsync will function without problem until stalling for a couple seconds - during which CPU usage will drop very low and the USB activity light will stop flashing. Maybe it is a buffered write-back problem however the stall occurs even with backups involving very little disk writing, and it doesn't seem like there is any writing to the USB drive involved at all during periods of idleness.

Tried all kinds of other settings in the meantime, couldn't make the problem go away completely. Fast SD cards/USB sticks and carefully tuned vm.dirty_* parameters help to some extent, but it still stutters a little sometimes. Writing to a RAM disk works, so that's what I do now. But it limits the recording time to 12 minutes, which is not very nice.

I've also tried a slow, heavily fragmented and almost full 1TB USB2 mechanical harddrive (that has a lower sustained write speed than the SDcards and USB sticks I tried ...), no problems there.

Is it possible to reproduce the stuttering by writing to SD card while playing a big sound file (no video stream at all)?

Having read through, I suspect this is an upstream kernel issue. If you disagree, please comment to prevent the 30 day closure.

This issue will be closed within 30 days unless further interactions are posted. If you wish this issue to remain open, please add a comment. A closed issue may be reopened if requested.

It would be nice if you keep it open, I wanted to give it another try with the Kernel 4.14.50 (and firmware) included in Raspbian 2018-06-27 soon.

(I have reported the issue for Kernel 4.4.x, in the meantime I have tested with 4.9.35, where the issue also occurs).

@rodizio1 Have you any results for tests on the latest kernels?

I disabled the file system's journal feature and the stalling of write system calls went away. I don't know exactly why. I hope this helps.
