Etcher: Infer the optimal block size to use

Created on 6 Oct 2016 · 13 comments · Source: balena-io/etcher

(In general, we want to get every last drop of speed out of Etcher, both to save user time and to differentiate Etcher from other solutions. So if you have other ideas that could improve Etcher's speed, whether or not they require prior work by the image publisher, please do submit them as feature ideas.)

Block size affects the speed of writing to a flash drive, and the correct value is neither obvious nor easy to calculate. It is, however, possible for Etcher to alter the block size it uses as it writes. It would therefore be possible to use an algorithm that experiments with different block sizes, incrementally converging on the optimum. Done correctly, the experiments would complete early in the writing process, paying back whatever delay they added many times over during the course of the write.
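One hypothetical shape such a search could take, sketched in Node.js: double the block size while measured throughput keeps improving, and stop once the gains flatten out. `measureThroughput` is an assumed callback (it would run a short timed write burst at the given block size and return bytes per second); all names here are illustrative, not Etcher's actual API.

```javascript
// Sketch: converge on a good block size by hill-climbing over
// measured throughput. Assumes measureThroughput(blockSize) returns
// bytes/sec for a short trial write at that block size.

function findBlockSize(measureThroughput, options) {
  options = options || {};
  var size = options.start || 64 * 1024;      // start at 64 KB
  var max = options.max || 8 * 1024 * 1024;   // cap at 8 MB
  var margin = options.margin || 1.05;        // require a 5% gain to keep growing
  var best = { size: size, speed: measureThroughput(size) };
  while (size * 2 <= max) {
    size *= 2;
    var speed = measureThroughput(size);
    if (speed < best.speed * margin) break;   // gains flattened out; stop here
    best = { size: size, speed: speed };
  }
  return best.size;
}
```

The 5% margin keeps the search from chasing measurement noise; the trial writes themselves are real writes, so nothing is wasted except the time spent at sub-optimal sizes.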

Here is some information by others on the general problem: http://stackoverflow.com/questions/6161823/dd-how-to-calculate-optimal-blocksize

Labels: sdk, all, feature

All 13 comments

Interesting... I'd always assumed that as long as the block size written by dd was a multiple of the disk's block size, then you'd get 'optimal' performance, and that's why the recommended blocksize to use with dd is normally 4M.
But if experimentation shows otherwise, then this sounds like a great idea! One small caveat is that ISTR reading that some SD cards actually have (internally) a different block structure at the start of the disk (where the FAT is typically located) to the rest of the disk (where the 'data' is typically located), for wear-levelling / performance reasons (since in 'typical' SD card usage in a digital camera, the former gets written a lot more than the latter).

@jhermsmeier What are your thoughts on this?

I've just benchmarked this a bit with a BlockReadStream and a BlockWriteStream (which actually read & write Buffers with a specified block size – unlike core stream's highWaterMark, which only specifies the size of the internal queue buffer).

It seems like it doesn't matter much above a block size of 256KB – which is to be expected, as we're not in control of what the underlying system does. Block sizes between 256KB and 2MB seem to yield the best speeds for the devices and transports I tried; everything above 2MB (tested 4MB & 8MB) just fills up external memory buffers.

Block sizes lower than 256KB just hit the CPU & GC harder as there are a lot more reads, writes, and objects to deal with per second.

If we were doing this at a lower level, or with bindings into lower-level APIs, the chosen block size would probably matter a lot more – but in that case we would also actually be able to determine the device's physical block size more easily.

In conclusion, I think the speeds we can get depend more on how many read/write cycles can happen per second without CPU-bound operations (mostly the JS <-> C++ context switches in this case, I'd guess) becoming the bottleneck, and the best block size depends more on how the kernel handles the writes and its write buffers (as well as the protocols in between, i.e. USB, SATA, PCIe, etc.).

Comparing speeds with dd on the aforementioned devices, it tops out at a 4MB block size, but is only just as fast as the BlockStreams in node (albeit with considerably less memory consumption) and has pretty similar CPU usage (which makes sense, as most time is spent in the kernel).

To really figure this out, one could set up a rig with all sorts of operating systems and devices connected with various protocols / wires / buses and benchmark the hell out of them – but at that point I think we'd be over-optimizing a wee bit (as it probably depends more on the devices being written to anyways), and get more speed gain by utilising compression, sparse image formats and/or block maps.

Hah, I was going to suggest: is it worth benchmarking on Windows, Mac and Linux to see if they all behave similarly? (i.e. if the same block size is 'optimal' for all 3)

Fundamentally the bottleneck will always be the SD card write speed, and there's obviously nothing we can do to improve that.

Yeah, sounds like too much effort for very little gain. Maybe worth benchmarking on other OSes just in case, but I feel we should close this ticket for now. What do you think @jhermsmeier?

Maybe worth benchmarking on other OSes just in case

Yup, that's definitely a good idea – just to make sure we don't use a block size that makes things extremely slow on some other OS

but I feel we should close this ticket for now.

I concur, we can always reopen it once we're at the point where it starts to make a difference

OK, cool. I'll close this issue then, and we can re-open if you find anything interesting in other OSes.

If anyone else is interested in detecting the physical sector size of a block device from Node.js, you can use https://github.com/ronomon/direct-io

you can use https://github.com/ronomon/direct-io

Oh, that's very interesting @jorangreef . "Direct Memory Access bypasses the filesystem cache" - perhaps this would help us to fix #1523

I still think this is a valid thing to do. Can someone explain to me the reasoning of why we closed it?


@alexandrosm In a previous comment @jhermsmeier said "It seems like it doesn't matter much above a block size of 256KB". I _suspect_ that a lot of the previous discussions about the optimal dd blocksize (like the one you linked to earlier) hark back to a time when memory was more scarce than it typically is now?
But alternatively, perhaps "Hard Etcher" is an ideal platform to do the kind of rigorous benchmarking that would be needed for this?

@lurch I suppose the question is -- how is that determination made, and what does "much" mean in this context?


