See https://community.arm.com/cfs-file/__key/communityserver-blogs-components-weblogfiles/00-00-00-26-50/siggraph2015_2D00_mmg_2D00_marius_2D00_notes.pdf for details of quality / performance improvements.
I wrote a quick script to compare similarity of Gaussian blurs of various radii with various numbers of dual Kawase passes. Here are the results:

@pcwalton related to your question on irc - https://searchfox.org/mozilla-central/source/gfx/2d/FilterNodeSoftware.cpp#3099
I did some experimentation and tentatively came up with the following formulas to determine suitable parameters for the dual Kawase blur:
numPasses = max(1, round(4/3 * log(blurRadius)))
distance = pow(0.4538, numPasses) * blurRadius
Experimentally, these parameters create blurs that are within 0.02 (i.e. 2%) SSIM of the reference Gaussian blur.
Note that with dual Kawase blurs, images tend to accumulate blocky artefacts if shrunk down too much. So after 4 downsample operations, I stop allocating smaller framebuffers. (That is, the size of each intermediate framebuffer is always at least 1/16 of the size of the original images.) Another way of thinking about this is it looks best to switch from a dual Kawase blur to a regular Kawase blur after 4 downsample operations, effectively forming a "partial dual Kawase blur".
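For anyone who wants to try these numbers, here is a minimal Python sketch of the two formulas plus the 4-downsample cap. It is not the actual WebRender code. The function names are hypothetical, and it makes three assumptions the comment above does not state: that `log` is the natural logarithm, that `round` rounds halves up, and that each downsample halves the framebuffer along each axis.

```python
import math

# From the comment above: stop allocating smaller framebuffers after 4 downsamples.
MAX_DOWNSAMPLES = 4


def dual_kawase_params(blur_radius):
    """Return (num_passes, distance) for a given blur radius.

    Assumptions (not confirmed by the source): natural log, round-half-up.
    """
    if blur_radius <= 0:
        raise ValueError("blur_radius must be positive")
    raw = 4.0 / 3.0 * math.log(blur_radius)
    # floor(x + 0.5) avoids Python's round-half-to-even behaviour.
    num_passes = max(1, math.floor(raw + 0.5))
    distance = 0.4538 ** num_passes * blur_radius
    return num_passes, distance


def framebuffer_scales(num_passes):
    """Per-axis scale of the intermediate framebuffer after each pass.

    Assumes each downsample halves both dimensions; the scale is clamped
    at 1/16 so that later passes run at a fixed size (the "partial dual
    Kawase" behaviour described above).
    """
    return [1.0 / 2 ** min(i, MAX_DOWNSAMPLES) for i in range(1, num_passes + 1)]


if __name__ == "__main__":
    for radius in (1, 10, 100):
        passes, dist = dual_kawase_params(radius)
        print(radius, passes, round(dist, 4), framebuffer_scales(passes))
```

If the formulas were actually fitted with a base-2 or base-10 logarithm, the pass counts change, so treat the log base as the first thing to check against the original script.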
I don't know if it's helpful, but you could take a look at the OpenCV implementation:
https://docs.opencv.org/3.3.1/d4/d86/group__imgproc__filter.html#gaabe8c836e97159a9193fb0b11ac52cf1
Especially the CUDA-accelerated one (though it could be ported to OpenCL/SYCL):
https://docs.opencv.org/2.4/modules/gpu/doc/image_filtering.html
Status update: I've come up with a "dual Gaussian" blur that is a variation on the dual Kawase technique based on Gaussian blur. Essentially I downsample, then upsample repeatedly, applying a 6x6 Gaussian kernel every time I upsample. (Thanks to the bilinear filtering hardware, this only requires 10 taps, which is slightly better than the 13 taps per level of dual Kawase.)
This results in images that are very similar to the full Gaussian blur, but there is an issue: when animating blur, discontinuities are visible when crossing over the threshold at which we introduce a new downsample/upsample pass. I tried to minimize it, but I'm a bit out of ideas at the moment as to how to eliminate it entirely. I'm not sure how much we care—on balance, I'd take making all blurs faster over perfectly smooth animated blurs—but it's a bit of a bummer.
Do you have a video of the transitioning radius that you can share?
I just came up with a way to work around the problem, I think.
We investigated this and determined it's not a viable technique for what we need, due to performance / quality concerns. @pcwalton can provide more detail if anyone is interested in the gory details.