Guava: [documentation] - Any rough guidance on memory usage of the bloom filter?

Created on 7 Jul 2016  路  5Comments  路  Source: google/guava

Hello,
I tried to poke around on the documentation for rough guidance on the effect of (size,fpp) on memory usage on the bloom filter, but couldn't find anything. Two main questions:

  • is the allocation fixed? I assumed that, but when testing with larger and larger number (at 1 x 10^9, 0.01 fpp now), I didn't see it budge - which is good! But made me worried it's not an upfront, fixed allocation. Would be nice to have this confirmed as a design choice (i.e. won't change in the future)
  • any guidance? Like I said, I'm up to 1 billion & 1% fp rate - wondering what the impact of fp rate is now...

Mainly it seems to work so good, I can't get noticeable numbers on it :)
But this will be in a library, incorporated into a long-running process with lots and lots of requests, so it would be good to know, before I crash something with an OOME one day.

I did notice that, based on some other issues filed, it does look like there are basically two types of allocations:

  • one up-front allocation for the filter
  • allocations of temporary objects that should be collected, and not grow proportionally to the filter size

Mainly I'm wondering if this is a conscious choice or something that will likely change.

Most helpful comment

So, some rough idea of the space taken:

** FP Rate: 10.0% (0.1)
    **Expected size: 1,000
        Space allocated: 0.001 MB, (600 bytes)
        Assuming well-behaved clients, max number of FPs: 100.0
    **Expected size: 1,000,000
        Space allocated: 0.571 MB, (599,072 bytes)
        Assuming well-behaved clients, max number of FPs: 100000.0
    **Expected size: 1,000,000,000
        Space allocated: 571.314 MB, (599,066,152 bytes)
        Assuming well-behaved clients, max number of FPs: 1.0E8
    **Expected size: 10,000,000,000
        Space allocated: 5713.140 MB, (5,990,661,488 bytes)
        Assuming well-behaved clients, max number of FPs: 1.0E9
    **Expected size: 100,000,000,000
        Bloom Filter cannot handle these settings.
** FP Rate: 1.0% (0.01)
    **Expected size: 1,000
        Space allocated: 0.001 MB, (1,200 bytes)
        Assuming well-behaved clients, max number of FPs: 10.0
    **Expected size: 1,000,000
        Space allocated: 1.143 MB, (1,198,136 bytes)
        Assuming well-behaved clients, max number of FPs: 10000.0
    **Expected size: 1,000,000,000
        Space allocated: 1142.628 MB, (1,198,132,304 bytes)
        Assuming well-behaved clients, max number of FPs: 1.0E7
    **Expected size: 10,000,000,000
        Space allocated: 11426.280 MB, (11,981,322,976 bytes)
        Assuming well-behaved clients, max number of FPs: 1.0E8
    **Expected size: 100,000,000,000
        Bloom Filter cannot handle these settings.
** FP Rate: 0.1% (0.001)
    **Expected size: 1,000
        Space allocated: 0.002 MB, (1,800 bytes)
        Assuming well-behaved clients, max number of FPs: 1.0
    **Expected size: 1,000,000
        Space allocated: 1.714 MB, (1,797,200 bytes)
        Assuming well-behaved clients, max number of FPs: 1000.0
    **Expected size: 1,000,000,000
        Space allocated: 1713.942 MB, (1,797,198,448 bytes)
        Assuming well-behaved clients, max number of FPs: 1000000.0
    **Expected size: 10,000,000,000
        Bloom Filter cannot handle these settings.
    **Expected size: 100,000,000,000
        Bloom Filter cannot handle these settings.
** FP Rate: 0.01% (1.0E-4)
    **Expected size: 1,000
        Space allocated: 0.002 MB, (2,400 bytes)
        Assuming well-behaved clients, max number of FPs: 0.1
    **Expected size: 1,000,000
        Space allocated: 2.285 MB, (2,396,272 bytes)
        Assuming well-behaved clients, max number of FPs: 100.0
    **Expected size: 1,000,000,000
        Space allocated: 2285.256 MB, (2,396,264,600 bytes)
        Assuming well-behaved clients, max number of FPs: 100000.0
    **Expected size: 10,000,000,000
        Bloom Filter cannot handle these settings.
    **Expected size: 100,000,000,000
        Bloom Filter cannot handle these settings.
** FP Rate: 0.001% (1.0E-5)
    **Expected size: 1,000
        Space allocated: 0.003 MB, (3,000 bytes)
        Assuming well-behaved clients, max number of FPs: 0.01
    **Expected size: 1,000,000
        Space allocated: 2.857 MB, (2,995,336 bytes)
        Assuming well-behaved clients, max number of FPs: 10.0
    **Expected size: 1,000,000,000
        Space allocated: 2856.570 MB, (2,995,330,744 bytes)
        Assuming well-behaved clients, max number of FPs: 10000.0
    **Expected size: 10,000,000,000
        Bloom Filter cannot handle these settings.
    **Expected size: 100,000,000,000
        Bloom Filter cannot handle these settings.

All 5 comments

Just to update, I did a rough review of the code, and I don't see anything re-allocating the BitArray to grow the filter if the FP rate shoots up or something similar. I've don't recall hearing of an implementation guava or elsewhere that does this, so unless there is some feedback that guava may do this in the future, I'll close this out myself in a little bit.

As far as the used memory, it doesn't look like anything tricky is going on here:

https://github.com/google/guava/blob/65e6bd2efb54b9f6a94de0771db698cea8d2a1ca/guava/src/com/google/common/hash/BloomFilter.java#L433

It just implements the formula from wikipedia:

http://en.wikipedia.org/wiki/Bloom_filter#Probability_of_false_positives

Or:

(-n * Math.log(p) / (Math.log(2) * Math.log(2)));

where p is the FP rate, and n is the expected number of items.

Ok, I went ahead and just grabbed the code above and plopped it in this silly program, because I was lazy to calculate estimates:

import com.google.common.math.LongMath;
import com.google.common.primitives.Ints;

import java.math.BigInteger;
import java.math.RoundingMode;
import java.util.Arrays;
import java.util.List;

/**
 * Created by stuart.smith on 7/7/16.
 */
public class CalculateBloomBytes {
    public static void main( String[] args ) {
        List<Long> expectedSize = Arrays.asList(
            1000L, 1000L*1000L, 1000L*1000L*1000L, 10L*1000L*1000L*1000L, 100L*1000L*1000L*1000L
        );
        List<Double> fpRate = Arrays.asList(
            0.1,
            0.01,
            0.001,
            0.0001,
            0.00001
        );

        for( double rate : fpRate ) {
            System.out.println("** FP Rate: " + rate * 100 + "% (" + rate + ")");
            for( long size : expectedSize ) {
                System.out.format("\t**Expected size: %,d\n", size);
                long numBits = (long) (-size * Math.log(rate) / (Math.log(2) * Math.log(2)));
                long longArraySize;
                try {
                    longArraySize = Ints.checkedCast(
                        LongMath.divide(numBits, 64, RoundingMode.CEILING)
                    );
                } catch( IllegalArgumentException e ) {
                    System.out.println("\t\tBloom Filter cannot handle these settings.");
                    continue;
                }
                BigInteger numBytes = new BigInteger(longArraySize+"").multiply(new BigInteger(Long.BYTES+""));
                double numKilobytes = numBytes.doubleValue() / (1024.0 * 1024.0);
                System.out.format("\t\tSpace allocated: %.3f MB, (%,d bytes)\n", numKilobytes, numBytes);
                System.out.println("\t\tAssuming well-behaved clients, max number of FPs: " + rate * size);
            }
        }
    }

Output below

So, some rough idea of the space taken:

** FP Rate: 10.0% (0.1)
    **Expected size: 1,000
        Space allocated: 0.001 MB, (600 bytes)
        Assuming well-behaved clients, max number of FPs: 100.0
    **Expected size: 1,000,000
        Space allocated: 0.571 MB, (599,072 bytes)
        Assuming well-behaved clients, max number of FPs: 100000.0
    **Expected size: 1,000,000,000
        Space allocated: 571.314 MB, (599,066,152 bytes)
        Assuming well-behaved clients, max number of FPs: 1.0E8
    **Expected size: 10,000,000,000
        Space allocated: 5713.140 MB, (5,990,661,488 bytes)
        Assuming well-behaved clients, max number of FPs: 1.0E9
    **Expected size: 100,000,000,000
        Bloom Filter cannot handle these settings.
** FP Rate: 1.0% (0.01)
    **Expected size: 1,000
        Space allocated: 0.001 MB, (1,200 bytes)
        Assuming well-behaved clients, max number of FPs: 10.0
    **Expected size: 1,000,000
        Space allocated: 1.143 MB, (1,198,136 bytes)
        Assuming well-behaved clients, max number of FPs: 10000.0
    **Expected size: 1,000,000,000
        Space allocated: 1142.628 MB, (1,198,132,304 bytes)
        Assuming well-behaved clients, max number of FPs: 1.0E7
    **Expected size: 10,000,000,000
        Space allocated: 11426.280 MB, (11,981,322,976 bytes)
        Assuming well-behaved clients, max number of FPs: 1.0E8
    **Expected size: 100,000,000,000
        Bloom Filter cannot handle these settings.
** FP Rate: 0.1% (0.001)
    **Expected size: 1,000
        Space allocated: 0.002 MB, (1,800 bytes)
        Assuming well-behaved clients, max number of FPs: 1.0
    **Expected size: 1,000,000
        Space allocated: 1.714 MB, (1,797,200 bytes)
        Assuming well-behaved clients, max number of FPs: 1000.0
    **Expected size: 1,000,000,000
        Space allocated: 1713.942 MB, (1,797,198,448 bytes)
        Assuming well-behaved clients, max number of FPs: 1000000.0
    **Expected size: 10,000,000,000
        Bloom Filter cannot handle these settings.
    **Expected size: 100,000,000,000
        Bloom Filter cannot handle these settings.
** FP Rate: 0.01% (1.0E-4)
    **Expected size: 1,000
        Space allocated: 0.002 MB, (2,400 bytes)
        Assuming well-behaved clients, max number of FPs: 0.1
    **Expected size: 1,000,000
        Space allocated: 2.285 MB, (2,396,272 bytes)
        Assuming well-behaved clients, max number of FPs: 100.0
    **Expected size: 1,000,000,000
        Space allocated: 2285.256 MB, (2,396,264,600 bytes)
        Assuming well-behaved clients, max number of FPs: 100000.0
    **Expected size: 10,000,000,000
        Bloom Filter cannot handle these settings.
    **Expected size: 100,000,000,000
        Bloom Filter cannot handle these settings.
** FP Rate: 0.001% (1.0E-5)
    **Expected size: 1,000
        Space allocated: 0.003 MB, (3,000 bytes)
        Assuming well-behaved clients, max number of FPs: 0.01
    **Expected size: 1,000,000
        Space allocated: 2.857 MB, (2,995,336 bytes)
        Assuming well-behaved clients, max number of FPs: 10.0
    **Expected size: 1,000,000,000
        Space allocated: 2856.570 MB, (2,995,330,744 bytes)
        Assuming well-behaved clients, max number of FPs: 10000.0
    **Expected size: 10,000,000,000
        Bloom Filter cannot handle these settings.
    **Expected size: 100,000,000,000
        Bloom Filter cannot handle these settings.

Closed it out - although it would be nice to move this into the documentation proper.

Oh, and a graph :)

Was this page helpful?
0 / 5 - 0 ratings