Class BloomSHA1


  • public class BloomSHA1
    extends Object
    A Bloom filter for sets of SHA1 digests. A Bloom filter uses a set of k hash functions to determine set membership. Each hash function produces a value in the range 0..M-1. The filter is of size M. To add a member to the set, apply each function to the new member and set the corresponding bit in the filter. For M very large relative to k, this will normally set k bits in the filter. To check whether x is a member of the set, apply each of the k hash functions to x and check whether the corresponding bits are set in the filter. If any are not set, x is definitely not a member. If all are set, x may be a member. The probability of error (the false positive rate) is f = (1 - e^(-kN/M))^k, where N is the number of set members. This class takes advantage of the fact that SHA1 digests are good- quality pseudo-random numbers. The k hash functions are the values of distinct sets of bits taken from the 20-byte SHA1 hash. The number of bits in the filter, M, is constrained to be a power of 2; M == 2^m. The number of bits in each hash function may not exceed floor(m/k). This class is designed to be thread-safe, but this has not been exhaustively tested.
    Author:
    Jim Dixon BloomSHA1.java and KeySelector.java are BSD licensed from the xlattice app - http://xlattice.sourceforge.net/ minor tweaks by jrandom, exposing unsynchronized access and allowing larger M and K. changes released into the public domain. Note that this is used only by DecayingBloomFilter, which uses only the unsynchronized locked_foo() methods. Deprecated for use outside of the router; to be moved to router.jar. As of 0.8.11, the locked_foo() methods are thread-safe, in that they work, but there is a minor risk of false-negatives if two threads are accessing the same bloom filter integer.
    • Constructor Detail

      • BloomSHA1

        public BloomSHA1​(int m,
                         int k)
        Creates a filter with 2^m bits and k 'hash functions', where each hash function is portion of the 160-bit SHA1 hash.
        Parameters:
        m - determines number of bits in filter
        k - number of hash functionsx See KeySelector for important restriction on max m and k
      • BloomSHA1

        public BloomSHA1​(int m)
        Creates a filter of 2^m bits, with the number of 'hash functions" k defaulting to 8.
        Parameters:
        m - determines size of filter
      • BloomSHA1

        public BloomSHA1()
        Creates a filter of 2^20 bits with k defaulting to 8.
    • Method Detail

      • clear

        public void clear()
        Synchronized version
      • size

        public final int size()
        Returns the number of keys which have been inserted. This class (BloomSHA1) does not guarantee uniqueness in any sense; if the same key is added N times, the number of set members reported will increase by N.
        Returns:
        number of set members
      • capacity

        public final int capacity()
        Returns:
        number of bits in filter
      • insert

        public void insert​(byte[] b)
        Add a key to the set represented by the filter. XXX This version does not maintain 4-bit counters, it is not a counting Bloom filter.
        Parameters:
        b - byte array representing a key (SHA1 digest)
      • insert

        public void insert​(byte[] b,
                           int offset,
                           int len)
      • locked_insert

        public final void locked_insert​(byte[] b)
      • locked_insert

        public final void locked_insert​(byte[] b,
                                        int offset,
                                        int len)
      • locked_member

        public final boolean locked_member​(byte[] b)
      • locked_member

        public final boolean locked_member​(byte[] b,
                                           int offset,
                                           int len)
      • member

        public final boolean member​(byte[] b)
        Is a key in the filter. External interface, internally synchronized.
        Parameters:
        b - byte array representing a key (SHA1 digest)
        Returns:
        true if b is in the filter
      • member

        public final boolean member​(byte[] b,
                                    int offset,
                                    int len)
      • getFilterKey

        public BloomSHA1.FilterKey getFilterKey​(byte[] b,
                                                int offset,
                                                int len)
        Get the bloom filter offsets for reuse. Caller should call release(rv) when done with it.
        Since:
        0.8.11
      • locked_insert

        public void locked_insert​(BloomSHA1.FilterKey fk)
        Add the key to the filter.
        Since:
        0.8.11
      • locked_member

        public boolean locked_member​(BloomSHA1.FilterKey fk)
        Is the key in the filter.
        Since:
        0.8.11
      • falsePositives

        public final double falsePositives​(int n)
        Parameters:
        n - number of set members
        Returns:
        approximate false positive rate
      • falsePositives

        public final double falsePositives()