Redis’s HyperLogLog actual combat

  redis

Order

This article mainly studies the use of redis’ HyperLogLog

Relevant command

pfadd

The complexity of each added element is O(1)

127.0.0.1:6379> pfadd uv0907 uid1 uid2 uid3
(integer) 1
  • Add an element to the HyperLogLog and return 1 if there is an internal change, but not 0

pfcount

The complexity is O(1) when a single hyperlog is scoped, and O(N) when multiple hyperlogs are scoped.

127.0.0.1:6379> pfcount uv0907
(integer) 3
  • Returns the approximate cardinality of the hyperlog, or if multiple hyperlogs are specified, the approximate cardinality of their union is returned.

pfmerge

Complexity is O(N), n is the number of merged HyperLogLog

127.0.0.1:6379> pfadd uv0906 uid1 uid4 uid5
(integer) 1
127.0.0.1:6379> pfmerge uv0607 uv0906 uv0907
OK
127.0.0.1:6379> pfcount uv0607
(integer) 5
  • Merges the specified Hyperlog into the new Hyperlog

the usage scenarios

Hyperlog is a kind of Probabilistic data Structures. The basic idea of this kind of data structures is to use statistical probability algorithms to sacrifice the accuracy of data to save memory space and improve the performance of related operations. The most typical usage scenario is to count the daily UV of the website. Examples are as follows:

    @Test
    public void testUv(){
        String uv1 = "uv96";
        String uv2 = "uv97";
        IntStream.rangeClosed(1,100)
                .forEach(i -> {
                    System.out.println(i);
                    redisTemplate.opsForHyperLogLog()
                            .add(uv1,"user"+i);
                    redisTemplate.opsForHyperLogLog()
                            .add(uv2,"user"+i/2);
                });

        long uv1Count = redisTemplate.opsForHyperLogLog().size(uv1);
        System.out.println(uv1Count);
        long uv2Count = redisTemplate.opsForHyperLogLog().size(uv2);
        System.out.println(uv2Count);

        String uv1uv2 = "uv67";
        Long uv1uv2Count = redisTemplate.opsForHyperLogLog().union(uv1uv2,uv1,uv2);
        System.out.println(uv1uv2Count);
        Long realCount = redisTemplate.opsForHyperLogLog().size(uv1uv2);
        System.out.println(realCount);
    }

Summary

  • Redis’s Hyperlog is especially suitable for unique statistics of massive data, which requires memory occupation and can also accept certain error rates.
  • As the union operation is O(N), slow query should be paid attention to at the mass data level.

doc