Published by
BluePi
What is HyperLogLog?
HyperLogLog in Redis
A Redis HyperLogLog (HLL) behaves much like a Redis set but is stored as an ordinary string value, so it can be serialized and deserialized with the GET and SET commands. It estimates the number of distinct elements in a set using a fixed amount of memory and constant time per operation. The trade-off is that the count is approximate, with a standard error below 1% (0.81% in the Redis implementation). An HLL needs no more than about 12 KB of memory, however many elements are added. The HLL commands let you add elements, count them, and merge several HLLs to get the combined (union) count. Intersection is not available as a direct command, but it can be estimated by combining the counts of several HLLs.
Redis HLLs have two internal representations with different memory footprints: sparse and dense. The sparse representation is used for smaller sets and takes less memory, whereas the dense representation is used for larger sets and takes up to 12 KB. The point at which Redis switches from sparse to dense can be tuned with the hll-sparse-max-bytes configuration parameter.
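The 12 KB and sub-1% figures follow from how the dense representation is laid out. The short calculation below is a sketch of that arithmetic. The register count and register width are the values used by the Redis implementation (they are not stated in this post), and the error formula is the standard HyperLogLog approximation of 1.04 divided by the square root of the register count.

```python
import math

# Values used by the Redis HyperLogLog implementation (not given in the post itself).
REGISTERS = 16384        # 2**14 registers
BITS_PER_REGISTER = 6    # each register stores a small integer

# Memory taken by the registers in the dense representation
# (Redis adds a small header on top of this).
dense_bytes = REGISTERS * BITS_PER_REGISTER // 8

# Standard HyperLogLog approximation of the relative standard error.
std_error = 1.04 / math.sqrt(REGISTERS)

print(f"dense register storage: {dense_bytes} bytes")  # 12288 bytes, i.e. 12 KB
print(f"standard error: {std_error:.2%}")              # 0.81%
```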
Redis HyperLogLog Data Manipulation
Redis HyperLogLog allows set manipulation and set operations using the following commands:
PFADD
PFADD adds one or more elements to an HLL. Redis updates the HLL's internal registers and returns 1 if at least one register changed (meaning the estimated cardinality may have changed), or 0 otherwise. E.g.: PFADD article users a b c
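As a minimal sketch of how the PFADD return value might be used from application code, the helper below wraps the call and reports whether the HLL changed. It assumes a client object that exposes pfadd(name, *values) the way the redis-py library does. The function name add_visitors and the key used in the usage note are hypothetical.

```python
def add_visitors(client, key, user_ids):
    """Add user IDs to the HLL stored at `key`.

    Returns True when Redis reports that the HLL was altered, which means
    at least one of the IDs was (very likely) not seen before.
    """
    # PFADD replies 1 if at least one internal register changed, else 0.
    return client.pfadd(key, *user_ids) == 1
```

With redis-py and a local server, this could be called as add_visitors(redis.Redis(), "visitors:2023-12-04", ["u1", "u2"]). Adding the same IDs a second time would normally return False.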
PFCOUNT
PFCOUNT returns the approximate cardinality of an HLL, i.e. the estimated number of distinct elements added to it, with a standard error below 1% and at most 12 KB of memory per key. When called with multiple keys, it returns the cardinality of their union by merging the HLLs on the fly.
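Because PFCOUNT over several keys gives the cardinality of their union, the same call can also feed the inclusion-exclusion identity |A ∩ B| = |A| + |B| - |A ∪ B| to approximate an intersection, which Redis does not offer as a command. The sketch below assumes a client with a redis-py style pfcount(*keys) method. The function names are hypothetical.

```python
def estimate_intersection(count_a, count_b, count_union):
    """Inclusion-exclusion: |A ∩ B| = |A| + |B| - |A ∪ B|."""
    # Each input is an estimate, so the result can dip below zero; clamp it.
    return max(0, count_a + count_b - count_union)


def union_and_intersection(client, key_a, key_b):
    """Return (estimated union size, estimated intersection size) of two HLLs."""
    count_a = client.pfcount(key_a)
    count_b = client.pfcount(key_b)
    # A single PFCOUNT with two keys merges them on the fly and counts the union.
    count_union = client.pfcount(key_a, key_b)
    return count_union, estimate_intersection(count_a, count_b, count_union)
```

The error of each PFCOUNT is relative to the size of the set it counts, so the intersection estimate is least reliable when the overlap is small compared with the two sets.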
PFMERGE
PFMERGE merges multiple HLLs into a single destination HLL whose cardinality approximates that of the union of the source sets. The merged data is written to the destination key, which can then be queried with PFCOUNT.
PiStats: Our Use Case
PiStats is our media-focused product suite comprising Analytics and Personalization, CMS, DAM, Mobile and Web Apps. We use HyperLogLog to calculate distinct users on the website across various dimensions.
Example of Use Case
Using the daily visitors list to get the weekly unique visitors count
This can be achieved using the PFMERGE command: the daily visitor HLLs for the required days are merged into a single HLL, and PFCOUNT on the merged key gives the weekly unique visitors count.
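The weekly roll-up described above could look roughly like the following sketch. It assumes one HLL per day and a client with redis-py style pfmerge(dest, *sources) and pfcount(*keys) methods. The key scheme (visitors:YYYY-MM-DD and visitors:week:YYYY-MM-DD) and the function names are hypothetical, not the ones PiStats uses.

```python
from datetime import date, timedelta


def daily_key(day):
    """Hypothetical key scheme: one HLL of visitor IDs per calendar day."""
    return f"visitors:{day.isoformat()}"


def weekly_unique_visitors(client, week_start):
    """Merge seven daily HLLs into a weekly HLL and return its estimated count."""
    sources = [daily_key(week_start + timedelta(days=i)) for i in range(7)]
    dest = f"visitors:week:{week_start.isoformat()}"
    # PFMERGE writes the union of the source HLLs to the destination key.
    client.pfmerge(dest, *sources)
    return client.pfcount(dest)
```

For example, weekly_unique_visitors(client, date(2023, 12, 4)) would merge the keys for 4 to 10 December. A visitor who appears on several days is counted once, since the merged HLL represents the union of the daily sets.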
Getting the unique visitors using a specific device and OS
This can be achieved by intersecting the two sets that hold the unique visitors per device and per OS. The intersection operation is not readily available in Redis HLL, but it can be achieved by maintaining a separate HLL key for the combination of the two dimensions (device and OS), to which each visitor is also added. Alternatively, it can be estimated using the Venn diagram approach from set theory, i.e.,
Cardinality(device) + Cardinality(os) - Cardinality(device ‘union’ os) = Cardinality(device ‘intersection’ os)
We sincerely hope that you enjoyed reading this blog post. If you have any queries or suggestions, please reach out to us at enquiry@bluepi.in; we would be happy to answer! Till then, keep innovating!
About the Author
Published by
Ayushi Kaushik
Associate Technical Lead
An Associate Technical Lead is a key player in our innovation journey, guiding technical teams with expertise and vision. They bridge the gap between strategic goals and technical implementation, ensuring projects are executed seamlessly. As a hands-on leader, they mentor and inspire, fostering a culture of collaboration and excellence