Due Friday, April 10 at 5:00 PM. Please refer to the homework policy here.
count-min-sketch data structure allows us to estimate the relative frequency of each element up to an -additive factor with probability of error with space. The original motivation, however was to also obtain a list of -heavy hitters. Explain how to adjust the
count-min-sketch to also maintain a collection of all of the -heavy hitters while holding onto at most items from the stream at any point in time, with probability of error . Here your algorithm may also collect a few extra items with relative frequency in the range .
In this exercise, we develop a refined analysis that can reduce the additive error substantially in many real settings. Let denote the sum of frequence counts of all elements that are not -heavy hitters:
Note that , and might be much less than when the stream is dominated by heavy hitters.
Show that, by adjusting
count-min-sketch(,) slightly by increasing to , the additive error for every element is at most with probability of error .