# Homework 16

Due Friday, April 10 at 5:00 PM. Please refer to the homework policy here.

## Problems

1. The count-min-sketch data structure allows us to estimate the relative frequency of each element up to an $\epsilon$-additive factor with probability of error $\leq 1 / \text{poly}({n})$ with $O( \log (n) / \epsilon)$ space. The original motivation, however was to also obtain a list of $\epsilon$-heavy hitters. Explain how to adjust the count-min-sketch to also maintain a collection of all of the $\epsilon$-heavy hitters while holding onto at most $O({1 / \epsilon})$ items from the stream at any point in time, with probability of error $\leq 1/n^2$. Here your algorithm may also collect a few extra items with relative frequency in the range $[\epsilon/2,\epsilon)$.

2. In this exercise, we develop a refined analysis that can reduce the additive error substantially in many real settings. Let $S$ denote the sum of frequence counts of all elements that are not $\epsilon$-heavy hitters:

Note that $S \leq n$, and $S$ might be much less than $n$ when the stream is dominated by heavy hitters. Show that, by adjusting count-min-sketch($\epsilon$,$1/n^2$) slightly by increasing $w$ to $\lceil 4/\epsilon \rceil$, the additive error for every element is at most $\epsilon S$ with probability of error $\leq 1 / n$.