Private Query Release Assisted by Public Data

Part of Proceedings of the International Conference on Machine Learning 1 pre-proceedings (ICML 2020)

Bibtex »Metadata »Paper »

Bibtek download is not availble in the pre-proceeding


Authors

Raef Bassily, Albert Cheu, Shay Moran, Aleksandar Nikolov, Jonathan Ullman, Steven Wu

Abstract

We study the problem of differentially private query release assisted by public data. Given a class $H$ of queries $h:X \rightarrow \{-1, 1\}$, and data set of i.i.d. samples from an unknown distribution $D$, a query-release algorithm is required to output a data structure $G: H \rightarrow [-1, 1]$ such that for any query $h\in H,$ $G(h)$ approximates $E_{x\sim D}[h(x)]$ up to some error $\alpha$. In this problem, the input data set consists of two types of samples: private and public. The algorithm is required to satisfy differential privacy only with respect to the private samples. We study the limits of this task in terms of private and public sample complexities. First, we show that this task is achievable for any query class of finite VC-dimension using only (roughly) $d/\alpha$ public samples and $\sqrt{p}d^{3/2}/\alpha^2$ private samples, where $d$ is the VC-dimension of the class, and $p$ is the dual VC-dimension. When there are no public samples, there are known examples of classes of VC-dimension one for which this task is impossible under differential privacy (e.g., the class of threshold functions over $R$). Moreover, our upper bound on the public sample complexity is non-trivial since, without private samples, it is known that this task is equivalent to uniform convergence over $H$ which requires at least $d/\alpha^2$ public samples. Next, we give lower bounds on private and public sample complexities with tight dependence on $p$ and $\alpha$. In particular, for the class of decision stumps, we give a lower bound of $\sqrt{p}/\alpha$ on the private sample complexity whenever the number of public samples is $<1/\alpha^2$. Given our upper bound, this shows that the dependence on $\sqrt{p}$ in the private sample complexity is necessary (in the non-trivial regime where the public samples are insufficient to solve the problem on its own). We also give a tight lower bound of $1/\alpha$ on the public sample complexity for a broad family of query classes.