【论文记录】Stochastic gradient descent with differentially private updates

本文主要是介绍【论文记录】Stochastic gradient descent with differentially private updates，希望对大家解决编程问题提供一定的参考价值，需要的开发者们随着小编来一起学习吧！

The sample size required for a target utility level increases with the privacy constraint.
Optimization methods for large data sets must also be scalable.
SGD algorithms satisfy asymptotic guarantees

主要工作简介：
$\quad$ In this paper we derive differentially private versions of single-point SGD and mini-batch SGD, and evaluate them on real and synthetic data sets.
更多运用SGD的原因：
$\quad$ Stochastic gradient descent (SGD) algorithms are simple and satisfy the same asymptotic guarantees as more computationally intensive learning methods.
由于asymptotic guarantees带来的影响：
$\quad$ to obtain reasonable performance on finite data sets practitioners must take care in setting parameters such as the learning rate (step size) for the updates.
上述影响的应对之策：
$\quad$ Grouping updates into “minibatches” to alleviate some of this sensitivity and improve the performance of SGD. This can improve the robustness of the updating at a moderate expense in terms of computation, but also introduces the batch size as a free parameter.

优化目标：
$\quad$ solve a regularized convex optimization problem ： $w^* = \mathop{ \textbf{argmin} } \limits_{ w \in \mathbb{R}^d} \frac{\lambda}{2} \Vert w \Vert ^2 + \frac{1}{n} \mathop{ \Sigma }\limits_{i=1}^n \mathbb{l} (w,x_i,y_i)$
$\quad$ where $w$ is the normal vector to the hyperplane separator, and $\mathbb{l}$ is a convex loss function.
$\quad$ 若 $\mathbb{l}$ 选为 logistic loss, 即 $\mathbb{l} (w,x,y)=log(1+e^{-yw^Tx})$ , 则 $\Rightarrow$ Logistic Regression
$\quad$ 若 $\mathbb{l}$ 选为 hinge loss, 即 $\mathbb{l} (w,x,y)=$ max $0,1-yw^Tx)$ , 则 $\Rightarrow$ SVM

优化算法：
$\quad$ SGD with mini-batch updates ： $w_{t+1} = w_t - \eta_t \Big( \lambda w_t + \frac{1}{b} \mathop{\Sigma}\limits_{ (x_i,y_i) \in B_t} \triangledown \mathbb{l} (w_t,x_i,y_i) \Big)$
$\quad$ where $\eta_t$ is a learning rate, the update at each step $t$ is based on a small subset $B_t$ of examples of size $b$ .

满足差分隐私的 mini-batch SGD ：
$\quad$ A differentially-private version of the mini-batch update ： $w_{t+1} = w_t - \eta_t \Big( \lambda w_t + \frac{1}{b} \mathop{\Sigma}\limits_{ (x_i,y_i) \in B_t} \triangledown \mathbb{l} (w_t,x_i,y_i) \,+ \frac{1}{b}Z_t \Big)$
$\quad$ where $Z_t$ is a random noise vector in $\mathbb R ^d$ drawn independently from the density: $\rho(z) \propto e^{-(\alpha/2) \|z\|}$

使用上式的 mini-batch update 时, 此种updates满足 $\alpha$ -differentially private的条件：
$\quad$ $\mathcal{Theorem \,}$ If the initialization point $w_o$ is chosen independent of the sensitive data, the batches $B_t$ are disjoint, and if $\| \triangledown \mathbb l(w,x,y)\| \leq 1$ for all $w$ , and all $x_i,y_i)$ , then SGD with mini-batch updates is $\alpha$ -differentially private.

实验现象：
$\quad$ batch size 为1时DP-SGD的方差比普通的SGD更大。但 batch size 调大后则方差减小了很多。
由此而总结出的经验：
$\quad$ In terms of objective value, guaranteeing differential privacy can come for “free” using SGD with moderate batch size.
实际上 batch size 带来的影响是先减后增
$\quad$ increasing the batch size improved the performance of private SGD, but there is a limit , much larger batch sizes actually degrade performance.

数据维度 $d$ 与隐私保护参数会影响实验所需的数据量：
$\quad$ Differentially private learning algorithms often have a sample complexity that scales linearly with the data dimension $d$ and inversely with the privacy risk $\alpha$ . Thus a moderate reduction in $\alpha$ or increase in $d$ may require more data.