
Double-Parallel Monte Carlo for Bayesian analysis of big data


Abstract

This paper proposes a simple, practical, and efficient MCMC algorithm for Bayesian analysis of big data. The proposed algorithm divides the big dataset into smaller subsets and provides a simple method for aggregating the subset posteriors to approximate the full-data posterior. To further speed up computation, the proposed algorithm employs the population stochastic approximation Monte Carlo algorithm, a parallel MCMC algorithm, to simulate from each subset posterior. Since the algorithm involves two levels of parallelism, data parallelism and simulation parallelism, it is coined “Double-Parallel Monte Carlo.” The validity of the proposed algorithm is justified both mathematically and numerically.
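The following is a minimal sketch of the divide-and-aggregate workflow, written for this presentation and assuming a toy Gaussian mean model. A plain random-walk Metropolis sampler stands in for the population stochastic approximation Monte Carlo sampler used in the paper, the subset log-likelihood is raised to the power k so that each subset posterior is calibrated to the full-data scale (an assumption of this sketch), and the recentering rule used for aggregation is only an illustration consistent with the mixture mean and variance formulas in Appendix A, not necessarily the paper's exact aggregation step.

```python
# Illustrative sketch only: a generic Metropolis sampler and a recentering
# aggregation rule stand in for the paper's pop-SAMC sampler and its
# aggregation scheme.  Model: X_i ~ N(theta, 1), flat prior on theta.
import numpy as np
from multiprocessing import Pool

def metropolis_on_subset(subset, k, n_iter=5000, step=0.01, seed=0):
    """Random-walk Metropolis targeting one subset posterior.

    The subset log-likelihood is multiplied by k so that each subposterior
    is calibrated to the full-data scale (an assumption of this sketch).
    """
    rng = np.random.default_rng(seed)
    def log_post(t):
        return -0.5 * k * np.sum((subset - t) ** 2)
    theta, lp = subset.mean(), log_post(subset.mean())
    draws = np.empty(n_iter)
    for i in range(n_iter):
        prop = theta + step * rng.normal()
        lp_prop = log_post(prop)
        if np.log(rng.uniform()) < lp_prop - lp:
            theta, lp = prop, lp_prop
        draws[i] = theta
    return draws[n_iter // 2:]                     # discard burn-in

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    data = rng.normal(1.0, 1.0, size=20000)
    k = 4
    subsets = np.array_split(data, k)              # divide the data into k subsets
    args = [(s, k, 5000, 0.01, j) for j, s in enumerate(subsets)]
    with Pool(k) as pool:                          # one worker per subset
        sub_draws = pool.starmap(metropolis_on_subset, args)
    # Aggregation: recenter each subposterior sample at the average of the
    # subposterior means, then pool all recentered samples.
    means = [d.mean() for d in sub_draws]
    center = np.mean(means)
    pooled = np.concatenate([d - m + center for d, m in zip(sub_draws, means)])
    print("approximate posterior mean:", pooled.mean())
    print("approximate posterior sd:  ", pooled.std())
```

The `Pool` above realizes only the data-parallel level (one worker per subset); the paper's second, simulation-parallel level, in which multiple interacting chains are run within each subset sampler, is not shown.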



Acknowledgements

Liang’s research was supported in part by the Grants DMS-1545202, DMS-1612924 and R01-GM117597. The authors thank the Editor, Associate Editor and two referees for their constructive comments, which have led to significant improvement of this paper.

Corresponding author

Correspondence to Faming Liang.

Appendix A Proof of Theorem 1

Under conditions A1 and A2, we can expand the mean of each subposterior at the corresponding MLE, \(\hat{\varvec{\theta }}^{(j)}\), as follows:

$$\begin{aligned} \varvec{\mu }^{(j)} = E_{\tilde{\pi }_j} (\varvec{\theta }) = \hat{\varvec{\theta }}^{(j)}+\frac{\hat{I}^{(j)-1}}{n} \left[ \frac{\partial \log g({\varvec{\theta }})}{\partial {\varvec{\theta }}}\Big |_{\hat{{\varvec{\theta }}}^{(j)}} -\frac{1}{2}\hat{H}^{(j)}\hat{I}^{(j)-1} \right] +O(n^{-2}), \end{aligned}$$

where \(\tilde{\pi }_j=\tilde{\pi }({\varvec{\theta }}|{\varvec{X}}_{[j]})\), \(\hat{I}^{(j)}=-\frac{1}{m} \frac{\partial ^{2} \log f({\varvec{X}}_{[j]}|\varvec{\theta })}{\partial \varvec{\theta }\partial \varvec{\theta }^{T}}|_{\varvec{\theta }=\hat{\varvec{\theta }}^{(j)}}\), \(\hat{H}^{(j)}=-\frac{1}{m}\frac{\partial ^{3} \log f({\varvec{X}}_{[j]}|\varvec{\theta })}{\partial \varvec{\theta } \partial \varvec{\theta }^{T}\partial \varvec{\theta }}|_{\varvec{\theta }=\hat{\varvec{\theta }}^{(j)}}\), and \(\hat{H}^{(j)}\hat{I}^{(j)-1}\) is a vector whose rth element equals \(\sum _{st}\hat{H}_{rst}^{(j)}\hat{I}_{st}^{(j)-1}\). To simplify the notation, we denote \(\hat{I}^{(j)-1}[\partial \log g({\varvec{\theta }})/\partial {\varvec{\theta }}|_{\hat{{\varvec{\theta }}}^{(j)}} -\frac{1}{2}\hat{H}^{(j)}\hat{I}^{(j)-1}]\) by \(\varvec{\nu }^{(j)}\). Although A1 and A2 are stated for the Laplace approximation of the full posterior, they also imply the conditions guaranteeing the Laplace approximation of each subposterior, as long as m goes to \(\infty \) with n.
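As a quick sanity check of this expansion (an illustration added here, not part of the original proof), consider a scalar Gaussian model \(X_{ji}\sim N(\theta ,\sigma ^{2})\) with \(\sigma ^{2}\) known and a flat prior \(g(\theta )\propto 1\). Then

$$\begin{aligned} \log f({\varvec{X}}_{[j]}|\theta ) = -\frac{1}{2\sigma ^{2}}\sum _{i=1}^{m}(X_{ji}-\theta )^{2}+\mathrm{const}, \qquad \frac{\partial \log g(\theta )}{\partial \theta }=0, \qquad \hat{H}^{(j)}=0, \end{aligned}$$

since the log-likelihood is quadratic in \(\theta \). Hence \(\varvec{\nu }^{(j)}=0\) and \(\varvec{\mu }^{(j)}=\hat{\theta }^{(j)}=\bar{X}_{[j]}\) exactly: the subposterior is Gaussian and centered at the subset MLE, in agreement with the expansion above.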

Moreover, for each \(\hat{\varvec{\theta }}^{(j)}\), we have

$$\begin{aligned} \hat{\varvec{\theta }}^{(j)}=\varvec{\theta }^{*}+\frac{\varvec{\xi }^{(j)}}{\sqrt{m}}+O_{p}(m^{-1}), \end{aligned}$$

where

$$\begin{aligned} \varvec{\xi }^{(j)}= & {} \frac{1}{\sqrt{m}}I^{-1}\sum _{i=1}^{m}\frac{\partial \log f(X_{ji}|\varvec{\theta }^{*})}{\partial \varvec{\theta }},\\ I= & {} -E_{{\varvec{X}}|\varvec{\theta }^{*}}\frac{\partial ^2 \log f({\varvec{X}}|\varvec{\theta }^{*}) }{\partial \varvec{\theta } \partial \varvec{\theta }^{T} }. \end{aligned}$$

Therefore, the mean of the mixture distribution \(\tilde{\pi }(\varvec{\theta }|{\varvec{X}})\) is given by

$$\begin{aligned} E_{\tilde{\pi }} (\varvec{\theta }) = \frac{1}{k}\sum _{j=1}^{k}\varvec{\mu }^{(j)} = \varvec{\theta }^{*}+\frac{1}{k}\sum _{j=1}^{k}\frac{\varvec{\nu }^{(j)}}{n}+\frac{1}{k}\sum _{j=1}^{k} \frac{\varvec{\xi }^{(j)}}{\sqrt{m}} +O_{p}(m^{-1})+O(n^{-2}). \end{aligned}$$

Applying a similar expansion to the full data posterior \(\pi (\varvec{\theta }|{\varvec{X}})\), we obtain

$$\begin{aligned} E_{\pi }(\varvec{\theta }) = \varvec{\theta }^{*}+\frac{\varvec{\nu }}{n} +\frac{\varvec{\xi }}{\sqrt{n}}+O_{p}(n^{-1})+O(n^{-2}), \end{aligned}$$

where \(\varvec{\nu }=\hat{I}^{-1}[\partial \log g({\varvec{\theta }})/\partial {\varvec{\theta }}|_{\hat{{\varvec{\theta }}}} -\frac{1}{2}\hat{H}\hat{I}^{-1}]\), \(\varvec{\xi }=\frac{1}{\sqrt{n}}I^{-1}\sum _{i=1}^{n}\frac{\partial \log f(X_{i}|\varvec{\theta }^{*})}{\partial \varvec{\theta }}\), \(\hat{I}=-\frac{1}{n} \frac{\partial ^{2} \log f({\varvec{X}}|\varvec{\theta })}{\partial \varvec{\theta }\partial \varvec{\theta }^{T}}|_{\varvec{\theta }=\hat{\varvec{\theta }}}\), \(\hat{H}=-\frac{1}{n}\frac{\partial ^{3} \log f({\varvec{X}}|\varvec{\theta })}{\partial \varvec{\theta } \partial \varvec{\theta }^{T}\partial \varvec{\theta }}|_{\varvec{\theta }=\hat{\varvec{\theta }}}\), and \(\hat{\varvec{\theta }}\) denotes the MLE of \(\varvec{\theta }\) calculated from the full dataset. By noting that \(\frac{\varvec{\xi }}{\sqrt{n}}=\frac{1}{k}\sum _{j=1}^{k}\frac{\varvec{\xi }^{(j)}}{\sqrt{m}}\), that \(\varvec{\nu }^{(j)}\) and \(\varvec{\nu }\) are O(1), and that \(m<n\), Eq. (4) in Theorem 1 is thereby verified:

$$\begin{aligned} E_{\tilde{\pi }} (\varvec{\theta })-E_{\pi } (\varvec{\theta }) &= \frac{1}{k}\sum _{j=1}^{k}\frac{\varvec{\nu }^{(j)}}{n} -\frac{\varvec{\nu }}{n} +O_{p}(m^{-1}) +O_{p}(n^{-1})+O(n^{-2}) \\ &= O(n^{-1})+O_{p}(m^{-1})+O(n^{-1}) +O_{p}(n^{-1})+O(n^{-2}) \\ &= O_{p}(m^{-1}). \end{aligned}$$
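To make this \(O_{p}(m^{-1})\) bound concrete, the following numerical illustration (added here, not from the paper) uses an exponential likelihood with a Gamma prior on the rate, with each subposterior built from the subset likelihood raised to the power k (an assumption of this sketch), so that every posterior mean has a closed form; the gap between the averaged subposterior mean and the full-data posterior mean shrinks as the subset size m grows.

```python
# Numerical illustration of Eq. (4) (added here, not from the paper):
# exponential data with a Gamma(a, b) prior on the rate lambda.  Each
# subposterior uses the subset likelihood raised to the power k (an
# assumption of this sketch), so every posterior mean has a closed form.
import numpy as np

def mean_gap(n, k, a=2.0, b=2.0, seed=0):
    """|averaged subposterior mean - full-data posterior mean| for m = n/k."""
    rng = np.random.default_rng(seed)
    x = rng.exponential(scale=1.0, size=n)            # true rate lambda = 1
    m = n // k
    subsets = x[: m * k].reshape(k, m)
    # Subposterior j is Gamma(a + k*m, b + k*sum_j); its mean is closed form.
    sub_means = (a + k * m) / (b + k * subsets.sum(axis=1))
    full_mean = (a + n) / (b + x.sum())               # full posterior mean
    return abs(sub_means.mean() - full_mean)

n = 100000
for k in (10, 100):   # larger k means smaller subsets (smaller m)
    print(f"k = {k:3d}, m = {n // k:6d}, |mean gap| = {mean_gap(n, k):.2e}")
```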

The variance of each subposterior can be approximated as follows:

$$\begin{aligned} \mathrm{Var}_{\tilde{\pi }_{j}}(\varvec{\theta })=\frac{\hat{I}^{(j)-1}}{n}+O(n^{-2}). \end{aligned}$$

Therefore, the variance of the mixture distribution \(\tilde{\pi }(\varvec{\theta }|{\varvec{X}})\) is given by

$$\begin{aligned} \mathrm{Var}_{\tilde{\pi }}(\varvec{\theta })=\frac{1}{k}\sum _{j=1}^{k}\mathrm{Var}_{\tilde{\pi }_{j}}(\varvec{\theta }) =\frac{1}{k}\sum _{j=1}^{k}\frac{\hat{I}^{(j)-1}}{n}+O(n^{-2}). \end{aligned}$$

In addition, we have

$$\begin{aligned} \mathrm{Var}_{\pi }(\varvec{\theta })=\frac{\hat{I}^{-1}}{n}+O(n^{-2}). \end{aligned}$$

Since \(\hat{I}^{(j)}=I+o_{p}(1)\) and \(\hat{I}=I+o_{p}(1)\), we further have \(\hat{I}^{(j)-1}=I^{-1}+o_{p}(1)\) and \(\hat{I}^{-1}=I^{-1}+o_{p}(1)\). Equation (5) in Theorem 1 is thereby verified:

$$\begin{aligned} \mathrm{Var}_{\tilde{\pi }}(\varvec{\theta })-\mathrm{Var}_{\pi }(\varvec{\theta }) &= \frac{1}{k}\sum _{j=1}^{k}\frac{\hat{I}^{(j)-1}}{n}-\frac{\hat{I}^{-1}}{n}+O(n^{-2}) \\ &= \frac{I^{-1}}{n}+o_{p}(n^{-1})-\frac{I^{-1}}{n} +o_{p}(n^{-1})+O(n^{-2}) \\ &= o_{p}(n^{-1}). \end{aligned}$$

Finally, consider the square of the Wasserstein distance of order 2. Since the only coupling between a distribution and a point mass is their product, the distance to \(\delta _{{\varvec{\theta }}^{*}}\) reduces to the expected squared Euclidean distance from \(\varvec{\theta }^{*}\), and we have

$$\begin{aligned} &\left| d^{2}(\tilde{\pi }(\varvec{\theta }|{\varvec{X}}),\delta _{{\varvec{\theta }}^{*}})-d^{2}(\pi (\varvec{\theta }|{\varvec{X}}),\delta _{{\varvec{\theta }}^{*}})\right| \\ &\quad =\left| \int _{\Theta }\Vert \varvec{\theta }- \varvec{\theta }^{*}\Vert _{2}^{2}\,\tilde{\pi }(\varvec{\theta }|{\varvec{X}})\,\mathrm{d}\varvec{\theta } -\int _{\Theta }\Vert \varvec{\theta }- \varvec{\theta }^{*}\Vert _{2}^{2}\,\pi (\varvec{\theta }|{\varvec{X}})\,\mathrm{d}\varvec{\theta }\right| \\ &\quad =\left| \Vert E_{\tilde{\pi }}(\varvec{\theta })-\varvec{\theta }^{*}\Vert _2^2+\mathrm{tr}\left( \mathrm{Var}_{\tilde{\pi }}(\varvec{\theta }) \right) -\Vert E_{\pi }(\varvec{\theta })-\varvec{\theta }^{*}\Vert _2^2 -\mathrm{tr}\left( \mathrm{Var}_{\pi }(\varvec{\theta }) \right) \right| \\ &\quad \le \left| \Vert E_{\tilde{\pi }}(\varvec{\theta })-E_{\pi }(\varvec{\theta })+E_{\pi }(\varvec{\theta })-\varvec{\theta }^{*}\Vert _2^2 -\Vert E_{\pi }(\varvec{\theta })-\varvec{\theta }^{*}\Vert _2^2\right| +\left| \mathrm{tr}\left( \mathrm{Var}_{\tilde{\pi }}(\varvec{\theta }) -\mathrm{Var}_{\pi }(\varvec{\theta }) \right) \right| \\ &\quad \le \Vert E_{\tilde{\pi }}(\varvec{\theta })-E_{\pi }(\varvec{\theta })\Vert _2^2+2\Vert E_{\tilde{\pi }}(\varvec{\theta })-E_{\pi }(\varvec{\theta })\Vert _{2}\,\Vert E_{\pi }(\varvec{\theta })-\varvec{\theta }^{*}\Vert _{2}+o_{p}(n^{-1}) \\ &\quad =O_{p}(m^{-2})+2\,O_{p}(m^{-1})\,O_{p}(n^{-1/2})+o_{p}(n^{-1}) \\ &\quad =o_{p}(n^{-1}), \end{aligned}$$

where the last equality follows from the fact that \(n=o(m^{2})\), which implies \(O_{p}(m^{-2})=o_{p}(n^{-1})\) and \(O_{p}(m^{-1})O_{p}(n^{-1/2})=o_{p}(n^{-1})\). This verifies the Wasserstein distance bound in Theorem 1 and completes the proof.
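In practice, the same bias-plus-trace decomposition used in the third line above, \(d^{2}(\pi ,\delta _{\varvec{\theta }^{*}})=\Vert E_{\pi }(\varvec{\theta })-\varvec{\theta }^{*}\Vert _2^2+\mathrm{tr}(\mathrm{Var}_{\pi }(\varvec{\theta }))\), also gives a simple way to estimate the squared distance from posterior draws; a minimal sketch is given below, with a hypothetical `draws` array standing in for actual posterior samples.

```python
# Estimate the squared Wasserstein-2 distance between a posterior and a point
# mass at theta_star from posterior draws, using the bias-plus-trace
# decomposition d^2 = ||E(theta) - theta_star||^2 + tr(Var(theta)).
import numpy as np

def w2_squared_to_point(draws, theta_star):
    """draws: (num_samples, dim) array of posterior draws."""
    mean = draws.mean(axis=0)
    cov = np.cov(draws, rowvar=False)
    # np.cov returns a 0-d array for dim == 1; np.trace needs a 2-D array.
    trace = float(cov) if np.ndim(cov) == 0 else float(np.trace(cov))
    return float(np.sum((mean - theta_star) ** 2)) + trace

# Hypothetical usage with pooled draws such as those from the sketch after
# the abstract:
# print(w2_squared_to_point(pooled[:, None], np.array([1.0])))
```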


Cite this article

Xue, J., Liang, F. Double-Parallel Monte Carlo for Bayesian analysis of big data. Stat Comput 29, 23–32 (2019). https://doi.org/10.1007/s11222-017-9791-1
