Abstract
Ordered sequences of univariate or multivariate regressions provide statistical models for analysing data from randomized, possibly sequential interventions, from cohort or multi-wave panel studies, but also from cross-sectional or retrospective studies. Conditional independences are captured by what we name regression graphs, provided the generated distribution shares some properties with a joint Gaussian distribution. Regression graphs extend purely directed, acyclic graphs by two types of undirected graph, one type for components of joint responses and the other for components of the context vector variable. We review the special features and the history of regression graphs, prove criteria for Markov equivalence and discuss the notion of a simpler statistical covering model. Knowledge of Markov equivalence provides alternative interpretations of a given sequence of regressions, is essential for machine learning strategies and permits the use of the simple graphical criteria of regression graphs on graphs for which the corresponding criteria are in general more complex. Under the known conditions for a Markov equivalent directed acyclic graph to exist for a given regression graph, we give a polynomial time algorithm to find one such graph.
References
Ali RA, Richardson TS, Spirtes P (2009) Markov equivalence for ancestral graphs. Ann Stat 37:2808–2837
Anderson TW (1958) An introduction to multivariate statistical analysis. Wiley, New York (3rd edn, 2003)
Anderson TW (1973) Asymptotically efficient estimation of covariance matrices with linear structure. Ann Stat 1:135–141
Andersson SA, Perlman MD (2006) Characterizing Markov equivalence classes for AMP chain graph models. Ann Stat 34:939–972
Andersson SA, Madigan D, Perlman MD, Triggs CM (1997) A graphical characterization of lattice conditional independence models. Ann Math Artif Intell 21:27–50
Andersson SA, Madigan D, Perlman MD (2001) Alternative Markov properties for chain graphs. Scand J Stat 28:33–86
Barndorff-Nielsen OE (1978) Information and exponential families in statistical theory. Wiley, Chichester
Bergsma W, Rudas T (2002) Marginal models for categorical data. Ann Stat 30:140–159
Birch MW (1963) Maximum likelihood in three-way contingency tables. J R Stat Soc B 25:220–233
Bishop YMM, Fienberg SE, Holland PW (1975) Discrete multivariate analysis. MIT Press, Cambridge
Blair JRS, Peyton BW (1993) An introduction to chordal graphs and clique trees. In: George JA, Gilbert JR, Liu JWH (eds) Graph theory and sparse matrix computations. IMA volumes in mathematics and its applications, vol 56. Springer, New York, pp 1–30
Brito C, Pearl J (2002) A new identification condition for recursive models with correlated errors. Struct Equ Model 9:459–474
Bollen KA (1989) Structural equations with latent variables. Wiley, New York
Brown LD (1986) Fundamentals of statistical exponential families with applications in statistical decision theory. LNMS, vol 9. Inst Math Stat, Beachwood
Castelo R, Kocka T (2003) On inclusion-driven learning of Bayesian networks. J Mach Learn Res 4:527–574
Castelo R, Siebes A (2003) A characterization of moral transitive acyclic directed graph Markov models as labeled trees. J Stat Plan Inference 115:235–259
Caussinus H (1966) Contribution à l’analyse statistique des tableaux de corrélation. Ann Fac Sci Univ Toulouse 29:77–183
Cayley A (1889) A theorem on trees. Q J Math 23:376–378
Chaudhuri S, Drton M, Richardson TS (2007) Estimation of a covariance matrix with zeros. Biometrika 94:199–216
Cochran WG (1938) The omission or addition of an independent variate in multiple linear regression. Suppl J R Stat Soc 5:171–176
Chickering DM (1995) A transformational characterization of equivalent Bayesian networks. In: Besnard P, Hanks S (eds) Proc 10th UAI conf. Kaufmann, San Mateo, pp 87–98
Cox DR (1966) Some procedures associated with the logistic qualitative response curve. In: David FN (ed) Research papers in statistics: Festschrift for J Neyman. Wiley, New York, pp 55–71
Cox DR (2006) Principles of statistical inference. Cambridge University Press, Cambridge
Cox DR, Wermuth N (1990) An approximation to maximum-likelihood estimates in reduced models. Biometrika 77:747–761
Cox DR, Wermuth N (1993) Linear dependencies represented by chain graphs (with discussion). Stat Sci 8:204–218; 247–277
Cox DR, Wermuth N (1994) Tests of linearity, multivariate normality and adequacy of linear scores. J R Stat Soc C 43:347–355
Cox DR, Wermuth N (1996) Multivariate dependencies: models, analysis, and interpretation. Chapman and Hall/CRC Press, London
Cox DR, Wermuth N (1999) Likelihood factorizations for mixed discrete and continuous variables. Scand J Stat 26:209–220
Cox DR, Wermuth N (2003) A general condition for avoiding effect reversal after marginalization. J R Stat Soc B 65:937–941
Darroch JN (1962) Interactions in multi-factor contingency tables. J R Stat Soc B 24:251–263
Darroch JN, Lauritzen SL, Speed TP (1980) Markov fields and log-linear models for contingency tables. Ann Stat 8:522–539
Dawid AP (1979) Conditional independence in statistical theory (with discussion). J R Stat Soc B 41:1–31
Dempster AP (1969) Elements of continuous multivariate analysis. Addison-Wesley, Reading
Dempster AP (1972) Covariance selection. Biometrics 28:157–175
Dinitz Y (2006) Dinitz’ algorithm: the original version and Even’s version. In: Even S, Goldreich O, Rosenberg AL, Selman AL (eds) Essays in memory of Shimon Even. Springer, New York, pp 218–240
Dirac GA (1961) On rigid circuit graphs. Abh Math Semin Univ Hamb 25:71–76
Drton M (2009) Discrete chain graph models. Bernoulli 15:736–753
Drton M, Perlman MD (2004) Model selection for Gaussian concentration graphs. Biometrika 91:591–602
Drton M, Richardson TS (2004) Multimodality of the likelihood in the bivariate seemingly unrelated regression model. Biometrika 91:383–392
Drton M, Richardson TS (2008a) Binary models for marginal independence. J R Stat Soc B 70:287–309
Drton M, Richardson TS (2008b) Graphical methods for efficient likelihood inference in Gaussian covariance models. J Mach Learn Res 9:893–914
Edwards D (2000) Introduction to graphical modelling, 2nd edn. Springer, New York
Foygel R, Draisma J, Drton M (2011) Half-trek criterion for generic identifiability of linear structural equation models (submitted). Available under http://arxiv.org/abs/1107.5552
Frydenberg M (1990) The chain graph Markov property. Scand J Stat 17:333–353
Geiger D, Verma TS, Pearl J (1990) Identifying independence in Bayesian networks. Networks 20:507–534
Glonek GFV, McCullagh P (1995) Multivariate logistic models. J R Stat Soc B 57:533–546
Goodman LA (1970) The multivariate analysis of qualitative data: interaction among multiple classifications. J Am Stat Assoc 65:226–256
Haavelmo T (1943) The statistical implications of a system of simultaneous equations. Econometrica 11:1–12
Hardt J, Sidor A, Nickel R, Kappis B, Petrak F, Egle UT (2008) Childhood adversities and suicide attempts: a retrospective study. J Fam Violence 23:713–718
Jensen ST (1988) Covariance hypotheses which are linear in both the covariance and the inverse covariance. Ann Stat 16:302–322
Jöreskog KG (1981) Analysis of covariance structures. Scand J Stat 8:65–92
Kang C, Tian J (2009) Markov properties for linear causal models with correlated errors. J Mach Learn Res 10:41–70
Kappesser J (1997) Bedeutung der Lokalisation für die Entwicklung und Behandlung chronischer Schmerzen. Thesis, Department of Psychology, University of Mainz
Kauermann G (1996) On a dualization of graphical Gaussian models. Scand J Stat 23:115–116
Kiiveri HT (1987) An incomplete data approach to the analysis of covariance structures. Psychometrika 52:539–554
Kiiveri HT, Speed TP, Carlin JB (1984) Recursive causal models. J Aust Math Soc A 36:30–52
Kline RB (2006) Principles and practice of structural equation modeling, 3rd edn. Guilford Press, New York
Lauritzen SL (1996) Graphical models. Oxford University Press, Oxford
Lauritzen SL, Wermuth N (1989) Graphical models for association between variables, some of which are qualitative and some quantitative. Ann Stat 17:31–57
Lehmann EL, Scheffé H (1955) Completeness, similar regions and unbiased estimation. Sankhya 14:219–236
Lněnička R, Matúš F (2007) On Gaussian conditional independence structures. Kybernetika 43:323–342
Lupparelli M, Marchetti GM, Bergsma WP (2009) Parameterization and fitting of discrete bi-directed graph models. Scand J Stat 36:559–576
Ma ZM, Xie XC, Geng Z (2006) Collapsibility of distribution dependence. J R Stat Soc B 68:127–133
Mandelbaum A, Rüschendorf L (1987) Complete and symmetrically complete families of distributions. Ann Stat 15:1229–1244
Marchetti GM, Lupparelli M (2011) Chain graph models of multivariate regression type for categorical data. Bernoulli 17:827–844
Marchetti GM, Wermuth N (2009) Matrix representations and independencies in directed acyclic graphs. Ann Stat 37:961–978
McCullagh P, Nelder JA (1989) Generalized linear models, 2nd edn. Chapman and Hall/CRC Press, London
Nelder JA, Wedderburn R (1972) Generalized linear models. J R Stat Soc A 135:370–384
Pearl J (1988) Probabilistic reasoning in intelligent systems. Kaufmann, San Mateo
Pearl J (2009) Causality: models, reasoning, and inference, 2nd edn. Cambridge University Press, New York
Pearl J, Paz A (1987) Graphoids: a graph based logic for reasoning about relevance relations. In: Boulay BD, Hogg D, Steel L (eds) Advances in artificial intelligence II. North Holland, Amsterdam, pp 357–363
Pearl J, Wermuth N (1994) When can association graphs admit a causal interpretation? In: Cheeseman P, Oldford W (eds) Models and data, artificial intelligence and statistics IV. Springer, New York, pp 205–214
Richardson TS, Spirtes P (2002) Ancestral Markov graphical models. Ann Stat 30:962–1030
Roverato A (2005) A unified approach to the characterisation of Markov equivalence classes of directed acyclic graphs, chain graphs with no flags and chain graphs. Scand J Stat 32:295–312
Roverato A, Studený M (2006) A graphical representation of equivalence classes of AMP chain graphs. J Mach Learn Res 7:1045–1078
Rudas T, Bergsma WP, Nemeth R (2010) Marginal log-linear parameterization of conditional independence models. Biometrika 97:1006–1012
Sadeghi K (2009) Representing modified independence structures. Transfer thesis, Oxford University
Sadeghi K, Lauritzen SL (2012) Markov properties of mixed graphs (submitted). Also available on http://arxiv.org/abs/1109.5909
San Martin E, Mouchart M, Rolin JM (2005) Ignorable common information, null sets and Basu’s first theorem. Sankhya 67:674–698
Speed TP, Kiiveri HT (1986) Gaussian Markov distributions over finite graphs. Ann Stat 14:138–150
Spirtes P, Glymour C, Scheines R (1993) Causation, prediction and search. Springer, New York
Stanghellini E, Wermuth N (2005) On the identification of path analysis models with one hidden variable. Biometrika 92:337–350
Studený M (2005) Probabilistic conditional independence structures. Springer, London
Sundberg R (2010) Flat and multimodal likelihoods and model lack of fit in curved exponential families. Scand J Stat 37:632–643
Tarjan RE, Yannakakis M (1984) Simple linear-time algorithms to test chordality of graphs, test acyclicity of hypergraphs, and selectively reduce acyclic hypergraphs. SIAM J Comput 13:566–579
Verma T, Pearl J (1990) Equivalence and synthesis of causal models. In: Bonissone PP, Henrion M, Kanal LN, Lemmer JF (eds) Proc 6th UAI conf. Elsevier, Amsterdam, pp 220–227
Wermuth N (1976a) Analogies between multiplicative models for contingency tables and covariance selection. Biometrics 32:95–108
Wermuth N (1976b) Model search among multiplicative models. Biometrics 32:253–263
Wermuth N (1980) Linear recursive equations, covariance selection, and path analysis. J Am Stat Assoc 75:963–997
Wermuth N (2011) Probability models with summary graph structure. Bernoulli 17:845–879
Wermuth N, Cox DR (1998) On association models defined over independence graphs. Bernoulli 4:477–495
Wermuth N, Cox DR (2004) Joint response graphs and separation induced by triangular systems. J R Stat Soc B 66:687–717
Wermuth N, Lauritzen SL (1983) Graphical and recursive models for contingency tables. Biometrika 70:537–552
Wermuth N, Lauritzen SL (1990) On substantive research hypotheses, conditional independence graphs and graphical chain models (with discussion). J R Stat Soc B 52:21–75
Wermuth N, Cox DR, Marchetti GM (2006a) Covariance chains. Bernoulli 12:841–862
Wermuth N, Wiedenbeck M, Cox DR (2006b) Partial inversion for linear systems and partial closure of independence graphs. BIT Numer Math 46:883–901
Wermuth N, Marchetti GM, Cox DR (2009) Triangular systems for symmetric binary variables. Electron J Stat 3:932–955
Whittaker J (1990) Graphical models in applied multivariate statistics. Wiley, Chichester
Wiedenbeck M, Wermuth N (2010) Changing parameters by partial mappings. Stat Sin 20:823–836
Zellner A (1962) An efficient method of estimating seemingly unrelated regressions and tests for aggregation bias. J Am Stat Assoc 57:348–368
Zhao H, Zheng Z, Liu B (2005) On the Markov equivalence of maximal ancestral graphs. Sci China Ser A 48:548–562
Acknowledgements
The work of the first author has been supported in part by the Swedish Research Society via the Gothenburg Stochastic Centre and by the Swedish Strategic Fund via the Gothenburg Mathematical Modelling Centre. We thank R. Castelo, D.R. Cox, G. Marchetti and the referees for their most helpful comments.
Additional information
Communicated by Domingo Morales.
Appendix: Details of regressions for the chronic pain data
Tables 1–8 show the results of linear least-squares regressions or logistic regressions, one at a time, for each of the response variables and for each component of a joint response separately. At first, each response is regressed on all its potentially explanatory variables given by their first ordering. The tables give the estimated constant term and, for each variable in the regression, its estimated coefficient (coeff), the estimated standard deviation of the coefficient (\(s_{\mathrm{coeff}}\)), and the ratio \(z_{\mathrm{obs}}=\mathrm{coeff}/s_{\mathrm{coeff}}\). These ratios are compared with 2.58, the 0.995 quantile of a random variable \(Z\) having a standard Gaussian distribution, for which \(\Pr(|Z|>2.58)=0.01\). In backward selection steps, the variable with the smallest observed value \(|z_{\mathrm{obs}}|\) is deleted from a regression equation, one at a time, until the smallest remaining \(|z_{\mathrm{obs}}|\) exceeds this threshold.
The procedure defines a selected model, unless one of the excluded variables has a contribution of \(|z'_{\mathrm{obs}}|>2.58\) when added alone to the selected directly explanatory variables; such a variable then also needs to be included as an important directly explanatory variable. This did not happen in the given data set.
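The backward selection described above can be sketched as follows for the linear least-squares case. This is a minimal illustration under stated assumptions, not the authors' code: the function names `ols_z` and `backward_select`, the simulated data and the variable names `x0` to `x3` are all hypothetical, and logistic regressions are not covered.

```python
import numpy as np

def ols_z(X, y):
    """Least-squares fit with a constant term.

    Returns the coefficients and the ratios z_obs = coeff / s_coeff,
    with the constant term in position 0.
    """
    n = len(y)
    D = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(D, y, rcond=None)
    resid = y - D @ beta
    sigma2 = resid @ resid / (n - D.shape[1])
    s_coeff = np.sqrt(np.diag(sigma2 * np.linalg.inv(D.T @ D)))
    return beta, beta / s_coeff

def backward_select(X, y, names, threshold=2.58):
    """Drop the variable with the smallest |z_obs| until all exceed the threshold."""
    keep = list(range(X.shape[1]))
    while keep:
        _, z = ols_z(X[:, keep], y)
        abs_z = np.abs(z[1:])              # ignore the constant term
        weakest = int(np.argmin(abs_z))
        if abs_z[weakest] > threshold:
            break
        del keep[weakest]
    # Re-check: add each excluded variable alone to the selected set.
    selected = list(keep)
    for j in range(X.shape[1]):
        if j in selected:
            continue
        _, z = ols_z(X[:, selected + [j]], y)
        if abs(z[-1]) > threshold:
            keep.append(j)
    return [names[j] for j in sorted(keep)]

# Simulated data (hypothetical): only x0 and x2 enter the response.
rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 4))
y = 1.0 + 2.0 * X[:, 0] - 1.5 * X[:, 2] + rng.normal(size=n)
names = ["x0", "x1", "x2", "x3"]
selected = backward_select(X, y, names)
print(selected)
```

The re-check loop tests each excluded variable against the selected set only, mirroring the phrase "when added alone"; variables that pass are collected and added afterwards.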
For linear models, the tables also show \(R^2\), the coefficient of determination, both for the full and for the selected model. Multiplied by 100, it gives the percentage of the variation in the response explained by the model.
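For reference, the standard definition for a least-squares fit with a constant term can be written as below. The symbols are assumptions introduced here and do not appear in the article: \(y_i\) are the observed responses, \(\hat{y}_i\) the fitted values and \(\bar{y}\) the mean response.

\[
R^2 = 1 - \frac{\sum_i (y_i-\hat{y}_i)^2}{\sum_i (y_i-\bar{y})^2}
\]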
In the linear regression of \(Z_a\) on \(X_a\) and on the directly explanatory variables of both \(Z_a\) and \(X_a\), that is, on \(Z_b, X_b, A\), the contribution of \(X_a\) leads to \(z_{\mathrm{obs}}=3.51\), which coincides, by definition, with \(z_{\mathrm{obs}}\) computed for the contribution of \(Z_a\) in the linear regression of \(X_a\) on \(Z_a\) and on \(Z_b, X_b, A\). Hence the two responses are correlated even after considering the directly explanatory variables, and a dashed line joining \(Z_a\) and \(X_a\) is added to the well-fitting regression graph in Fig. 8.
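The coincidence of the two ratios holds because each is the same function of the partial correlation of the two responses given the common regressors, with the same residual degrees of freedom. A small numerical check on simulated data may make this concrete; the helper `z_for_last`, the data and the coefficients below are hypothetical and unrelated to the chronic pain data.

```python
import numpy as np

def z_for_last(X, y):
    """z_obs = coeff / s_coeff for the last column of X, in a
    least-squares regression of y on X with a constant term."""
    n = len(y)
    D = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(D, y, rcond=None)
    resid = y - D @ beta
    s2 = resid @ resid / (n - D.shape[1])
    s_coeff = np.sqrt(s2 * np.linalg.inv(D.T @ D)[-1, -1])
    return beta[-1] / s_coeff

# Simulated data (hypothetical): C holds the common explanatory variables.
rng = np.random.default_rng(1)
n = 300
C = rng.normal(size=(n, 3))
xa = C @ np.array([0.5, -0.3, 0.2]) + rng.normal(size=n)
za = 0.4 * xa + C @ np.array([0.1, 0.6, -0.2]) + rng.normal(size=n)

# Contribution of xa in the regression of za on C and xa ...
z_xa_in_za = z_for_last(np.column_stack([C, xa]), za)
# ... and contribution of za in the regression of xa on C and za.
z_za_in_xa = z_for_last(np.column_stack([C, za]), xa)
print(z_xa_in_za, z_za_in_xa)
```

Both regressions must use the same set of other regressors for the identity to hold, which is why the directly explanatory variables of both responses are included.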
In the linear regression of \(Z_b\) on \(X_b\) and on the directly explanatory variables of both \(Z_b\) and \(X_b\), that is, on \(U, A, V, B\), the contribution of \(X_b\) leads to \(z_{\mathrm{obs}}=2.64\). Hence the two responses are associated after considering their directly explanatory variables, and there is a dashed line joining \(Z_b\) and \(X_b\) in the regression graph of Fig. 8.
The relatively strict criterion for excluding variables assures that all edges in the derived regression graph correspond to dependences that are considered to be substantive in the given context. Had a 0.975 quantile been chosen as the threshold instead, one arrow from \(A\) to \(Y\) and another from \(U\) to \(X_a\) would have been added to the regression graph. Although this would correspond to a better goodness-of-fit, such weak dependences are less likely to be confirmed as important in follow-up studies.
The subgraph induced by \(Z_a, Z_b, X_a, X_b\) of the regression graph in Fig. 8 corresponds to two seemingly unrelated regressions. Even though separate least-squares estimates can in principle be severely distorted, for the present data the structure is so well-fitting in the unconstrained multivariate regression of \(Z_a\) and \(X_a\) on \(Z_b, X_b, U, V, A, B\), that is, in a simple covering model, that none of these potential problems is relevant.
With \(C=\{U,V,A,B\}\), this is evident from the observed covariance matrix of \(Z_a, X_a\) given \(Z_b, X_b, C\), denoted here by \(\tilde{\varSigma}_{aa|bC}\), and the observed regression coefficient matrix \(\tilde{\varPi}_{a|b.C}\) being almost identical to the corresponding maximum likelihood estimates \(\hat{\varSigma}_{aa|bC}\) and \(\hat{\varPi}_{a|b.C}\).
The former can be obtained by sweeping or partially inverting the observed covariance matrix of the eight variables with respect to \(Z_b, X_b, C\) and the latter by using an adaptation of the EM-algorithm, due to Kiiveri (1987), on the observed covariance matrix of the four symptoms, corrected for linear regression on \(C\). In this way, one gets
The assumed definition of the joint distribution in terms of univariate and multivariate regressions assures that the overall fit of the model can be judged locally in two steps. First, one compares each unconstrained, full regression of a single response with regressions constrained by some independences, that is, by selecting a subset of directly explanatory variables from the list of the potentially explanatory variables. Next, one decides for each component pair of a joint response whether this pair is conditionally independent given their directly explanatory variables considered jointly. This can again be achieved by single univariate regressions, as illustrated above for the joint responses \(Z_a\) and \(X_a\).
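The sweeping or partial inversion step mentioned earlier, which yields an observed conditional covariance matrix and regression coefficient matrix from a joint covariance matrix, can be sketched as below. This is an illustration on simulated data, not the computation for the chronic pain data: the function name `conditional_blocks` and the index sets are hypothetical, only the two blocks of interest are returned rather than the full partially inverted matrix, and the EM-type fitting of Kiiveri (1987) is not implemented.

```python
import numpy as np

def conditional_blocks(S, a, b):
    """For a covariance matrix S and index lists a (responses) and b
    (regressors), return
      Sigma_aa|b = S_aa - S_ab S_bb^{-1} S_ba   (conditional covariance)
      Pi_a|b     = S_ab S_bb^{-1}               (regression coefficients)."""
    S = np.asarray(S, dtype=float)
    S_aa = S[np.ix_(a, a)]
    S_ab = S[np.ix_(a, b)]
    S_bb = S[np.ix_(b, b)]
    Pi = np.linalg.solve(S_bb, S_ab.T).T   # S_bb is symmetric
    Sigma = S_aa - Pi @ S_ab.T
    return Sigma, Pi

# Simulated data (hypothetical): eight correlated variables.
rng = np.random.default_rng(2)
n = 200
data = rng.normal(size=(n, 8)) @ rng.normal(size=(8, 8))
S = np.cov(data, rowvar=False)
a, b = [0, 1], [2, 3, 4, 5, 6, 7]
Sigma, Pi = conditional_blocks(S, a, b)

# Cross-check against a direct least-squares regression on centred data.
Yc = data - data.mean(axis=0)
coef, *_ = np.linalg.lstsq(Yc[:, b], Yc[:, a], rcond=None)
resid = Yc[:, a] - Yc[:, b] @ coef
Pi_ols = coef.T
Sigma_ols = resid.T @ resid / (n - 1)   # same divisor as np.cov
print(np.allclose(Pi, Pi_ols), np.allclose(Sigma, Sigma_ols))
```

Working from the covariance matrix alone is convenient when only summary statistics are available; the cross-check uses the divisor n - 1 so that it matches the default of `np.cov`.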
Cite this article
Wermuth, N., Sadeghi, K. Sequences of regressions and their independences. TEST 21, 215–252 (2012). https://doi.org/10.1007/s11749-012-0290-6
DOI: https://doi.org/10.1007/s11749-012-0290-6
Keywords
- Chain graphs
- Concentration graphs
- Covariance graphs
- Graphical Markov models
- Independence graphs
- Intervention models
- Labelled trees
- Lattice conditional independence models
- Structural equation models