Sequences of regressions and their independences

Wermuth, Nanny; Sadeghi, Kayvan

doi:10.1007/s11749-012-0290-6


Invited Paper
Published: 02 May 2012

TEST, Volume 21, pages 215–252 (2012)

Nanny Wermuth¹,² & Kayvan Sadeghi³


A Discussion to this article was published on 28 April 2012

A Discussion to this article was published on 03 April 2012

A Discussion to this article was published on 31 March 2012

Abstract

Ordered sequences of univariate or multivariate regressions provide statistical models for analysing data from randomized, possibly sequential interventions, from cohort or multi-wave panel studies, but also from cross-sectional or retrospective studies. Conditional independences are captured by what we name regression graphs, provided the generated distribution shares some properties with a joint Gaussian distribution. Regression graphs extend purely directed, acyclic graphs by two types of undirected graph, one type for components of joint responses and the other for components of the context vector variable. We review the special features and the history of regression graphs, prove criteria for Markov equivalence and discuss the notion of a simpler statistical covering model. Knowledge of Markov equivalence provides alternative interpretations of a given sequence of regressions, is essential for machine learning strategies and permits the use of the simple graphical criteria of regression graphs on graphs for which the corresponding criteria are in general more complex. Under the known conditions for a Markov equivalent directed acyclic graph to exist for a given regression graph, we give a polynomial-time algorithm to find one such graph.


Similar content being viewed by others

Graphical models, regression graphs, and recursive linear regression in a unified way (Article, 01 June 2019)

Identification of causal effects in linear models: beyond instrumental variables (Article, 06 December 2014)

Graphical Causal Models

References

Ali RA, Richardson TS, Spirtes P (2009) Markov equivalence for ancestral graphs. Ann Stat 37:2808–2837
Anderson TW (1958) An introduction to multivariate statistical analysis. Wiley, New York (3rd edn, 2003)
Anderson TW (1973) Asymptotically efficient estimation of covariance matrices with linear structure. Ann Stat 1:135–141
Andersson SA, Perlman MD (2006) Characterizing Markov equivalence classes for AMP chain graph models. Ann Stat 34:939–972
Andersson SA, Madigan D, Perlman MD, Triggs CM (1997) A graphical characterization of lattice conditional independence models. Ann Math Artif Intell 21:27–50
Andersson SA, Madigan D, Perlman MD (2001) Alternative Markov properties for chain graphs. Scand J Stat 28:33–86
Barndorff-Nielsen OE (1978) Information and exponential families in statistical theory. Wiley, Chichester
Bergsma W, Rudas T (2002) Marginal models for categorical data. Ann Stat 30:140–159
Birch MW (1963) Maximum likelihood in three-way contingency tables. J R Stat Soc B 25:220–233
Bishop YMM, Fienberg SE, Holland PW (1975) Discrete multivariate analysis. MIT Press, Cambridge
Blair JRS, Peyton BW (1993) An introduction to chordal graphs and clique trees. In: George JA, Gilbert JR, Liu JWH (eds) Graph theory and sparse matrix computations. IMA volumes in mathematics and its applications, vol 56. Springer, New York, pp 1–30
Bollen KA (1989) Structural equations with latent variables. Wiley, New York
Brito C, Pearl J (2002) A new identification condition for recursive models with correlated errors. Struct Equ Model 9:459–474
Brown LD (1986) Fundamentals of statistical exponential families with applications in statistical decision theory. LNMS, vol 9. Inst Math Stat, Beachwood
Castelo R, Kocka T (2003) On inclusion-driven learning of Bayesian networks. J Mach Learn Res 4:527–574
Castelo R, Siebes A (2003) A characterization of moral transitive acyclic directed graph Markov models as labeled trees. J Stat Plan Inference 115:235–259
Caussinus H (1966) Contribution à l’analyse statistique des tableaux de corrélation. Ann Fac Sci Univ Toulouse 29:77–183
Cayley A (1889) A theorem on trees. Q J Math 23:376–378
Chaudhuri S, Drton M, Richardson TS (2007) Estimation of a covariance matrix with zeros. Biometrika 94:199–216
Chickering DM (1995) A transformational characterization of equivalent Bayesian networks. In: Besnard P, Hanks S (eds) Proc 10th UAI conf. Kaufmann, San Mateo, pp 87–98
Cochran WG (1938) The omission or addition of an independent variate in multiple linear regression. Suppl J R Stat Soc 5:171–176
Cox DR (1966) Some procedures associated with the logistic qualitative response curve. In: David FN (ed) Research papers in statistics: Festschrift for J Neyman. Wiley, New York, pp 55–71
Cox DR (2006) Principles of statistical inference. Cambridge University Press, Cambridge
Cox DR, Wermuth N (1990) An approximation to maximum-likelihood estimates in reduced models. Biometrika 77:747–761
Cox DR, Wermuth N (1993) Linear dependencies represented by chain graphs (with discussion). Stat Sci 8:204–218; 247–277
Cox DR, Wermuth N (1994) Tests of linearity, multivariate normality and adequacy of linear scores. J R Stat Soc C 43:347–355
Cox DR, Wermuth N (1996) Multivariate dependencies: models, analysis, and interpretation. Chapman and Hall/CRC Press, London
Cox DR, Wermuth N (1999) Likelihood factorizations for mixed discrete and continuous variables. Scand J Stat 26:209–220
Cox DR, Wermuth N (2003) A general condition for avoiding effect reversal after marginalization. J R Stat Soc B 65:937–941
Darroch JN (1962) Interactions in multi-factor contingency tables. J R Stat Soc B 24:251–263
Darroch JN, Lauritzen SL, Speed TP (1980) Markov fields and log-linear models for contingency tables. Ann Stat 8:522–539
Dawid AP (1979) Conditional independence in statistical theory (with discussion). J R Stat Soc B 41:1–31
Dempster AP (1969) Elements of continuous multivariate analysis. Addison-Wesley, Reading
Dempster AP (1972) Covariance selection. Biometrics 28:157–175
Dinitz Y (2006) Dinitz’ algorithm: the original version and Even’s version. In: Even S, Goldreich O, Rosenberg AL, Selman AL (eds) Essays in memory of Shimon Even. Springer, New York, pp 218–240
Dirac GA (1961) On rigid circuit graphs. Abh Math Semin Univ Hamb 25:71–76
Drton M (2009) Discrete chain graph models. Bernoulli 15:736–753
Drton M, Perlman MD (2004) Model selection for Gaussian concentration graphs. Biometrika 91:591–602
Drton M, Richardson TS (2004) Multimodality of the likelihood in the bivariate seemingly unrelated regression model. Biometrika 91:383–392
Drton M, Richardson TS (2008a) Binary models for marginal independence. J R Stat Soc B 70:287–309
Drton M, Richardson TS (2008b) Graphical methods for efficient likelihood inference in Gaussian covariance models. J Mach Learn Res 9:893–914
Edwards D (2000) Introduction to graphical modelling, 2nd edn. Springer, New York
Foygel R, Draisma J, Drton M (2011) Half-trek criterion for generic identifiability of linear structural equation models (submitted). Available at http://arxiv.org/abs/1107.5552
Frydenberg M (1990) The chain graph Markov property. Scand J Stat 17:333–353
Geiger D, Verma TS, Pearl J (1990) Identifying independence in Bayesian networks. Networks 20:507–534
Glonek GFV, McCullagh P (1995) Multivariate logistic models. J R Stat Soc B 57:533–546
Goodman LA (1970) The multivariate analysis of qualitative data: interaction among multiple classifications. J Am Stat Assoc 65:226–256
Haavelmo T (1943) The statistical implications of a system of simultaneous equations. Econometrica 11:1–12
Hardt J, Sidor A, Nickel R, Kappis B, Petrak F, Egle UT (2008) Childhood adversities and suicide attempts: a retrospective study. J Fam Violence 23:713–718
Jensen ST (1988) Covariance hypotheses which are linear in both the covariance and the inverse covariance. Ann Stat 16:302–322
Jöreskog KG (1981) Analysis of covariance structures. Scand J Stat 8:65–92
Kang C, Tian J (2009) Markov properties for linear causal models with correlated errors. J Mach Learn Res 10:41–70
Kappesser J (1997) Bedeutung der Lokalisation für die Entwicklung und Behandlung chronischer Schmerzen. Thesis, Department of Psychology, University of Mainz
Kauermann G (1996) On a dualization of graphical Gaussian models. Scand J Stat 23:105–116
Kiiveri HT (1987) An incomplete data approach to the analysis of covariance structures. Psychometrika 52:539–554
Kiiveri HT, Speed TP, Carlin JB (1984) Recursive causal models. J Aust Math Soc A 36:30–52
Kline RB (2006) Principles and practice of structural equation modeling, 3rd edn. Guilford Press, New York
Lauritzen SL (1996) Graphical models. Oxford University Press, Oxford
Lauritzen SL, Wermuth N (1989) Graphical models for association between variables, some of which are qualitative and some quantitative. Ann Stat 17:31–57
Lehmann EL, Scheffé H (1955) Completeness, similar regions and unbiased estimation. Sankhya 14:219–236
Lněnička R, Matúš F (2007) On Gaussian conditional independence structures. Kybernetika 43:323–342
Lupparelli M, Marchetti GM, Bergsma WP (2009) Parameterization and fitting of discrete bi-directed graph models. Scand J Stat 36:559–576
Ma ZM, Xie XC, Geng Z (2006) Collapsibility of distribution dependence. J R Stat Soc B 68:127–133
Mandelbaum A, Rüschendorf L (1987) Complete and symmetrically complete families of distributions. Ann Stat 15:1229–1244
Marchetti GM, Lupparelli M (2011) Chain graph models of multivariate regression type for categorical data. Bernoulli 17:845–879
Marchetti GM, Wermuth N (2009) Matrix representations and independencies in directed acyclic graphs. Ann Stat 37:961–978
McCullagh P, Nelder JA (1989) Generalized linear models, 2nd edn. Chapman and Hall/CRC Press, London
Nelder JA, Wedderburn R (1972) Generalized linear models. J R Stat Soc A 135:370–384
Pearl J (1988) Probabilistic reasoning in intelligent systems. Kaufmann, San Mateo
Pearl J (2009) Causality: models, reasoning, and inference, 2nd edn. Cambridge University Press, New York
Pearl J, Paz A (1987) Graphoids: a graph based logic for reasoning about relevance relations. In: Du Boulay B, Hogg D, Steels L (eds) Advances in artificial intelligence II. North Holland, Amsterdam, pp 357–363
Pearl J, Wermuth N (1994) When can association graphs admit a causal interpretation? In: Cheeseman P, Oldford W (eds) Models and data, artificial intelligence and statistics IV. Springer, New York, pp 205–214
Richardson TS, Spirtes P (2002) Ancestral graph Markov models. Ann Stat 30:962–1030
Roverato A (2005) A unified approach to the characterisation of Markov equivalence classes of directed acyclic graphs, chain graphs with no flags and chain graphs. Scand J Stat 32:295–312
Roverato A, Studený M (2006) A graphical representation of equivalence classes of AMP chain graphs. J Mach Learn Res 7:1045–1078
Rudas T, Bergsma WP, Nemeth R (2010) Marginal log-linear parameterization of conditional independence models. Biometrika 97:1006–1012
Sadeghi K (2009) Representing modified independence structures. Transfer thesis, Oxford University
Sadeghi K, Lauritzen SL (2012) Markov properties of mixed graphs (submitted). Available at http://arxiv.org/abs/1109.5909
San Martin E, Mouchart M, Rolin JM (2005) Ignorable common information, null sets and Basu’s first theorem. Sankhya 67:674–698
Speed TP, Kiiveri HT (1986) Gaussian Markov distributions over finite graphs. Ann Stat 14:138–150
Spirtes P, Glymour C, Scheines R (1993) Causation, prediction and search. Springer, New York
Stanghellini E, Wermuth N (2005) On the identification of path analysis models with one hidden variable. Biometrika 92:337–350
Studený M (2005) Probabilistic conditional independence structures. Springer, London
Sundberg R (2010) Flat and multimodal likelihoods and model lack of fit in curved exponential families. Scand J Stat 37:632–643
Tarjan RE, Yannakakis M (1984) Simple linear-time algorithms to test chordality of graphs, test acyclicity of hypergraphs, and selectively reduce acyclic hypergraphs. SIAM J Comput 13:566–579
Verma T, Pearl J (1990) Equivalence and synthesis of causal models. In: Bonissone PP, Henrion M, Kanal LN, Lemmer JF (eds) Proc 6th UAI conf. Elsevier, Amsterdam, pp 220–227
Wermuth N (1976a) Analogies between multiplicative models for contingency tables and covariance selection. Biometrics 32:95–108
Wermuth N (1976b) Model search among multiplicative models. Biometrics 32:253–263
Wermuth N (1980) Linear recursive equations, covariance selection, and path analysis. J Am Stat Assoc 75:963–997
Wermuth N (2011) Probability models with summary graph structure. Bernoulli 17:845–879
Wermuth N, Cox DR (1998) On association models defined over independence graphs. Bernoulli 4:477–495
Wermuth N, Cox DR (2004) Joint response graphs and separation induced by triangular systems. J R Stat Soc B 66:687–717
Wermuth N, Lauritzen SL (1983) Graphical and recursive models for contingency tables. Biometrika 70:537–552
Wermuth N, Lauritzen SL (1990) On substantive research hypotheses, conditional independence graphs and graphical chain models (with discussion). J R Stat Soc B 52:21–75
Wermuth N, Cox DR, Marchetti GM (2006a) Covariance chains. Bernoulli 12:841–862
Wermuth N, Wiedenbeck M, Cox DR (2006b) Partial inversion for linear systems and partial closure of independence graphs. BIT Numer Math 46:883–901
Wermuth N, Marchetti GM, Cox DR (2009) Triangular systems for symmetric binary variables. Electron J Stat 3:932–955
Whittaker J (1990) Graphical models in applied multivariate statistics. Wiley, Chichester
Wiedenbeck M, Wermuth N (2010) Changing parameters by partial mappings. Stat Sin 20:823–836
Zellner A (1962) An efficient method of estimating seemingly unrelated regressions and tests for aggregation bias. J Am Stat Assoc 57:348–368
Zhao H, Zheng Z, Liu B (2005) On the Markov equivalence of maximal ancestral graphs. Sci China Ser A 48:548–562


Acknowledgements

The work of the first author has been supported in part by the Swedish Research Society via the Gothenburg Stochastic Centre and by the Swedish Strategic Fund via the Gothenburg Mathematical Modelling Centre. We thank R. Castelo, D.R. Cox, G. Marchetti and the referees for their most helpful comments.

Author information

Authors and Affiliations

Department of Mathematics, Chalmers Technical University, Gothenburg, Sweden
Nanny Wermuth
International Agency for Research on Cancer, Lyon, France
Nanny Wermuth
Department of Statistics, University of Oxford, Oxford, UK
Kayvan Sadeghi


Corresponding author

Correspondence to Nanny Wermuth.

Additional information

Communicated by Domingo Morales.

Appendix: Details of regressions for the chronic pain data

Tables 1–8 show the results of linear least-squares regressions or logistic regressions, fitted one at a time for each response variable and separately for each component of a joint response. Initially, each response is regressed on all of its potentially explanatory variables, as given by the first ordering of the variables. The tables give the estimated constant term and, for each variable in the regression, its estimated coefficient (coeff), the estimated standard deviation of the coefficient (\(s_{\mathrm{coeff}}\)), and the ratio \(z_{\mathrm{obs}}=\mathrm{coeff}/s_{\mathrm{coeff}}\). These ratios are compared with 2.58, the 0.995 quantile of a standard Gaussian random variable Z, for which \(\Pr(|Z|>2.58)=0.01\). In backward selection, the variable with the smallest observed \(|z_{\mathrm{obs}}|\) is deleted from the regression equation, one at a time, until every remaining variable has \(|z_{\mathrm{obs}}|>2.58\).
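The backward selection step for a linear regression can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes NumPy, ordinary least squares with a constant term and simulated data, and the function names `ols_z` and `backward_select` as well as the variable names `v1` to `v4` are hypothetical.

```python
import numpy as np

THRESHOLD = 2.58  # 0.995 quantile of the standard Gaussian, as in the text


def ols_z(y, X):
    """Least-squares fit of y on X (X includes a column of ones).

    Returns the coefficients, their estimated standard deviations (s_coeff)
    and the ratios z_obs = coeff / s_coeff.
    """
    n, p = X.shape
    coeff = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ coeff
    s2 = resid @ resid / (n - p)  # residual variance estimate
    s_coeff = np.sqrt(np.diag(s2 * np.linalg.inv(X.T @ X)))
    return coeff, s_coeff, coeff / s_coeff


def backward_select(y, data, names, threshold=THRESHOLD):
    """Delete, one at a time, the variable with the smallest |z_obs| until
    every remaining variable has |z_obs| > threshold."""
    n = len(y)
    kept = list(names)
    while kept:
        cols = [names.index(v) for v in kept]
        X = np.column_stack([np.ones(n), data[:, cols]])
        _, _, z = ols_z(y, X)
        abs_z = np.abs(z[1:])  # ignore the constant term
        weakest = int(np.argmin(abs_z))
        if abs_z[weakest] > threshold:
            break  # all remaining variables pass the threshold
        kept.pop(weakest)
    return kept


# Hypothetical simulated data: y depends on v1 and v3 only.
rng = np.random.default_rng(0)
names = ["v1", "v2", "v3", "v4"]
data = rng.normal(size=(500, 4))
y = 1.0 + 0.8 * data[:, 0] - 0.5 * data[:, 2] + rng.normal(size=500)
kept = backward_select(y, data, names)
print(kept)
```

For a logistic response, the same loop applies with the least-squares fit replaced by a logistic fit that reports coefficients and standard errors.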

Table 1 Response: Y, success of treatment; linear regression including a quadratic term

Table 2 Response: \(Z_a\), intensity of pain after treatment; linear regression

Table 3 Response: \(X_a\), depression after treatment; linear regression

Table 4 Response: \(Z_b\), intensity of pain before; linear regression

Table 5 Response: \(X_b\), depression before; linear regression

Table 6 Response: U, chronicity of pain; linear regression

Table 7 Response: A, site of pain; logistic regression

Table 8 Response: V, previous illnesses; linear regression

The procedure defines a selected model unless one of the excluded variables has a contribution of \(|z'_{\mathrm{obs}}|>2.58\) when added alone to the selected directly explanatory variables; such a variable must then also be included as an important directly explanatory variable. This did not happen for the present data.
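The check on excluded variables can be sketched in the same way: each excluded variable is added alone to the selected ones and its \(z'_{\mathrm{obs}}\) is compared with 2.58. The sketch assumes NumPy and simulated data; `z_last`, `excluded_to_reinclude` and the variable names are hypothetical, and the "selected" set is deliberately made incomplete so that the check has something to flag.

```python
import numpy as np

THRESHOLD = 2.58


def z_last(y, X):
    """z_obs = coeff / s_coeff for the last column of X in the
    least-squares regression of y on X (X includes a column of ones)."""
    n, p = X.shape
    coeff = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ coeff
    s2 = resid @ resid / (n - p)
    s_coeff = np.sqrt(np.diag(s2 * np.linalg.inv(X.T @ X)))
    return coeff[-1] / s_coeff[-1]


def excluded_to_reinclude(y, data, names, selected, threshold=THRESHOLD):
    """Add each excluded variable alone to the selected ones and return
    those whose |z'_obs| exceeds the threshold, with their z'_obs."""
    n = len(y)
    base = [names.index(v) for v in selected]
    flagged = {}
    for v in names:
        if v in selected:
            continue
        X = np.column_stack([np.ones(n), data[:, base + [names.index(v)]]])
        z = z_last(y, X)
        if abs(z) > threshold:
            flagged[v] = float(z)
    return flagged


rng = np.random.default_rng(1)
names = ["v1", "v2", "v3", "v4"]
data = rng.normal(size=(500, 4))
y = 1.0 + 0.8 * data[:, 0] - 0.5 * data[:, 2] + rng.normal(size=500)
# v3 is left out of the selection on purpose, so the check should flag it.
flagged = excluded_to_reinclude(y, data, names, selected=["v1"])
print(flagged)
```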

For linear models, the tables also show R², the coefficient of determination, for both the full and the selected model. Multiplied by 100, it gives the percentage of the variation in the response that is explained by the model.
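As a small illustration of the quantity reported, R² can be computed as one minus the ratio of the residual to the total sum of squares. The sketch assumes NumPy and simulated data; `r_squared` and the column choice for the "selected" model are hypothetical. Because the selected model is nested in the full one, its R² can never exceed that of the full model.

```python
import numpy as np


def r_squared(y, X):
    """Coefficient of determination of the least-squares fit of y on X
    (X includes a column of ones for the constant term)."""
    coeff = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ coeff
    centred = y - y.mean()
    return 1.0 - (resid @ resid) / (centred @ centred)


rng = np.random.default_rng(2)
n = 400
data = rng.normal(size=(n, 4))
y = 0.8 * data[:, 0] - 0.5 * data[:, 2] + rng.normal(size=n)
ones = np.ones((n, 1))
r2_full = r_squared(y, np.hstack([ones, data]))                  # all four variables
r2_selected = r_squared(y, np.hstack([ones, data[:, [0, 2]]]))   # hypothetical selection
print(f"full: {100 * r2_full:.1f}%, selected: {100 * r2_selected:.1f}%")
```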

In the linear regression of \(Z_a\) on \(X_a\) and on the directly explanatory variables of both \(Z_a\) and \(X_a\), that is, on \(Z_b, X_b, A\), the contribution of \(X_a\) leads to \(z_{\mathrm{obs}}=3.51\). This value necessarily coincides with the \(z_{\mathrm{obs}}\) computed for the contribution of \(Z_a\) in the linear regression of \(X_a\) on \(Z_a\) and on \(Z_b, X_b, A\), since both ratios are the same function of the partial correlation of \(Z_a\) and \(X_a\) given \(Z_b, X_b, A\). Hence the two responses remain correlated even after considering their directly explanatory variables, and a dashed line joining \(Z_a\) and \(X_a\) is added to the well-fitting regression graph in Fig. 8.
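The symmetry of the two \(z_{\mathrm{obs}}\) values can be checked numerically. In the sketch below, which assumes NumPy and simulated data, `z_resp` and `x_resp` are hypothetical stand-ins for two joint responses and `C` for their common explanatory variables; both regressions use the same number of columns, so the two ratios agree up to rounding.

```python
import numpy as np


def z_last(y, X):
    """z_obs = coeff / s_coeff for the last column of X in the
    least-squares regression of y on X (X includes a column of ones)."""
    n, p = X.shape
    coeff = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ coeff
    s2 = resid @ resid / (n - p)
    s_coeff = np.sqrt(np.diag(s2 * np.linalg.inv(X.T @ X)))
    return coeff[-1] / s_coeff[-1]


rng = np.random.default_rng(3)
n = 300
C = rng.normal(size=(n, 3))  # hypothetical common explanatory variables
z_resp = C @ np.array([0.5, 0.2, 0.0]) + rng.normal(size=n)
x_resp = C @ np.array([0.1, 0.4, 0.3]) + 0.5 * z_resp + rng.normal(size=n)
ones = np.ones((n, 1))
# Contribution of x_resp when regressing z_resp on C and x_resp ...
z1 = z_last(z_resp, np.hstack([ones, C, x_resp[:, None]]))
# ... and contribution of z_resp when regressing x_resp on C and z_resp.
z2 = z_last(x_resp, np.hstack([ones, C, z_resp[:, None]]))
print(z1, z2)
```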

In the linear regression of \(Z_b\) on \(X_b\) and on the directly explanatory variables of both \(Z_b\) and \(X_b\), that is, on \(U, A, V, B\), the contribution of \(X_b\) leads to \(z_{\mathrm{obs}}=2.64\). Hence the two responses are associated after considering their directly explanatory variables, and there is a dashed line joining \(Z_b\) and \(X_b\) in the regression graph of Fig. 8.

The relatively strict criterion for excluding variables ensures that all edges in the derived regression graph correspond to dependences that are considered substantive in the given context. Had the 0.975 quantile, 1.96, been chosen as the threshold instead, one arrow from A to Y and another from U to \(X_a\) would have been added to the regression graph. Although this would give a better fit, such weak dependences are less likely to be confirmed as important in follow-up studies.
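The two thresholds, and the two-sided levels they correspond to, can be checked with Python's standard library. This small sketch only illustrates the arithmetic behind the comparison above.

```python
from statistics import NormalDist

std_gauss = NormalDist()  # mean 0, standard deviation 1
thresholds = {}
for q in (0.995, 0.975):
    c = std_gauss.inv_cdf(q)
    thresholds[q] = c
    # Two-sided tail probability Pr(|Z| > c).
    print(f"{q} quantile: {c:.3f}; Pr(|Z| > c) = {2 * (1 - std_gauss.cdf(c)):.3f}")
```

The 0.995 quantile (about 2.576, rounded to 2.58 in the text) corresponds to a two-sided level of 0.01, the 0.975 quantile (about 1.960) to 0.05.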

The subgraph of the regression graph in Fig. 8 induced by \(Z_a, Z_b, X_a, X_b\) corresponds to two seemingly unrelated regressions. Separate least-squares estimates can in principle be severely distorted in such a structure. For the present data, however, the structure fits so well within the unconstrained multivariate regression of \(Z_a\) and \(X_a\) on \(Z_b, X_b, U, V, A, B\), that is, within a simple covering model, that none of these potential problems is relevant.
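Joint estimation of seemingly unrelated regressions goes back to Zellner (1962). The text obtains maximum-likelihood estimates with an adaptation of the EM algorithm of Kiiveri (1987); the sketch below instead swaps in a different, commonly used technique, iterated feasible generalized least squares, whose fixed point satisfies the Gaussian likelihood equations for an unrestricted error covariance. It assumes NumPy and simulated data; the function name `sur_igls` and the regressor sets are hypothetical. Since the likelihood of even a bivariate seemingly unrelated regression can be multimodal (Drton and Richardson 2004), the iteration may stop at a local maximum.

```python
import numpy as np


def sur_igls(ys, Xs, tol=1e-10, max_iter=500):
    """Iterated feasible GLS for seemingly unrelated regressions.

    ys: list of m response vectors of length n
    Xs: list of m design matrices, Xs[i] of shape (n, p_i)
    Returns the coefficient vectors and the m x m residual covariance (divisor n).
    """
    m, n = len(ys), len(ys[0])
    offsets = np.concatenate([[0], np.cumsum([X.shape[1] for X in Xs])])
    # Start from separate least-squares fits, one equation at a time.
    betas = [np.linalg.lstsq(X, y, rcond=None)[0] for X, y in zip(Xs, ys)]
    for _ in range(max_iter):
        resid = np.column_stack([y - X @ b for X, y, b in zip(Xs, ys, betas)])
        s_inv = np.linalg.inv(resid.T @ resid / n)
        # GLS normal equations for stacked errors with covariance Sigma kron I_n.
        A = np.zeros((offsets[-1], offsets[-1]))
        rhs = np.zeros(offsets[-1])
        for i in range(m):
            ri = slice(offsets[i], offsets[i + 1])
            for j in range(m):
                rj = slice(offsets[j], offsets[j + 1])
                A[ri, rj] = s_inv[i, j] * (Xs[i].T @ Xs[j])
                rhs[ri] += s_inv[i, j] * (Xs[i].T @ ys[j])
        stacked = np.linalg.solve(A, rhs)
        new = [stacked[offsets[i]:offsets[i + 1]] for i in range(m)]
        change = max(np.max(np.abs(b_new - b_old)) for b_new, b_old in zip(new, betas))
        betas = new
        if change < tol:
            break
    resid = np.column_stack([y - X @ b for X, y, b in zip(Xs, ys, betas)])
    return betas, resid.T @ resid / n


# Hypothetical example: two responses with correlated errors and different regressors.
rng = np.random.default_rng(4)
n = 2000
w = rng.normal(size=(n, 3))
errors = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.7], [0.7, 1.0]], size=n)
X1 = np.column_stack([np.ones(n), w[:, 0]])           # equation 1 uses w1 only
X2 = np.column_stack([np.ones(n), w[:, 1], w[:, 2]])  # equation 2 uses w2 and w3
y1 = X1 @ np.array([1.0, 2.0]) + errors[:, 0]
y2 = X2 @ np.array([-1.0, 0.5, 0.3]) + errors[:, 1]
betas, sigma = sur_igls([y1, y2], [X1, X2])
print(betas, sigma)
```

When every equation has the same regressors, the procedure reduces to separate least-squares fits, which is why distortions can only arise when the sets of explanatory variables differ.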

With \(C=\{U,V,A,B\}\), this is evident because the observed covariance matrix of \(Z_a, X_a\) given \(Z_b, X_b, C\), denoted here by \(\tilde{\varSigma}_{aa|bC}\), and the observed regression coefficient matrix \(\tilde{\varPi}_{a|b.C}\) are almost identical to the corresponding maximum-likelihood estimates \(\hat{\varSigma}_{aa|bC}\) and \(\hat{\varPi}_{a|b.C}\).
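The observed quantities are the blocks that sweeping, or partial inversion, on the conditioning variables produces: the residual covariance \(\varSigma_{aa|b}=\varSigma_{aa}-\varSigma_{ab}\varSigma_{bb}^{-1}\varSigma_{ba}\) and the coefficient matrix \(\varPi_{a|b}=\varSigma_{ab}\varSigma_{bb}^{-1}\). The sketch below computes them directly with a Schur complement rather than implementing the sweep operator itself; it assumes NumPy and simulated data, and `regress_blocks` and the index sets are hypothetical.

```python
import numpy as np


def regress_blocks(S, a, b):
    """From a covariance matrix S, return the covariance matrix of the
    variables indexed by a given those indexed by b (Sigma_aa|b) and the
    matrix of regression coefficients of a on b (Pi_a|b)."""
    S_ab = S[np.ix_(a, b)]
    S_bb = S[np.ix_(b, b)]
    Pi = np.linalg.solve(S_bb, S_ab.T).T         # S_ab S_bb^{-1}
    Sigma_aa_b = S[np.ix_(a, a)] - Pi @ S_ab.T   # S_aa - S_ab S_bb^{-1} S_ba
    return Sigma_aa_b, Pi


rng = np.random.default_rng(5)
n = 1000
data = rng.normal(size=(n, 6)) @ rng.normal(size=(6, 6))  # six correlated variables
S = np.cov(data, rowvar=False)
a, b = [0, 1], [2, 3, 4, 5]  # hypothetical: two responses, four conditioning variables
Sigma_aa_b, Pi = regress_blocks(S, a, b)
print(Sigma_aa_b, Pi, sep="\n")
```

Applied to an observed covariance matrix, these blocks coincide with the slopes and residual covariance of separate least-squares regressions of each response on all conditioning variables, that is, with the unconstrained covering model.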

The former can be obtained by sweeping, or partially inverting, the observed covariance matrix of the eight variables with respect to \(Z_b, X_b, C\); the latter by using an adaptation of the EM algorithm due to Kiiveri (1987) on the observed covariance matrix of the four symptoms, corrected for linear regression on C. In this way, one gets

Defining the joint distribution in terms of univariate and multivariate regressions ensures that the overall fit of the model can be judged locally, in two steps. First, each unconstrained, full regression of a single response is compared with regressions constrained by some independences, that is, by selecting a subset of directly explanatory variables from the list of potentially explanatory variables. Next, for each component pair of a joint response, one decides whether the pair is conditionally independent given their directly explanatory variables considered jointly. This can again be achieved by single univariate regressions, as illustrated above for the joint responses \(Z_a\) and \(X_a\).
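The second step can also be carried out without fitting a regression: the partial correlation r of a pair given the conditioning set is read off the inverse sample covariance (concentration) matrix, and \(z=r\sqrt{df}/\sqrt{1-r^2}\), with df the residual degrees of freedom of either univariate regression, equals the regression-based \(z_{\mathrm{obs}}\). The sketch assumes NumPy and simulated data; `pair_z`, `keep_dashed_line` and the data layout are hypothetical.

```python
import numpy as np

THRESHOLD = 2.58


def pair_z(data, i, j, cond):
    """z statistic for columns i and j of data given the columns in cond,
    computed from the partial correlation in the concentration matrix."""
    idx = [i, j] + list(cond)
    K = np.linalg.inv(np.cov(data[:, idx], rowvar=False))
    r = -K[0, 1] / np.sqrt(K[0, 0] * K[1, 1])  # partial correlation of i and j given cond
    df = data.shape[0] - len(cond) - 2         # residual df of either univariate regression
    return r * np.sqrt(df) / np.sqrt(1.0 - r**2)


rng = np.random.default_rng(6)
n = 400
cond = rng.normal(size=(n, 3))  # hypothetical directly explanatory variables
resp_1 = cond @ np.array([0.5, 0.0, 0.3]) + rng.normal(size=n)
resp_2 = cond @ np.array([0.0, 0.4, 0.2]) + 0.5 * resp_1 + rng.normal(size=n)
data = np.column_stack([resp_1, resp_2, cond])
z = pair_z(data, 0, 1, [2, 3, 4])
keep_dashed_line = abs(z) > THRESHOLD  # conditional independence rejected
print(z, keep_dashed_line)
```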


About this article

Cite this article

Wermuth, N., Sadeghi, K. Sequences of regressions and their independences. TEST 21, 215–252 (2012). https://doi.org/10.1007/s11749-012-0290-6


Received: 02 March 2011
Accepted: 09 March 2012
Published: 02 May 2012
Issue Date: June 2012
DOI: https://doi.org/10.1007/s11749-012-0290-6
