Sequences of regressions and their independences

  • Invited Paper
  • Published in TEST

A Discussion to this article was published on 28 April 2012

A Discussion to this article was published on 03 April 2012

A Discussion to this article was published on 31 March 2012

Abstract

Ordered sequences of univariate or multivariate regressions provide statistical models for analysing data from randomized, possibly sequential interventions, from cohort or multi-wave panel studies, but also from cross-sectional or retrospective studies. Conditional independences are captured by what we name regression graphs, provided the generated distribution shares some properties with a joint Gaussian distribution. Regression graphs extend purely directed, acyclic graphs by two types of undirected graph, one type for components of joint responses and the other for components of the context vector variable. We review the special features and the history of regression graphs, prove criteria for Markov equivalence and discuss the notion of a simpler statistical covering model. Knowledge of Markov equivalence provides alternative interpretations of a given sequence of regressions, is essential for machine learning strategies and permits the use of the simple graphical criteria of regression graphs on graphs for which the corresponding criteria are in general more complex. Under the known conditions under which a Markov equivalent directed acyclic graph exists for a given regression graph, we give a polynomial time algorithm to find one such graph.

Figs. 1–17

References

  • Ali RA, Richardson TS, Spirtes P (2009) Markov equivalence for ancestral graphs. Ann Stat 37:2808–2837
  • Anderson TW (1958) An introduction to multivariate statistical analysis. Wiley, New York (3rd edn, 2003)
  • Anderson TW (1973) Asymptotically efficient estimation of covariance matrices with linear structure. Ann Stat 1:135–141
  • Andersson SA, Perlman MD (2006) Characterizing Markov equivalence classes for AMP chain graph models. Ann Stat 34:939–972
  • Andersson SA, Madigan D, Perlman MD, Triggs CM (1997) A graphical characterization of lattice conditional independence models. Ann Math Artif Intell 21:27–50
  • Andersson SA, Madigan D, Perlman MD (2001) Alternative Markov properties for chain graphs. Scand J Stat 28:33–86
  • Barndorff-Nielsen OE (1978) Information and exponential families in statistical theory. Wiley, Chichester
  • Bergsma W, Rudas T (2002) Marginal models for categorical data. Ann Stat 30:140–159
  • Birch MW (1963) Maximum likelihood in three-way contingency tables. J R Stat Soc B 25:220–233
  • Bishop YMM, Fienberg SE, Holland PW (1975) Discrete multivariate analysis. MIT Press, Cambridge
  • Blair JRS, Peyton BW (1993) An introduction to chordal graphs and clique trees. In: George JA, Gilbert JR, Liu JWH (eds) Graph theory and sparse matrix computations. IMA volumes in mathematics and its applications, vol 56. Springer, New York, pp 1–30
  • Brito C, Pearl J (2002) A new identification condition for recursive models with correlated errors. Struct Equ Model 9:459–474
  • Bollen KA (1989) Structural equations with latent variables. Wiley, New York
  • Brown LD (1986) Fundamentals of statistical exponential families with applications in statistical decision theory. LNMS, vol 9. Inst Math Stat, Beachwood
  • Castelo R, Kocka T (2003) On inclusion-driven learning of Bayesian networks. J Mach Learn Res 4:527–574
  • Castelo R, Siebes A (2003) A characterization of moral transitive acyclic directed graph Markov models as labeled trees. J Stat Plan Inference 115:235–259
  • Caussinus H (1966) Contribution à l’analyse statistique des tableaux de corrélation. Ann Fac Sci Univ Toulouse 29:77–183
  • Cayley A (1889) A theorem on trees. Q J Math 23:376–378
  • Chaudhuri S, Drton M, Richardson TS (2007) Estimation of a covariance matrix with zeros. Biometrika 94:199–216
  • Cochran WG (1938) The omission or addition of an independent variate in multiple linear regression. Suppl J R Stat Soc 5:171–176
  • Chickering DM (1995) A transformational characterization of equivalent Bayesian networks. In: Besnard P, Hanks S (eds) Proc 11th UAI conf. Kaufmann, San Mateo, pp 87–98
  • Cox DR (1966) Some procedures associated with the logistic qualitative response curve. In: David FN (ed) Research papers in statistics: Festschrift for J Neyman. Wiley, New York, pp 55–71
  • Cox DR (2006) Principles of statistical inference. Cambridge University Press, Cambridge
  • Cox DR, Wermuth N (1990) An approximation to maximum-likelihood estimates in reduced models. Biometrika 77:747–761
  • Cox DR, Wermuth N (1993) Linear dependencies represented by chain graphs (with discussion). Stat Sci 8:204–218; 247–277
  • Cox DR, Wermuth N (1994) Tests of linearity, multivariate normality and adequacy of linear scores. J R Stat Soc C 43:347–355
  • Cox DR, Wermuth N (1996) Multivariate dependencies: models, analysis, and interpretation. Chapman and Hall/CRC Press, London
  • Cox DR, Wermuth N (1999) Likelihood factorizations for mixed discrete and continuous variables. Scand J Stat 26:209–220
  • Cox DR, Wermuth N (2003) A general condition for avoiding effect reversal after marginalization. J R Stat Soc B 65:937–941
  • Darroch JN (1962) Interactions in multi-factor contingency tables. J R Stat Soc B 24:251–263
  • Darroch JN, Lauritzen SL, Speed TP (1980) Markov fields and log-linear models for contingency tables. Ann Stat 8:522–539
  • Dawid AP (1979) Conditional independence in statistical theory (with discussion). J R Stat Soc B 41:1–31
  • Dempster AP (1969) Elements of continuous multivariate analysis. Addison-Wesley, Reading
  • Dempster AP (1972) Covariance selection. Biometrics 28:157–175
  • Dinitz Y (2006) Dinitz’ algorithm: the original version and Even’s version. In: Even S, Goldreich O, Rosenberg AL, Selman AL (eds) Essays in memory of Shimon Even. Springer, New York, pp 218–240
  • Dirac GA (1961) On rigid circuit graphs. Abh Math Semin Univ Hamb 25:71–76
  • Drton M (2009) Discrete chain graph models. Bernoulli 15:736–753
  • Drton M, Perlman MD (2004) Model selection for Gaussian concentration graphs. Biometrika 91:591–602
  • Drton M, Richardson TS (2004) Multimodality of the likelihood in the bivariate seemingly unrelated regression model. Biometrika 91:383–392
  • Drton M, Richardson TS (2008a) Binary models for marginal independence. J R Stat Soc B 70:287–309
  • Drton M, Richardson TS (2008b) Graphical methods for efficient likelihood inference in Gaussian covariance models. J Mach Learn Res 9:893–914
  • Edwards D (2000) Introduction to graphical modelling, 2nd edn. Springer, New York
  • Foygel R, Draisma J, Drton M (2011) Half-trek criterion for generic identifiability of linear structural equation models (submitted). Available under http://arxiv.org/abs/1107.5552
  • Frydenberg M (1990) The chain graph Markov property. Scand J Stat 17:333–353
  • Geiger D, Verma TS, Pearl J (1990) Identifying independence in Bayesian networks. Networks 20:507–534
  • Glonek GFV, McCullagh P (1995) Multivariate logistic models. J R Stat Soc B 57:533–546
  • Goodman LA (1970) The multivariate analysis of qualitative data: interaction among multiple classifications. J Am Stat Assoc 65:226–256
  • Haavelmo T (1943) The statistical implications of a system of simultaneous equations. Econometrica 11:1–12
  • Hardt J, Sidor A, Nickel R, Kappis B, Petrak F, Egle UT (2008) Childhood adversities and suicide attempts: a retrospective study. J Fam Violence 23:713–718
  • Jensen ST (1988) Covariance hypotheses which are linear in both the covariance and the inverse covariance. Ann Stat 16:302–322
  • Jöreskog KG (1981) Analysis of covariance structures. Scand J Stat 8:65–92
  • Kang C, Tian J (2009) Markov properties for linear causal models with correlated errors. J Mach Learn Res 10:41–70
  • Kappesser J (1997) Bedeutung der Lokalisation für die Entwicklung und Behandlung chronischer Schmerzen. Thesis, Department of Psychology, University of Mainz
  • Kauermann G (1996) On a dualization of graphical Gaussian models. Scand J Stat 23:105–116
  • Kiiveri HT (1987) An incomplete data approach to the analysis of covariance structures. Psychometrika 52:539–554
  • Kiiveri HT, Speed TP, Carlin JB (1984) Recursive causal models. J Aust Math Soc A 36:30–52
  • Kline RB (2006) Principles and practice of structural equation modeling, 3rd edn. Guilford Press, New York
  • Lauritzen SL (1996) Graphical models. Oxford University Press, Oxford
  • Lauritzen SL, Wermuth N (1989) Graphical models for association between variables, some of which are qualitative and some quantitative. Ann Stat 17:31–57
  • Lehmann EL, Scheffé H (1955) Completeness, similar regions and unbiased estimation. Sankhya 14:219–236
  • Lněnička R, Matúš F (2007) On Gaussian conditional independence structures. Kybernetika 43:323–342
  • Lupparelli M, Marchetti GM, Bergsma WP (2009) Parameterization and fitting of discrete bi-directed graph models. Scand J Stat 36:559–576
  • Ma ZM, Xie XC, Geng Z (2006) Collapsibility of distribution dependence. J R Stat Soc B 68:127–133
  • Mandelbaum A, Rüschendorf L (1987) Complete and symmetrically complete families of distributions. Ann Stat 15:1229–1244
  • Marchetti GM, Lupparelli M (2011) Chain graph models of multivariate regression type for categorical data. Bernoulli 17:827–844
  • Marchetti GM, Wermuth N (2009) Matrix representations and independencies in directed acyclic graphs. Ann Stat 37:961–978
  • McCullagh P, Nelder JA (1989) Generalized linear models, 2nd edn. Chapman and Hall/CRC Press, London
  • Nelder JA, Wedderburn R (1972) Generalized linear models. J R Stat Soc A 135:370–384
  • Pearl J (1988) Probabilistic reasoning in intelligent systems. Kaufmann, San Mateo
  • Pearl J (2009) Causality: models, reasoning, and inference, 2nd edn. Cambridge University Press, New York
  • Pearl J, Paz A (1987) Graphoids: a graph based logic for reasoning about relevance relations. In: Boulay BD, Hogg D, Steel L (eds) Advances in artificial intelligence II. North Holland, Amsterdam, pp 357–363
  • Pearl J, Wermuth N (1994) When can association graphs admit a causal interpretation? In: Cheeseman P, Oldford W (eds) Models and data, artificial intelligence and statistics IV. Springer, New York, pp 205–214
  • Richardson TS, Spirtes P (2002) Ancestral Markov graphical models. Ann Stat 30:962–1030
  • Roverato A (2005) A unified approach to the characterisation of Markov equivalence classes of directed acyclic graphs, chain graphs with no flags and chain graphs. Scand J Stat 32:295–312
  • Roverato A, Studený M (2006) A graphical representation of equivalence classes of AMP chain graphs. J Mach Learn Res 7:1045–1078
  • Rudas T, Bergsma WP, Nemeth R (2010) Marginal log-linear parameterization of conditional independence models. Biometrika 97:1006–1012
  • Sadeghi K (2009) Representing modified independence structures. Transfer thesis, Oxford University
  • Sadeghi K, Lauritzen SL (2012) Markov properties of mixed graphs (submitted). Also available on http://arxiv.org/abs/1109.5909
  • San Martin E, Mouchart M, Rolin JM (2005) Ignorable common information, null sets and Basu’s first theorem. Sankhya 67:674–698
  • Speed TP, Kiiveri HT (1986) Gaussian Markov distributions over finite graphs. Ann Stat 14:138–150
  • Spirtes P, Glymour C, Scheines R (1993) Causation, prediction and search. Springer, New York
  • Stanghellini E, Wermuth N (2005) On the identification of path analysis models with one hidden variable. Biometrika 92:337–350
  • Studený M (2005) Probabilistic conditional independence structures. Springer, London
  • Sundberg R (2010) Flat and multimodal likelihoods and model lack of fit in curved exponential families. Scand J Stat 37:632–643
  • Tarjan RE, Yannakakis M (1984) Simple linear-time algorithms to test chordality of graphs, test acyclicity of hypergraphs, and selectively reduce acyclic hypergraphs. SIAM J Comput 13:566–579
  • Verma T, Pearl J (1990) Equivalence and synthesis of causal models. In: Bonissone PP, Henrion M, Kanal LN, Lemmer JF (eds) Proc 6th UAI conf. Elsevier, Amsterdam, pp 220–227
  • Wermuth N (1976a) Analogies between multiplicative models for contingency tables and covariance selection. Biometrics 32:95–108
  • Wermuth N (1976b) Model search among multiplicative models. Biometrics 32:253–263
  • Wermuth N (1980) Linear recursive equations, covariance selection, and path analysis. J Am Stat Assoc 75:963–972
  • Wermuth N (2011) Probability models with summary graph structure. Bernoulli 17:845–879
  • Wermuth N, Cox DR (1998) On association models defined over independence graphs. Bernoulli 4:477–495
  • Wermuth N, Cox DR (2004) Joint response graphs and separation induced by triangular systems. J R Stat Soc B 66:687–717
  • Wermuth N, Lauritzen SL (1983) Graphical and recursive models for contingency tables. Biometrika 70:537–552
  • Wermuth N, Lauritzen SL (1990) On substantive research hypotheses, conditional independence graphs and graphical chain models (with discussion). J R Stat Soc B 52:21–75
  • Wermuth N, Cox DR, Marchetti GM (2006a) Covariance chains. Bernoulli 12:841–862
  • Wermuth N, Wiedenbeck M, Cox DR (2006b) Partial inversion for linear systems and partial closure of independence graphs. BIT Numer Math 46:883–901
  • Wermuth N, Marchetti GM, Cox DR (2009) Triangular systems for symmetric binary variables. Electron J Stat 3:932–955
  • Whittaker J (1990) Graphical models in applied multivariate statistics. Wiley, Chichester
  • Wiedenbeck M, Wermuth N (2010) Changing parameters by partial mappings. Stat Sin 20:823–836
  • Zellner A (1962) An efficient method of estimating seemingly unrelated regressions and tests for aggregation bias. J Am Stat Assoc 57:348–368
  • Zhao H, Zheng Z, Liu B (2005) On the Markov equivalence of maximal ancestral graphs. Sci China Ser A 48:548–562

Acknowledgements

The work of the first author has been supported in part by the Swedish Research Society via the Gothenburg Stochastic Centre and by the Swedish Strategic Fund via the Gothenburg Mathematical Modelling Centre. We thank R. Castelo, D.R. Cox, G. Marchetti and the referees for their most helpful comments.

Author information

Corresponding author

Correspondence to Nanny Wermuth.

Additional information

Communicated by Domingo Morales.

Appendix: Details of regressions for the chronic pain data

Tables 1–8 show the results of linear least-squares regressions or logistic regressions, one at a time, for each of the response variables and for each component of a joint response separately. At first, each response is regressed on all its potentially explanatory variables, as given by their first ordering. The tables give the estimated constant term and, for each variable in the regression, its estimated coefficient (coeff), the estimated standard deviation of the coefficient (\(s_{\mathrm{coeff}}\)), and the ratio \(z_{\mathrm{obs}}=\mathrm{coeff}/s_{\mathrm{coeff}}\). These ratios are compared with 2.58, the 0.995 quantile of a random variable \(Z\) having a standard Gaussian distribution, for which \(\Pr(|Z|>2.58)=0.01\). In backward selection steps, the variable with the smallest observed value \(|z_{\mathrm{obs}}|\) is deleted from a regression equation, one at a time, until every remaining variable has \(|z_{\mathrm{obs}}|\) at or above the threshold.

Table 1 Response: \(Y\), success of treatment; linear regression including a quadratic term
Table 2 Response: \(Z_a\), intensity of pain after treatment; linear regression
Table 3 Response: \(X_a\), depression after treatment; linear regression
Table 4 Response: \(Z_b\), intensity of pain before; linear regression
Table 5 Response: \(X_b\), depression before; linear regression
Table 6 Response: \(U\), chronicity of pain; linear regression
Table 7 Response: \(A\), site of pain; logistic regression
Table 8 Response: \(V\), previous illnesses; linear regression

The procedure defines a selected model unless one of the excluded variables has a contribution of \(|z'_{\mathrm{obs}}|>2.58\) when added alone to the selected directly explanatory variables; such a variable then also needs to be included as an important directly explanatory variable. This did not happen in the given data set.
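The selection procedure described above can be sketched in Python for the linear least-squares case. This is only an illustration under stated assumptions, not the authors' code: the function names `ols_z` and `backward_select` are hypothetical, the data are simulated rather than the chronic pain data, and logistic regressions (as used for \(A\)) are not covered.

```python
import numpy as np
from statistics import NormalDist

# 0.995 quantile of the standard Gaussian distribution (about 2.58)
THRESHOLD = NormalDist().inv_cdf(0.995)

def ols_z(X, y):
    """Least-squares fit with a constant term. Returns, for each column
    of X, the coefficient and z_obs = coeff / s_coeff."""
    n, p = X.shape
    D = np.column_stack([np.ones(n), X])          # design matrix with constant
    beta, *_ = np.linalg.lstsq(D, y, rcond=None)
    resid = y - D @ beta
    sigma2 = resid @ resid / (n - p - 1)          # residual variance estimate
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(D.T @ D)))
    return beta[1:], (beta / se)[1:]

def backward_select(X, y, names, threshold=THRESHOLD):
    """Hypothetical helper: delete the variable with the smallest |z_obs|,
    one at a time, until all remaining |z_obs| reach the threshold; then
    check each excluded variable when added alone to the selected ones."""
    keep = list(range(X.shape[1]))
    while keep:
        _, z = ols_z(X[:, keep], y)
        weakest = int(np.argmin(np.abs(z)))
        if abs(z[weakest]) >= threshold:
            break
        del keep[weakest]
    readd = []
    for j in range(X.shape[1]):
        if j not in keep:
            _, z = ols_z(X[:, keep + [j]], y)     # candidate is the last column
            if abs(z[-1]) > threshold:
                readd.append(j)
    return [names[j] for j in keep], [names[j] for j in readd]

# Simulated example: only x0 has a real effect on the response.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = 1.0 + 2.0 * X[:, 0] + rng.normal(size=500)
selected, readded = backward_select(X, y, ["x0", "x1", "x2"])
print(selected, readded)
```

The second returned list corresponds to the check on excluded variables: if it is non-empty, those variables would also have to be included, which the text reports did not happen for the given data.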

For linear models, the tables also show \(R^2\), the coefficient of determination, for both the full and the selected model. Multiplied by 100, it gives the percentage of the variation in the response explained by the model.

In the linear regression of \(Z_a\) on \(X_a\) and on the directly explanatory variables of both \(Z_a\) and \(X_a\), that is, on \(Z_b, X_b, A\), the contribution of \(X_a\) leads to \(z_{\mathrm{obs}}=3.51\), which coincides, by definition, with \(z_{\mathrm{obs}}\) computed for the contribution of \(Z_a\) in the linear regression of \(X_a\) on \(Z_a\) and on \(Z_b, X_b, A\). Hence the two responses are correlated even after considering the directly explanatory variables, and a dashed line joining \(Z_a\) and \(X_a\) is added to the well-fitting regression graph in Fig. 8.

In the linear regression of \(Z_b\) on \(X_b\) and on the directly explanatory variables of both \(Z_b\) and \(X_b\), that is, on \(U, A, V, B\), the contribution of \(X_b\) leads to \(z_{\mathrm{obs}}=2.64\). Hence the two responses are associated after considering their directly explanatory variables, and there is a dashed line joining \(Z_b\) and \(X_b\) in the regression graph of Fig. 8.
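The coincidence of the two ratios noted for \(Z_a\) and \(X_a\) holds for any pair of responses regressed on each other together with the same set of further variables, because both ratios are the same function of the partial correlation and the residual degrees of freedom. A small numerical check in Python, using simulated stand-in data and a hypothetical helper `z_for_last`, might look as follows.

```python
import numpy as np

def z_for_last(y, X):
    """z_obs = coeff / s_coeff for the last column of X in a
    least-squares regression of y on X plus a constant term."""
    n = len(y)
    D = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(D, y, rcond=None)
    resid = y - D @ beta
    sigma2 = resid @ resid / (n - D.shape[1])
    se = np.sqrt(sigma2 * np.linalg.inv(D.T @ D)[-1, -1])
    return beta[-1] / se

# Simulated stand-ins (assumption): W plays the role of Z_b, X_b, A,
# and a shared error term makes the two responses correlated given W.
rng = np.random.default_rng(1)
n = 200
W = rng.normal(size=(n, 3))
shared = rng.normal(size=n)
z_a = W @ np.array([0.5, 0.2, 0.1]) + shared + rng.normal(size=n)
x_a = W @ np.array([0.1, 0.6, 0.3]) + 0.5 * shared + rng.normal(size=n)

z1 = z_for_last(z_a, np.column_stack([W, x_a]))  # contribution of x_a to z_a
z2 = z_for_last(x_a, np.column_stack([W, z_a]))  # contribution of z_a to x_a
print(z1, z2)  # the two ratios agree up to rounding error
```

This is why a single univariate regression suffices to decide on a dashed line between two components of a joint response.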

The relatively strict criterion for excluding variables assures that all edges in the derived regression graph correspond to dependences that are considered to be substantive in the given context. Had a 0.975 quantile been chosen as threshold instead, one arrow from \(A\) to \(Y\) and another from \(U\) to \(X_a\) would have been added to the regression graph. Although this would correspond to a better goodness-of-fit, such weak dependences are less likely to be confirmed as important in follow-up studies.

The subgraph induced by \(Z_a, Z_b, X_a, X_b\) of the regression graph in Fig. 8 corresponds to two seemingly unrelated regressions. Even though separate least-squares estimates can in principle be severely distorted, for the present data the structure is so well-fitting in the unconstrained multivariate regression of \(Z_a\) and \(X_a\) on \(Z_b, X_b, U, V, A, B\), that is, in a simple covering model, that none of these potential problems is relevant.

With \(C=\{U,V,A,B\}\), this is evident from the observed covariance matrix of \(Z_a, X_a\) given \(Z_b, X_b, C\), denoted here by \(\tilde{\varSigma}_{aa|bC}\), and the observed regression coefficient matrix \(\tilde{\varPi}_{a|b.C}\) being almost identical to the corresponding maximum likelihood estimators \(\hat{\varSigma}_{aa|bC}\) and \(\hat{\varPi}_{a|b.C}\).

The former can be obtained by sweeping or partially inverting the observed covariance matrix of the eight variables with respect to \(Z_b, X_b, C\), and the latter by using an adaptation of the EM-algorithm, due to Kiiveri (1987), on the observed covariance matrix of the four symptoms, corrected for linear regression on \(C\). In this way, one gets
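The partial inversion step mentioned above can be sketched in Python for a generic covariance matrix. The sketch assumes the standard partitioned-matrix formulas, \(\varPi_{a|b}=\varSigma_{ab}\varSigma_{bb}^{-1}\) and \(\varSigma_{aa|b}=\varSigma_{aa}-\varSigma_{ab}\varSigma_{bb}^{-1}\varSigma_{ba}\), which give the same two blocks as sweeping on the conditioning set. The function name `regress_out` and the small 3×3 matrix are illustrative assumptions; the EM adaptation of Kiiveri (1987) is not shown.

```python
import numpy as np

def regress_out(S, a, b):
    """Hypothetical helper: from a covariance matrix S, return
    Pi     = S_ab S_bb^{-1}           (regression coefficients of a on b)
    S_cond = S_aa - S_ab S_bb^{-1} S_ba  (covariance of a given b)."""
    S = np.asarray(S, dtype=float)
    S_ab = S[np.ix_(a, b)]
    S_bb = S[np.ix_(b, b)]
    Pi = np.linalg.solve(S_bb, S_ab.T).T   # avoids forming S_bb^{-1} explicitly
    S_cond = S[np.ix_(a, a)] - Pi @ S_ab.T
    return Pi, S_cond

# Illustrative covariance matrix (not the chronic pain data):
# variables 0 and 1 are the responses, variable 2 is conditioned on.
S = np.array([[2.0, 1.0, 0.5],
              [1.0, 2.0, 0.3],
              [0.5, 0.3, 1.0]])
Pi, S_cond = regress_out(S, a=[0, 1], b=[2])
print(Pi)
print(S_cond)
```

A useful check is that the inverse of the conditional covariance matrix equals the corresponding block of the inverse of the full covariance matrix.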

The assumed definition of the joint distribution in terms of univariate and multivariate regressions assures that the overall fit of the model can be judged locally in two steps. First, one compares each unconstrained, full regression of a single response with regressions constrained by some independences, that is, by selecting a subset of directly explanatory variables from the list of the potentially explanatory variables. Next, one decides for each component pair of a joint response whether this pair is conditionally independent given their directly explanatory variables considered jointly. This can again be achieved by single univariate regressions, as illustrated above for the joint responses \(Z_a\) and \(X_a\).

About this article

Cite this article

Wermuth, N., Sadeghi, K. Sequences of regressions and their independences. TEST 21, 215–252 (2012). https://doi.org/10.1007/s11749-012-0290-6

  • DOI: https://doi.org/10.1007/s11749-012-0290-6
