Genomic analysis of trios: relatedness disequilibrium regression (RDR)
The RDR method [25] allows the estimation of parent and child genetic effects on traits. This is achieved by extending a standard genomic method for estimating heritability, single-component GREML (genomic-relatedness-based restricted maximum likelihood) [33], to include individuals’ parents. Standard GREML estimates the variance explained by common SNPs by comparing a matrix of pairwise genomic similarity for unrelated individuals across genotyped SNPs to a matrix of their pairwise phenotypic similarity, using a random-effects mixed linear model. Instead of using the random variation in genetic similarity among unrelated individuals, RDR estimates heritability by capitalising on the random variation in genetic similarity between pairs of individuals conditional on their parents’ genetic similarity, which arises through random segregation of alleles when gametes are formed, and is independent of environmental factors.
There are two versions of RDR: one uses identity-by-descent (IBD) relatedness, which distinguishes parts of the genome that are inherited from common ancestors, and the other uses common SNP-based relatedness. We used the SNP version since it has similar properties to the IBD version but has greater statistical power [25]. Rather than estimating a single genetic variance component, RDR estimates three. The first estimates the direct effect of children’s own genetic variation on their trait. This is independent of the effect of being reared by biological parents. Importantly, a direct genetic effect is only direct in the sense that it does not stem from another individual’s genotype. Mechanisms by which individuals evoke and select environments based on their genotype are essential in how genes lead to phenotypes [34], and these are included in estimates of direct genetic influence.
The second variance component estimates the effect of parent genetics on the child trait, controlling for child genetic effects: ‘genetic nurture’. Any parent genetic effect over and above child-driven direct effects must be an indirect genetic effect, where parents’ genetics affect child traits by influencing parent behaviours and the rearing environment they provide. Notably, it is assumed that genetic nurture effects are from parents (not siblings) and that mating in the population is random. To the extent that these assumptions do not hold, the genetic nurture variance will be biased. Non-random mating would inflate the genetic nurture variance because it induces correlations between causal alleles across the genome, most importantly between transmitted and non-transmitted parts of the parental genomes [26].
The third component captures variance in the offspring phenotype attributable to covariance between the direct and nurturing genetic effects. This somewhat abstract variance component is easier to understand when considering the conditions under which the estimate would be zero. Specifically, the direct-nurturing genetic covariance would not explain any phenotypic variation if only one generation contributes genetic effects to the child trait, or if different SNPs contribute to child and parent genetic influences, such that each locus has either a direct or an indirect effect but not both. Covariance between direct and nurturing genetic effects can be thought of as a ‘passive gene-environment correlation’. This refers to a magnification of the environmental effect of genetically influenced parent behaviour, which happens because children passively inherit and are directly influenced by that same genetic material (see Additional file 1: Figure S1 for detail on this concept).
Finally, the residual component captures environmental effects on the trait of interest that are not correlated with measured parent genetic variation, the effects of variants not tagged by genotyped SNPs (e.g. rarer variants), and measurement error.
In practice, the variance components are estimated by regressing phenotypic resemblance on three genomic relatedness matrices simultaneously. The first is similar to the matrix used in standard GREML: the genome-wide genetic relatedness of the children in the sample. The second and third represent the genetic relatedness of the parents and the genetic covariance between children and parents.
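As a sketch, the model described above can be written in generic variance-component form. The symbols here are our own notation rather than the authors', and the normalisations of the parental and child-parent matrices (division by 2l) are assumptions chosen so that each matrix has diagonal elements close to one under random mating. Here g_{ik} is the standardised genotype of child i at SNP k, g^{par}_{ik} is the summed parental genotype scaled to variance two, and l is the number of SNPs.

```latex
\begin{aligned}
\operatorname{cov}(Y_i, Y_j) &= \sigma^2_{g}\, R_{ij}
  + \sigma^2_{n}\, R^{\mathrm{par}}_{ij}
  + \sigma_{g,n}\, R^{\mathrm{o,par}}_{ij}
  + \sigma^2_{\varepsilon}\, \delta_{ij}, \\[4pt]
R_{ij} &= \frac{1}{l} \sum_{k=1}^{l} g_{ik}\, g_{jk}, \qquad
R^{\mathrm{par}}_{ij} = \frac{1}{2l} \sum_{k=1}^{l} g^{\mathrm{par}}_{ik}\, g^{\mathrm{par}}_{jk}, \\[4pt]
R^{\mathrm{o,par}}_{ij} &= \frac{1}{2l} \sum_{k=1}^{l}
  \left( g_{ik}\, g^{\mathrm{par}}_{jk} + g_{jk}\, g^{\mathrm{par}}_{ik} \right).
\end{aligned}
```

In this notation, \sigma^2_{g} is the direct genetic variance, \sigma^2_{n} the genetic nurture variance, \sigma_{g,n} the variance attributable to direct-nurturing covariance, \sigma^2_{\varepsilon} the residual variance, and \delta_{ij} equals one when i = j and zero otherwise.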
Notably, the genotypes of mothers and fathers are combined to allow estimation of the effect of both parents. We calculated parental genotypes by summing the unnormalised maternal and paternal genotype matrices. We then standardised parental genotypes to have a mean of zero and a variance of two. In an outbred random-mating population, the variance of the parental genotypes is twice that of the offspring genotype because it is the sum of maternal and paternal genotypes [25]. The summing of maternal and paternal genotypes contrasts with a similar model, M-GCTA [35], which does not involve paternal data but estimates the effects of the mother and child genomes and of their covariance.
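The construction of the parental genotypes and the three relatedness matrices can be illustrated with a minimal NumPy sketch. This is not the authors' pipeline: the function names, the toy simulated trios, and the division of the parental and child-parent matrices by twice the number of SNPs are all assumptions made for illustration.

```python
import numpy as np


def standardise(genotypes, target_var):
    """Centre each SNP (column) and rescale it to the requested variance."""
    g = np.asarray(genotypes, dtype=float)
    centred = g - g.mean(axis=0)
    sd = centred.std(axis=0)
    sd[sd == 0] = 1.0  # monomorphic SNPs stay at zero instead of dividing by zero
    return centred / sd * np.sqrt(target_var)


def rdr_relatedness(child, mother, father):
    """Return child, parental and child-parent relatedness matrices (hypothetical helper)."""
    child = np.asarray(child)
    n_snps = child.shape[1]
    g = standardise(child, target_var=1.0)
    # Sum the unnormalised maternal and paternal genotypes, then scale to variance two.
    g_par = standardise(np.asarray(mother) + np.asarray(father), target_var=2.0)
    r_child = g @ g.T / n_snps
    r_par = g_par @ g_par.T / (2 * n_snps)          # assumed normalisation
    cross = g @ g_par.T
    r_child_par = (cross + cross.T) / (2 * n_snps)  # assumed normalisation
    return r_child, r_par, r_child_par


# Toy data: simulated trios under random mating (illustrative only).
rng = np.random.default_rng(0)
n_trios, n_snps = 300, 500
freq = rng.uniform(0.2, 0.8, size=n_snps)
mother = rng.binomial(2, freq, size=(n_trios, n_snps))
father = rng.binomial(2, freq, size=(n_trios, n_snps))
# Each parent transmits one allele; a parent with genotype x passes the
# counted allele with probability x / 2.
child = rng.binomial(1, mother / 2) + rng.binomial(1, father / 2)

r_child, r_par, r_child_par = rdr_relatedness(child, mother, father)
```

With these scalings, the diagonals of all three matrices average close to one in the simulated random-mating data, which is the motivation for dividing the parental terms by two.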
All RDR analyses included 10 ancestry principal components and genotyping batch, both derived from the child generation, as covariates. Analyses were performed in the GCTA software. We used the --reml-no-constrain flag to allow components to take negative as well as positive values, given the theoretical and empirical evidence for negative covariance between direct and indirect genetic effects on complex traits. This can happen if a proportion of parental genetic variants associated with lower child trait scores are associated with higher child trait scores when present in the child genome. For example, studies have identified loci that exert opposing maternal and fetal effects on human birth weight [36, 37].
To test whether any genetic nurture effect was partially explained by parent anxiety and depression symptoms, we re-ran the RDR models adding a measure of stable maternal anxiety and depression symptoms as a covariate. As mentioned above, this longitudinally derived measure is preferable to time-specific measures as it captures a more reliable core trait that children are consistently exposed to. In addition, the measure maximises sample size and minimises bias in maternal reporting due to contemporaneous collection of maternal self- and child-report. We also tested the individual time-specific maternal measures as covariates.
To test whether any of the variance components were biased because some children were genotyped in a different batch to their parents, we re-ran the RDR models using only individuals who were genotyped as a complete trio (90% of the analysis sample).
To test the sensitivity of the genetic nurture estimate to a more stringent exclusion of relatives, we restricted all GRMs to relatedness < 0.1 and re-ran the RDR models.
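One way to apply such a relatedness cut-off is sketched below. The greedy rule (keep the first individual of each related pair, drop the second) and the function name are assumptions for illustration; the text does not state which member of a related pair was removed or which matrix defined the cut-off, and the software's own procedure may differ.

```python
import numpy as np


def prune_related(relatedness, cutoff=0.1):
    """Greedily drop individuals until all retained pairs have relatedness < cutoff."""
    r = np.asarray(relatedness, dtype=float)
    n = r.shape[0]
    keep = np.ones(n, dtype=bool)
    for i in range(n):
        if not keep[i]:
            continue
        for j in range(i + 1, n):
            # Drop the later individual of any retained pair at or above the cut-off.
            if keep[j] and r[i, j] >= cutoff:
                keep[j] = False
    return np.flatnonzero(keep)


# Toy relatedness matrix: individuals 0 and 1 are close relatives.
toy = np.array([
    [1.00, 0.25, 0.02],
    [0.25, 1.00, 0.01],
    [0.02, 0.01, 1.00],
])
kept = prune_related(toy, cutoff=0.1)
# The same index set would then be applied to every relatedness matrix.
toy_pruned = toy[np.ix_(kept, kept)]
```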
Finally, to estimate standard SNP heritability, we ran single-component GREML models using unrelated child genotypes. This is equivalent to running RDR with the genetic nurture and direct-nurturing genetic covariance components set to zero.
Classical pedigree modelling
To compare RDR results to a traditional quantitative genetic (non-genomic) design, we implemented a univariate pedigree model [38]. As in the classic twin design, this model allows estimation of genetic, shared environmental, and non-shared environmental (residual) influences on anxiety and depression symptoms at age 8. The model used phenotypic correlations among twins, siblings, half-siblings and cousins in the child generation to derive estimates based on the following specifications: genetic correlations (assumed, not directly measured) are 1.00, 0.50, 0.25 and 0.125 for identical twins, non-identical twins/siblings, maternal half-siblings and cousins, respectively, and shared environmental correlations are 1.00 for all siblings and 0.00 for cousins. Sample sizes for pairs of relatives with available 8-year anxiety and depression data were 233 identical twins, 11,375 non-identical twins and siblings, 175 maternal half-siblings and 15,227 cousins, giving an overall sample of 27,010 pairs.
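The specification above implies an expected phenotypic correlation for each relative type, which the following sketch makes explicit. The coefficients and pair counts are taken from the text; the variance proportions passed in the example (a2 = 0.4, c2 = 0.1) are purely hypothetical values chosen for illustration, not results, and the function name is our own.

```python
# Assumed genetic and shared environmental correlations, as specified in the text.
RELATIVE_TYPES = {
    "identical twins": {"r_genetic": 1.0, "r_shared_env": 1.0, "n_pairs": 233},
    "non-identical twins/siblings": {"r_genetic": 0.5, "r_shared_env": 1.0, "n_pairs": 11375},
    "maternal half-siblings": {"r_genetic": 0.25, "r_shared_env": 1.0, "n_pairs": 175},
    "cousins": {"r_genetic": 0.125, "r_shared_env": 0.0, "n_pairs": 15227},
}


def expected_correlation(a2, c2, r_genetic, r_shared_env):
    """Expected phenotypic correlation given genetic (a2) and shared environmental (c2) variance proportions."""
    return a2 * r_genetic + c2 * r_shared_env


# Hypothetical variance proportions, for illustration only.
expected = {
    name: expected_correlation(0.4, 0.1, spec["r_genetic"], spec["r_shared_env"])
    for name, spec in RELATIVE_TYPES.items()
}
total_pairs = sum(spec["n_pairs"] for spec in RELATIVE_TYPES.values())
```

Fitting the model amounts to choosing the variance proportions so that these expected correlations match the observed correlations across the four relative types as closely as possible.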