Preprocessing of genotype data
We analyzed SNPs from chromosome 3 only. At each of the SNPs, we performed Pearson's chi-squared test for Hardy-Weinberg equilibrium using 142 unrelated individuals. We excluded SNPs that yielded a p-value smaller than $10^{-4}$ from our analysis. In the gene-dropping test, we excluded SNPs with estimated minor allele frequency (MAF) smaller than 0.001.
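The Hardy-Weinberg filter described above can be sketched as follows. This is a minimal illustration, not the code used in the analysis: the function names and the genotype-count interface are assumptions, and the p-value uses the identity that the upper tail of a chi-squared distribution with 1 degree of freedom at x equals erfc(sqrt(x/2)).

```python
import math

def hwe_chisq_test(n_aa, n_ab, n_bb):
    """Pearson chi-squared test (1 df) for Hardy-Weinberg equilibrium.

    Arguments are the observed counts of the three genotypes at one SNP
    (hypothetical interface). Returns (statistic, p_value).
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # estimated frequency of allele A
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        return 0.0, 1.0  # monomorphic SNP: no departure can be tested
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail probability of a chi-squared variable with 1 df.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

def passes_hwe_filter(n_aa, n_ab, n_bb, threshold=1e-4):
    """Keep the SNP unless its HWE p-value is smaller than the threshold."""
    _, p_value = hwe_chisq_test(n_aa, n_ab, n_bb)
    return p_value >= threshold
```

A SNP with no heterozygotes among many carriers of both alleles would be removed by this filter, while a SNP whose counts match the Hardy-Weinberg proportions would be kept.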
Preprocessing of phenotype data
We focused on the analysis of the quantitative trait systolic blood pressure (SBP) in the simulated data set 1. The true simulation model was known to us [4]. When testing association between genotype doses and trait values (see later discussion), we included the factors AGE, SEX, and AGE by SEX interaction as covariates (the covariate terms in equation (1)). Including BPMED as a covariate would overcompensate because BPMED is a consequence of SBP level. Instead, we estimated the effect of BPMED from a regression model fitted only to individuals with hypertension. Because BPMED was randomly assigned to individuals with hypertension, the BPMED effect estimated this way will not be biased by its correlation with SBP. We then adjusted the trait values by subtracting the estimated BPMED effect.
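The adjustment just described can be sketched as follows. The text does not state which other terms entered the hypertension-only regression, so the covariate matrix passed in here is an assumption, as are the function and argument names.

```python
import numpy as np

def adjust_for_bpmed(sbp, bpmed, hypertensive, covariates):
    """Subtract an estimated BPMED effect from SBP.

    The BPMED coefficient is estimated by least squares using only the
    individuals flagged as hypertensive (all names are hypothetical).
    Returns (adjusted_sbp, estimated_bpmed_effect).
    """
    sbp = np.asarray(sbp, dtype=float)
    bpmed = np.asarray(bpmed, dtype=float)
    mask = np.asarray(hypertensive, dtype=bool)
    covariates = np.asarray(covariates, dtype=float)
    # Design matrix: intercept, BPMED indicator, then the assumed covariates.
    design = np.column_stack(
        [np.ones(mask.sum()), bpmed[mask], covariates[mask]]
    )
    coef, *_ = np.linalg.lstsq(design, sbp[mask], rcond=None)
    effect = coef[1]
    # Remove the estimated medication effect from everyone on medication.
    return sbp - effect * bpmed, effect
```

Individuals not on medication have BPMED equal to 0, so their trait values are unchanged by the subtraction.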
Score tests of genotype-phenotype association using unrelated individuals
At locus $j$, we consider a quantitative trait model
$$Y = \mu + \sum_k \beta_k X_k + \gamma_j G_j + \varepsilon \tag{1}$$
and test the null hypothesis $H_0: \gamma_j = 0$. In equation (1), $Y$ is the vector of trait values (SBP adjusted for the BPMED effect), $\mu$ is a constant vector of baseline mean trait values, coefficients $\beta_k$ represent the effects of the covariates $X_k$ (e.g., AGE, SEX, and AGE by SEX interaction) on trait values, $G_j$ is the vector of genotype doses (the number of minor alleles possessed by each individual) at locus $j$, and the coefficient $\gamma_j$ represents the effect size of a single allele. The fitted value of $\gamma_j$ will reflect the collective effect of all causal SNPs that are in linkage disequilibrium (LD) with the test SNP $j$ [5].
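Let $\hat{Y}$ and $\hat{G}_j$ be the vectors of fitted values after regressing $Y$ and $G_j$ on the measured covariates $X_k$. The score statistic [6, 7] for testing genotype-trait association at a single SNP $j$ is $S_j = R^T G_j$, where $R = Y - \hat{Y}$ is the vector of residuals. Under the null hypothesis of no association, the variance of $S_j$ is estimated by
$$\hat{V}_j = \hat{\sigma}^2 (G_j - \hat{G}_j)^T (G_j - \hat{G}_j) \tag{2}$$
where $\bar{R} = \mathbf{1}^T R / n$ and $\hat{\sigma}^2 = (R - \bar{R}\mathbf{1})^T (R - \bar{R}\mathbf{1}) / (n - 1)$ is the sample variance of the residual trait values ($\mathbf{1}$ is a vector of ones and $n$ is the number of individuals) [6]. To test association, $S_j^2 / \hat{V}_j$ is compared with a $\chi^2_1$ distribution.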
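As an illustration of the score test for unrelated individuals, the sketch below regresses the trait and the genotype doses on the covariates, forms the score statistic as the inner product of trait residuals and genotype doses, and estimates its null variance as the sample variance of the residuals times the residual sum of squares of the genotype doses. The function name and interface are assumptions, and this is a sketch of a standard score test rather than the authors' implementation.

```python
import math
import numpy as np

def score_test(y, g, covariates):
    """Score test of genotype-trait association at one SNP.

    y: trait values; g: genotype doses (0, 1, 2); covariates: array with
    one row per individual (an intercept column is added here).
    Returns (score, variance, chi2_statistic, p_value).
    """
    y = np.asarray(y, dtype=float)
    g = np.asarray(g, dtype=float)
    n = y.shape[0]
    x = np.column_stack([np.ones(n), np.asarray(covariates, dtype=float)])
    # Fitted values of the trait and of the doses given the covariates.
    beta_y, *_ = np.linalg.lstsq(x, y, rcond=None)
    beta_g, *_ = np.linalg.lstsq(x, g, rcond=None)
    r = y - x @ beta_y          # trait residuals
    g_res = g - x @ beta_g      # dose residuals
    score = float(r @ g)
    sigma2 = float(np.var(r, ddof=1))   # sample variance of the residuals
    variance = sigma2 * float(g_res @ g_res)
    stat = score * score / variance
    # Upper tail of a chi-squared distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return score, variance, stat, p_value
```

Because the trait residuals are orthogonal to the covariates, using the raw doses or the dose residuals in the score gives the same value.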
Family-based association test by gene dropping
When related individuals are used to compute the score test statistic $S_j$, components of $G_j$ can be dependent, and the variance estimator (2) is no longer valid. One can account for correlations between components in $G_j$ by simulating the null distribution of $S_j$ using gene dropping. We now derive the analytical mean and variance of $S_j$ under the gene-dropping setting. In the score test using unrelated individuals, we treat $Y$ as random, and $G_j$ can be viewed as either random or fixed. In a gene-dropping simulation, $Y$ is held fixed, and $G_j$ is random.
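Let $i$ index individuals ($i = 1, \ldots, n$) and let $S_j = \sum_i R_i G_{ij}$ and $G_{ij} = A^p_{ij} + A^m_{ij}$. The expected value of $S_j$ is $E(S_j) = \sum_i R_i E(G_{ij})$ and $E(G_{ij}) = E(A^p_{ij}) + E(A^m_{ij})$, where $A^p_{ij}$ ($A^m_{ij}$) is 1 if the paternal (maternal) allele of individual $i$ at SNP $j$ is the minor allele and 0 otherwise. So $E(G_{ij})$ is twice the MAF $p_j$ at SNP $j$ and is the same for all individuals and thus $E(S_j) = 2 p_j \sum_i R_i = 0$ because the $R_i$'s are residuals from a linear regression model with intercept. The variance of $S_j$ is
$$\mathrm{Var}(S_j) = E(S_j^2) = R^T E(G_j G_j^T) R.$$
The $(i,k)$th element in $E(G_j G_j^T)$ is
$$E(G_{ij} G_{kj}) = E(A^p_{ij} A^p_{kj}) + E(A^p_{ij} A^m_{kj}) + E(A^m_{ij} A^p_{kj}) + E(A^m_{ij} A^m_{kj}).$$
The indicators $A^p_{ij}$, $A^m_{ij}$, $A^p_{kj}$, and $A^m_{kj}$ are all Bernoulli random variables with probability $p_j$, and any two of them are identical if the corresponding alleles are identical by descent (IBD) and are independent otherwise [8]. Let $D_{ik}$ be the number of IBD pairs among the four pairs of alleles $(A^p_{ij}, A^p_{kj})$, $(A^p_{ij}, A^m_{kj})$, $(A^m_{ij}, A^p_{kj})$, and $(A^m_{ij}, A^m_{kj})$. The value of $D_{ik}$ at locus $j$ is determined by the inheritance vector $I_j$, which summarizes whether the paternal or the maternal allele is passed from the parent to the child in each meiosis [9]. Given the inheritance vector $I_j$,
$$E(G_{ij} G_{kj} \mid I_j) = D_{ik}(I_j)\, p_j + [4 - D_{ik}(I_j)]\, p_j^2$$
(e.g., $E(A^p_{ij} A^p_{kj} \mid I_j) = p_j$ if $A^p_{ij}$ and $A^p_{kj}$ correspond to IBD alleles and $E(A^p_{ij} A^p_{kj} \mid I_j) = p_j^2$ if $A^p_{ij}$ and $A^p_{kj}$ correspond to non-IBD alleles). In a gene-dropping simulation, the inheritance vector $I_j$ is randomly sampled among all possible inheritance vectors. The expected number of IBD alleles shared between $i$ and $k$, $E[D_{ik}(I_j)]$, over all possible inheritance vectors is four times the kinship coefficient $\phi_{ik}$. The kinship coefficients are determined by pedigree structures. The expected value of $G_{ij} G_{kj}$ in a gene-dropping simulation is thus
$$E(G_{ij} G_{kj}) = 4 \phi_{ik}\, p_j + 4 (1 - \phi_{ik})\, p_j^2 = 4 \phi_{ik}\, p_j (1 - p_j) + 4 p_j^2.$$
Letting $D(I_j)$ be the matrix of IBD counts and $\Phi$ be the matrix of kinship coefficients, we can rewrite the above as:
$$E(G_j G_j^T \mid I_j) = p_j (1 - p_j)\, D(I_j) + 4 p_j^2 J, \qquad E(G_j G_j^T) = 4 p_j (1 - p_j)\, \Phi + 4 p_j^2 J,$$
where $J$ is a matrix of all ones. Because $R^T J R = (\sum_i R_i)^2 = 0$ for residuals from a linear regression model with an intercept, the variance of $S_j$ under gene dropping is $\mathrm{Var}(S_j) = 4 p_j (1 - p_j) R^T \Phi R$ if unconditional on the inheritance vector $I_j$, and is $\mathrm{Var}(S_j \mid I_j) = p_j (1 - p_j) R^T D(I_j) R$ if conditional on the inheritance vector $I_j$ (holding $D(I_j)$ fixed). We can approximate the gene-dropping null distribution of $S_j$ by a normal distribution with mean $0$ and variance $\mathrm{Var}(S_j)$, and compute the gene-dropping p-value by comparing $S_j^2 / \mathrm{Var}(S_j)$ with a $\chi^2_1$ distribution. To test association in the presence of linkage, one needs to condition on the inheritance vector at $j$ [3] and use $\mathrm{Var}(S_j \mid I_j)$. In practice, $I_j$ is not observable, but we estimate $D(I_j)$ by drawing Markov chain Monte Carlo (MCMC) samples of $I_j$ based on observed genotypes in the pedigrees using MORGAN (http://www.stat.washington.edu/thompson/Genepi/MORGAN/Morgan.shtml) [10].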
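The unconditional variance formula can be checked on a small pedigree. The sketch below enumerates every founder-allele configuration and every inheritance vector for a hypothetical nuclear family (father, mother, two children), computes the exact covariance matrix of the genotype doses under gene dropping, and compares it with four times p(1 - p) times the kinship matrix. The pedigree, the function names, and the hard-coded kinship matrix (1/2 on the diagonal, 1/4 for parent-child and sibling pairs, 0 for the unrelated parents) are illustrative assumptions and do not use MORGAN.

```python
import itertools
import numpy as np

# Kinship matrix for father, mother, child 1, child 2 (no inbreeding).
PHI = np.array([
    [0.50, 0.00, 0.25, 0.25],
    [0.00, 0.50, 0.25, 0.25],
    [0.25, 0.25, 0.50, 0.25],
    [0.25, 0.25, 0.25, 0.50],
])

def gene_drop_moments(p):
    """Exact mean and covariance of genotype doses under gene dropping."""
    mean = np.zeros(4)
    second = np.zeros((4, 4))
    # Four founder alleles, each the minor allele with probability p.
    for alleles in itertools.product((0, 1), repeat=4):
        k = sum(alleles)
        w_alleles = p ** k * (1.0 - p) ** (4 - k)
        fa, mo = alleles[:2], alleles[2:]
        # Four meioses: each picks the parent's first or second allele.
        for iv in itertools.product((0, 1), repeat=4):
            w = w_alleles / 16.0
            g = np.array([
                fa[0] + fa[1],
                mo[0] + mo[1],
                fa[iv[0]] + mo[iv[1]],
                fa[iv[2]] + mo[iv[3]],
            ], dtype=float)
            mean += w * g
            second += w * np.outer(g, g)
    return mean, second - np.outer(mean, mean)

def gene_drop_variance(r, p, phi):
    """Unconditional gene-dropping variance of the score, 4p(1-p) r' Phi r."""
    r = np.asarray(r, dtype=float)
    return 4.0 * p * (1.0 - p) * float(r @ phi @ r)
```

For any residual vector that sums to zero, the quadratic form in the enumerated covariance matrix agrees with `gene_drop_variance`, which is the quantity used in the normal approximation to the gene-dropping null distribution.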