Notation
Let $N$ denote the observed number of matched pairs of binomial events A and B, where the possible outcomes are referred to as success (1) or failure (2), and let $(Y_{i1}, Y_{i2})$ denote the outcome of the $i$th pair. The observed data may be summarized in a 2×2 contingency table, as in Table 3. Each $n_{kl}$, for $k, l = 1, 2$, is the number of event pairs $(Y_{i1}, Y_{i2})$ with outcomes $Y_{i1} = k$ and $Y_{i2} = l$. Let $p_{kl}$ denote the joint probability that $Y_{i1} = k$ and $Y_{i2} = l$, which we assume to be independent of $i$. In the terminology of Agresti [6, pp. 418–420], this is a marginal or population-averaged model. We denote the probabilities of success for events A and B (equivalently, the marginal probabilities that $Y_{i1} = 1$ and $Y_{i2} = 1$) by $p_{1+}$ and $p_{+1}$, respectively. The null hypothesis of interest is $H_0: p_{1+} = p_{+1}$. The alternative hypothesis is $H_1: p_{1+} \neq p_{+1}$.
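Table 3
The observed counts (and joint outcome probabilities) of a paired 2 × 2 table

| Event A | Event B: Success | Event B: Failure | Sum |
|---|---|---|---|
| Success | $n_{11}$ ($p_{11}$) | $n_{12}$ ($p_{12}$) | $n_{1+}$ ($p_{1+}$) |
| Failure | $n_{21}$ ($p_{21}$) | $n_{22}$ ($p_{22}$) | $n_{2+}$ ($p_{2+}$) |
| Sum | $n_{+1}$ ($p_{+1}$) | $n_{+2}$ ($p_{+2}$) | $N$ (1) |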
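As a small illustration of how the counts in Table 3 arise, the sketch below cross-tabulates a list of paired outcomes coded 1 (success) and 2 (failure). The function name `paired_table` and the six example pairs are hypothetical and serve only to show the bookkeeping.

```python
from collections import Counter

def paired_table(pairs):
    """Cross-tabulate matched pairs (y1, y2), coded 1 = success, 2 = failure."""
    counts = Counter(pairs)
    # n[(k, l)] is the number of pairs with Y_i1 = k and Y_i2 = l
    n = {(k, l): counts.get((k, l), 0) for k in (1, 2) for l in (1, 2)}
    row_sums = {k: n[(k, 1)] + n[(k, 2)] for k in (1, 2)}  # n_{k+}
    col_sums = {l: n[(1, l)] + n[(2, l)] for l in (1, 2)}  # n_{+l}
    return n, row_sums, col_sums

# Hypothetical data: six matched pairs
pairs = [(1, 1), (1, 2), (1, 2), (2, 1), (2, 2), (1, 1)]
n, row_sums, col_sums = paired_table(pairs)
```

Only the off-diagonal counts `n[(1, 2)]` and `n[(2, 1)]` (the discordant pairs) enter the conditional tests.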
It might, however, be more realistic to assume that $p_{kl}$ also depends on the subject $i$. In the terminology of Agresti [6, pp. 418–420], this is a subject-specific model. It is also a conditional model, since we are interested in the association within the pair, conditioned on the subject. Data from $N$ matched pairs are then presented in $N$ 2×2 tables, one for each pair. Collapsing over the pairs results in Table 3. Conditional independence between $Y_1$ and $Y_2$ is tested by the Mantel-Haenszel statistic [6, p. 417], which is algebraically equal to the squared McNemar test statistic. In the following, we will not specify whether we test for marginal homogeneity or conditional independence.
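The algebraic identity mentioned above can be checked numerically. The sketch below treats each pair as its own 2×2 stratum (rows: events A and B; columns: success and failure) and accumulates the Mantel-Haenszel statistic without continuity correction, which is an assumption made here. The function name and the example data are hypothetical.

```python
def mantel_haenszel_from_pairs(pairs):
    """Mantel-Haenszel statistic over one 2x2 stratum per matched pair.

    Outcomes are coded 1 = success, 2 = failure. Raises ZeroDivisionError
    if there are no discordant pairs (the variance is then zero).
    """
    numerator = 0.0
    variance = 0.0
    for y1, y2 in pairs:
        a = 1 if y1 == 1 else 0           # success count in the row for event A
        m1 = a + (1 if y2 == 1 else 0)    # successes in the stratum
        m2 = 2 - m1                       # failures in the stratum
        expected = 1 * m1 / 2             # row total 1, stratum size 2
        numerator += a - expected
        variance += (1 * 1 * m1 * m2) / (2 ** 2 * (2 - 1))
    return numerator ** 2 / variance

# Hypothetical data with n12 = 2 and n21 = 1
pairs = [(1, 1), (1, 2), (1, 2), (2, 1), (2, 2), (1, 1)]
mh = mantel_haenszel_from_pairs(pairs)
mcnemar_chi2 = (2 - 1) ** 2 / (2 + 1)
```

Concordant pairs contribute zero to both sums, so only the discordant pairs determine the statistic.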
The asymptotic McNemar test
The asymptotic McNemar test conditions on the number of discordant pairs ($n_{12} + n_{21}$). Conditionally, $n_{12}$ is binomially distributed with parameters $n = n_{12} + n_{21}$ and $p = 1/2$ under the null hypothesis. The asymptotic McNemar test statistic [7], which is the score statistic for testing marginal homogeneity, is

$$z = \frac{n_{12} - n_{21}}{\sqrt{n_{12} + n_{21}}} \tag{1}$$

and its asymptotic distribution under the null hypothesis is the standard normal distribution. The equivalent McNemar test statistic $\chi^2 = z^2 = (n_{12} - n_{21})^2/(n_{12} + n_{21})$ is approximately chi-squared distributed with one degree of freedom under the null hypothesis. The asymptotic McNemar test is undefined when $n_{12} = n_{21} = 0$.
The McNemar exact conditional test
The test statistic in (1) measures the strength of the evidence against the null hypothesis. If we, as in the derivation of the asymptotic test, condition on the number of discordant pairs ($n_{12} + n_{21}$), we can use the simple test statistic $n_{12}$ to derive an exact conditional test. The conditional probability under $H_0$ of observing any outcome $x_{12}$ given $n = n_{12} + n_{21}$ discordant pairs is the binomial point probability

$$f(x_{12} \mid n) = \binom{n}{x_{12}} \left(\tfrac{1}{2}\right)^{n} \tag{3}$$

The McNemar exact conditional one-sided p-value is obtained as a sum of probabilities:

$$\text{one-sided } p\text{-value} = \sum_{x_{12}=0}^{\min(n_{12},\, n_{21})} f(x_{12} \mid n) \tag{4}$$

and the two-sided p-value equals twice the one-sided p-value. If $n_{12} = (n_{12} + n_{21})/2$, the p-value equals 1.0. The exact conditional test is guaranteed to have type I error rates not exceeding the nominal level.
The McNemar mid-p test
A mid-p-value is obtained by first subtracting half the point probability of the observed $n_{12}$ from the exact one-sided p-value, then doubling the result to obtain the two-sided mid-p-value [4]. Hence, the McNemar mid-p-value equals

$$\text{mid-}p\text{-value} = 2\left[\text{one-sided } p\text{-value} - \tfrac{1}{2} f(n_{12} \mid n)\right] \tag{5}$$

where $f$ is the probability function in (3). If $n_{12} = n_{21}$, substitute (5) with

$$\text{mid-}p\text{-value} = 1 - \tfrac{1}{2} f(n_{12} \mid n) \tag{6}$$

The type I error rates of the mid-p test, unlike those of exact tests, are not bounded by the nominal level; however, in a wide range of designs and models, both mid-p tests and confidence intervals violate the nominal level rarely and with low degrees of infringement [11‐13]. Because mid-p tests are based on exact distributions, they are sometimes called quasi-exact [14]. Additional file 1 provides details on how to calculate the McNemar mid-p test with several standard software packages.
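As a hedged illustration of the mid-p calculation described above, the sketch below subtracts half the binomial($n$, 1/2) point probability of the observed $n_{12}$ from the one-sided exact p-value and doubles the result. For the tie case $n_{12} = n_{21}$ it assumes the value $1 - 0.5 f(n_{12} \mid n)$; that formula and the function name are assumptions of this sketch, not taken from the software in Additional file 1.

```python
import math

def mcnemar_midp(n12, n21):
    """Two-sided McNemar mid-p-value (sketch under the stated assumptions)."""
    n = n12 + n21

    def pmf(x):
        # Binomial(n, 1/2) point probability
        return math.comb(n, x) * 0.5 ** n

    if n12 == n21:
        # Assumed tie-case formula
        return 1.0 - 0.5 * pmf(n12)
    one_sided = sum(pmf(x) for x in range(min(n12, n21) + 1))
    return 2.0 * (one_sided - 0.5 * pmf(n12))

# Hypothetical counts: 10 and 2 discordant pairs
midp = mcnemar_midp(10, 2)
```

For these counts the mid-p-value is $2(79 - 33)/4096 = 92/4096$, which lies below the exact conditional two-sided p-value of $158/4096$ for the same data.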
An exact unconditional test
The tests in the previous sections did not use the concordant pairs of observations ($n_{11}$ and $n_{22}$) in their calculations. The unconditional approach is to consider all possible tables with $N$ pairs and thereby use information from all observed pairs, including the concordant ones. The exact unconditional test attributed to Suissa and Shuster [15] uses the McNemar test statistic (1). Let $z_{\mathrm{obs}}$ be the observed value, and let

$$z(\mathbf{x}) = \frac{x_{12} - x_{21}}{\sqrt{x_{12} + x_{21}}} \tag{7}$$

where $\mathbf{x} = (x_{11}, x_{12}, x_{21}, x_{22})$ denotes a possible outcome with $N$ pairs, and let $n = x_{12} + x_{21}$. If, for a one-sided test, $z_{\mathrm{obs}} \geq 0$, the potential outcomes that provide at least as much evidence against the null hypothesis as the observed outcome, namely those with $z(\mathbf{x}) \geq z_{\mathrm{obs}}$, are the pairs $(x_{12}, n)$ in the region

$$C = \{(x_{12}, n) : h(n) \leq x_{12} \leq n\} \tag{8}$$

where $h(n) = 0.5 \cdot (z_{\mathrm{obs}} n^{1/2} + n)$. Under the null hypothesis, the triplets $(x_{12}, x_{21}, N - n)$ are trinomially distributed with parameters $N$ and $(p/2, p/2, 1 - p)$, and the attained significance level is

$$P(p) = \sum_{(x_{12}, n) \in C} \frac{N!}{x_{12}!\,(n - x_{12})!\,(N - n)!} \left(\frac{p}{2}\right)^{n} (1 - p)^{N - n} \tag{9}$$

where $p$ is the probability of a discordant pair (a nuisance parameter). We eliminate the nuisance parameter by maximizing $P(p)$ over the range of $p$. After simplifying (9), we get the following expression for the exact unconditional one-sided p-value [15]:
(10)
where
,
$F_n$ is the cumulative binomial distribution function with parameters $(n, 1/2)$, $i_n = \mathrm{int}\{h(n)\}$, and int is the integer function. Suissa and Shuster [15] outline a numerical algorithm to find the supremum in (10). If $z_{\mathrm{obs}} < 0$, the one-sided p-value is found by reversing the inequality in (8). The two-sided p-value equals twice the one-sided p-value.
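To make the unconditional approach concrete, the sketch below enumerates all outcomes $(x_{12}, n)$, flags those at least as extreme as the observed table, and maximizes the resulting probability over a plain grid of values for the nuisance parameter $p$. This grid search is a simple stand-in and is not the numerical algorithm of Suissa and Shuster [15]. The function name, the grid size, and the exclusion of tables with $n = 0$ (where the statistic is undefined) are assumptions of this sketch. The comparison of statistics is done in integer arithmetic to avoid floating-point ties.

```python
import math

def unconditional_one_sided(n12, n21, N, grid=1000):
    """One-sided exact unconditional p-value by grid search over p (sketch)."""
    n_obs = n12 + n21
    if n_obs == 0:
        raise ValueError("z is undefined without discordant pairs")
    d_obs = n12 - n21
    sign = 1 if d_obs >= 0 else -1
    # tail[n] = P(statistic at least as extreme as observed | n discordant pairs)
    tail = [0.0] * (N + 1)
    for n in range(1, N + 1):
        for x12 in range(n + 1):
            d = 2 * x12 - n  # equals x12 - x21
            # z(x) >= z_obs (or <= z_obs when z_obs < 0), compared exactly:
            # same sign as d_obs and d^2 / n >= d_obs^2 / n_obs
            if sign * d >= 0 and d * d * n_obs >= d_obs * d_obs * n:
                tail[n] += math.comb(n, x12) * 0.5 ** n
    best = 0.0
    for j in range(1, grid):
        p = j / grid
        # n is Binomial(N, p); given n, x12 is Binomial(n, 1/2)
        total = sum(math.comb(N, n) * p ** n * (1.0 - p) ** (N - n) * tail[n]
                    for n in range(1, N + 1))
        best = max(best, total)
    return best

# Hypothetical data: N = 10 pairs with n12 = 5 and n21 = 0
p_one = unconditional_one_sided(5, 0, 10)
p_two = min(1.0, 2.0 * p_one)
```

A finer grid, or a proper one-dimensional optimizer, would give a closer approximation to the supremum; the grid maximum can only underestimate it.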
Evaluation of the tests
To compare the performances of the five tests, we carried out an evaluation study of type I error rates and power. We used complete enumeration (rather than stochastic simulations) and a large set of scenarios. Each scenario is characterized by fixed values of $N$ (the number of matched pairs), $p_{1+}$ and $p_{+1}$ (the probabilities of success for each event), and $\theta = p_{11}p_{22}/(p_{12}p_{21})$. $\theta$ can be interpreted as the ratio of the odds for the event $Y_2$ given $Y_1$. We use $\theta$ as a convenient way to re-parameterize $\{p_{11}, p_{12}, p_{21}, p_{22}\}$ into $\{p_{1+}, p_{+1}, \theta\}$, which includes the parameters of interest, namely the two marginal success probabilities. We used StatXact PROCs for SAS (Cytel Inc.) to calculate p-values of the exact unconditional test and Matlab R2011b (Mathworks Inc.) to calculate p-values of the four other tests and to perform the evaluation study. In cases where $n_{12} = n_{21} = 0$, we set the p-value to 1 for the two asymptotic McNemar tests.
For the calculations of type I error rates, we used 19 values of $N$ (10, 15, 20, …, 100), five values of $\theta$ (1.0, 2.0, 3.0, 5.0, 10.0), and 101 values of $p_{1+} = p_{+1}$ (0.00, 0.01, 0.02, …, 1.00), a total of 9595 scenarios. The nominal significance level was 5%.
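Complete enumeration for a single scenario can be sketched as follows. Because the McNemar-type p-values depend only on the discordant counts, the sketch enumerates $(x_{12}, x_{21})$ with the remaining $N - x_{12} - x_{21}$ pairs concordant, and sums the trinomial probabilities of all tables that reject at the nominal level. The asymptotic test is used as the example, with the p-value set to 1 when there are no discordant pairs. The function names and the scenario values are hypothetical.

```python
import math

def asymptotic_p(n12, n21):
    """Two-sided asymptotic McNemar p-value; 1.0 if no discordant pairs."""
    n = n12 + n21
    if n == 0:
        return 1.0
    z = (n12 - n21) / math.sqrt(n)
    return math.erfc(abs(z) / math.sqrt(2.0))

def rejection_probability(N, p12, p21, pvalue=asymptotic_p, alpha=0.05):
    """Exact probability of rejecting H0, summed over all possible tables."""
    total = 0.0
    for x12 in range(N + 1):
        for x21 in range(N - x12 + 1):
            if pvalue(x12, x21) <= alpha:
                concordant = N - x12 - x21
                coef = math.comb(N, x12) * math.comb(N - x12, x21)
                total += (coef * p12 ** x12 * p21 ** x21
                          * (1.0 - p12 - p21) ** concordant)
    return total

# Hypothetical null scenario: N = 20 and p12 = p21 = 0.1
type_one_error = rejection_probability(20, 0.1, 0.1)

# Number of type I error scenarios: 19 values of N, 5 of theta, 101 of p1+
n_scenarios = len(range(10, 101, 5)) * 5 * 101
```

With $p_{12} = p_{21}$ the sum is the type I error rate; with $p_{12} \neq p_{21}$ the same sum gives power.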
Power was calculated for $N = 1, 2, \ldots, 100$; $\theta = 1.0, 2.0, 3.0, 5.0, 10.0$; $p_{1+} = 0.1, 0.35, 0.6$; and $\Delta = p_{+1} - p_{1+} = 0.10, 0.15, 0.20, 0.25, 0.30, 0.35$.