Stratified Fisher's Exact Test and its Sample Size Calculation

The chi-squared test has been a popular approach to the analysis of a 2 × 2 table when the sample sizes for the four cells are large. When the large-sample assumption does not hold, however, we need an exact testing method such as Fisher's test. When the study population is heterogeneous, we often partition the subjects into multiple strata, so that each stratum consists of homogeneous subjects and the stratified analysis has improved power. While the Mantel-Haenszel test has been widely used as an extension of the chi-squared test to stratified 2 × 2 tables with a large-sample approximation, an extension of Fisher's test for stratified exact testing has been lacking. In this paper, we discuss an exact testing method for stratified 2 × 2 tables that reduces to the standard Fisher's test in the single 2 × 2 table case, and propose a sample size calculation method for it that can be useful for designing a study with rare cell frequencies.

Keywords: Conditional type I error, Exact test, Hypergeometric distribution, Many 2 × 2 tables, Odds ratio

1 Introduction

In this paper, we discuss an exact test for stratified 2 × 2 tables with rare cell frequencies. Since the stratified exact test is simplified to the standard Fisher's (1935) exact test in single 2 × 2 table cases, we call it stratified Fisher's test.

Suppose that we want to compare the response probabilities between two groups, experimental (or case) and control. In a two-group comparison, the characteristics of the study subjects are often heterogeneous. In this case, the heterogeneity is characterized by some stratification factors, and a stratified method is applied in the final analysis. When the distribution of the stratification factors is identical between the two groups, an unstratified test that ignores the population heterogeneity controls the type I error rate but loses efficiency. If the distribution of the stratification factors differs between the two groups, however, an unstratified test does not control the type I error rate. We want to test whether the two groups have equal response probabilities while accounting for the heterogeneity of the population defined by strata.

Multiple asymptotic testing methods have been proposed for stratified 2 × 2 tables. Under the assumption that the odds ratios are identical among strata, Cochran (1954) proposes an asymptotic method for testing whether the common odds ratio is 1. Under the assumption of constant risk ratios across strata, Gart (1985) proposes an asymptotic method for testing whether the common risk ratio is 1. Woolson et al. (1986) and Nam (1992) propose sample size calculation methods for the Cochran test, and Nam (1998) proposes a sample size method for the Gart test.

In order to use these tests on many 2 × 2 tables, we have to check the assumption of a common odds ratio or risk ratio in advance. For testing the common odds ratio assumption, Zelen (1971) proposes an exact method, which is implemented in StatXact, and Breslow and Day (1980) propose an asymptotic method.

If these assumptions do not seem to be valid, we need a robust test requiring no assumptions on the primary parameters for testing. Mantel and Haenszel (1959) propose an asymptotic test of whether two groups have equal response probabilities without any assumption of a common odds ratio or common risk ratio. Jung et al. (2007) propose a sample size calculation method for the Mantel-Haenszel test. In this paper, we extend Fisher's exact test to stratified 2 × 2 tables with rare cell frequencies, and propose its sample size calculation method. These methods can be used in designing and analyzing small case-control studies or clinical trials. The input parameters to be specified for the sample size calculation of the stratified Fisher's exact test are exactly the same as those for the sample size calculation of the Mantel-Haenszel test. We will compare the performance of the proposed test with that of the asymptotic Mantel-Haenszel test and the standard Fisher's exact test ignoring strata under some practical settings.

2 Stratified Fisher's Exact Test

Suppose that there are J strata. Let N denote the total sample size, and n_j the sample size in stratum j (∑_{j=1}^J n_j = N). Among the n_j subjects in stratum j (= 1, …, J), m_j are allocated to group 1 (case or experimental) and m̄_j to group 2 (control). For stratum j, group 1 has a response probability p_j and group 2 has a response probability q_j. Let p̄_j = 1 – p_j, q̄_j = 1 – q_j, and let θ_j = p_j q̄_j/(q_j p̄_j) denote the odds ratio in stratum j. Suppose that we want to test

H₀: θ₁ = ⋯ = θ_J = 1 versus H₁: θ_j > 1 for some j = 1, …, J.

For stratum j (= 1, …, J), let x_j and y_j denote the numbers of responders for groups 1 and 2, respectively, and let z_j = x_j + y_j denote the total number of responders. The frequency data in stratum j can be described as in Table 1.

Table 1

Frequency data of the 2 × 2 table for stratum j (= 1, …, J)

Group
Response	Case	Control	Total
Yes	x_j	y_j	z_j
No	m_j – x_j	m ̄ _j – y_j	n_j – z_j
Total	m_j	m ̄ _j	n_j

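We propose to reject H₀ in favor of H₁ if S = ∑_{j=1}^J x_j is large. Under H₀, conditioning on the margin totals (z_j, m_j, n_j), x_j has the hypergeometric distribution

$$ f_0(x_j \mid z_j, m_j, n_j) = \frac{\binom{m_j}{x_j}\binom{\bar{m}_j}{z_j - x_j}}{\sum_{i=m_{j-}}^{m_{j+}} \binom{m_j}{i}\binom{\bar{m}_j}{z_j - i}} $$

for m_{j−} ≤ x_j ≤ m_{j+}, where m_{j−} = max(0, z_j − m̄_j) and m_{j+} = min(z_j, m_j) are the smallest and largest values of x_j compatible with the margins. Let z = (z₁, …, z_J), m = (m₁, …, m_J) and n = (n₁, …, n_J). Given the observed value s of S, the conditional p-value given (z, m, n) is obtained by

$$ pv = P(S \ge s \mid z, m, n, H_0) = \sum_{i_1=m_{1-}}^{m_{1+}} \cdots \sum_{i_J=m_{J-}}^{m_{J+}} I\Big(\sum_{j=1}^{J} i_j \ge s\Big) \prod_{j=1}^{J} f_0(i_j \mid z_j, m_j, n_j). $$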

Given type I error rate α*, we reject H₀ if pv ≤ α*.
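The nested sums in the p-value formula can be evaluated directly when the strata are small. The following Python sketch is illustrative only: the function names and the input layout (one tuple (x_j, z_j, m_j, n_j) per stratum) are assumptions made for this example, and the two-stratum data are hypothetical.

```python
from itertools import product
from math import comb


def hypergeom_pmf(z, m, n):
    """Return {x: f_0(x | z, m, n)} on the support m_- <= x <= m_+."""
    m_bar = n - m
    lo, hi = max(0, z - m_bar), min(z, m)
    # The denominator sum over i equals C(n, z) by Vandermonde's identity.
    denom = comb(n, z)
    return {x: comb(m, x) * comb(m_bar, z - x) / denom for x in range(lo, hi + 1)}


def stratified_fisher_pvalue(strata):
    """One-sided p-value P(S >= s | z, m, n, H0).

    strata: list of (x_j, z_j, m_j, n_j) tuples, one per stratum.
    """
    s = sum(x for x, _, _, _ in strata)
    pmfs = [hypergeom_pmf(z, m, n) for _, z, m, n in strata]
    pv = 0.0
    # Enumerate (i_1, ..., i_J) over the product of the supports,
    # mirroring the nested sums in the p-value formula.
    for combo in product(*(pmf.items() for pmf in pmfs)):
        if sum(i for i, _ in combo) >= s:
            prob = 1.0
            for _, f in combo:
                prob *= f
            pv += prob
    return pv


# Single 2 x 2 table: 3 of 3 responders in group 1, 0 of 3 in group 2.
# With one stratum this is the ordinary one-sided Fisher p-value, 1 / C(6, 3).
print(stratified_fisher_pvalue([(3, 3, 3, 6)]))

# Two hypothetical strata (illustrative numbers only).
print(stratified_fisher_pvalue([(3, 3, 3, 6), (2, 3, 4, 8)]))
```

Direct enumeration visits ∏_{j=1}^J (m_{j+} − m_{j−} + 1) combinations, so it is practical only for a few small strata; convolving the per-stratum distributions one at a time gives the same p-value with far less work.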

Similarly for the other one-sided alternative hypothesis

H₂:θ_j < 1 for some j = 1, …, J,

the conditional p-value given (z, m, n) is obtained by

$$ pv = P(S \le s \mid z, m, n, H_0) = \sum_{i_1=m_{1-}}^{m_{1+}} \cdots \sum_{i_J=m_{J-}}^{m_{J+}} I\Big(\sum_{j=1}^{J} i_j \le s\Big) \prod_{j=1}^{J} f_0(i_j \mid z_j, m_j, n_j). $$

A two-sided p-value may be calculated as two times the minimum of the two one-sided p-values. Without loss of generality, we focus our discussions on the one-sided alternative hypothesis H₁ in our paper.

Note that the Mantel-Haenszel test also rejects H₀ in favor of H₁ for a large value of S, and its p-value is calculated using the standardized test statistic

$$ \frac{S - E}{\sqrt{V}}, $$

which is asymptotically N(0, 1) under H₀, where E = ∑_{j=1}^J E_j, V = ∑_{j=1}^J V_j, E_j = z_j m_j/n_j and V_j = z_j m_j m̄_j (n_j − z_j)/{n_j²(n_j − 1)}. Westfall, Zaykin and Young (2002) propose a permutation procedure for the stratified Mantel-Haenszel test, which permutes the two-sample binary data within each stratum in the context of multiple testing. Their permutation maintains the margin totals of the 2 × 2 tables, {(z_j, m_j, n_j), 1 ≤ j ≤ J}, and E_j and V_j depend on the margin totals only, so the permutation-based Mantel-Haenszel test will be identical to our stratified Fisher's exact test if it goes through all the possible ∏_{j=1}^J (m_{j+} − m_{j−} + 1) permutations. Their permutation test is implemented in SAS. Compared to our exact test, the permutation test requires a much longer computing time. Furthermore, a permutation test often randomly selects a subset of the permutations to approximate the exact p-value. In this case, the resulting approximate p-value depends on the seed chosen for random number generation and on the number of permutations, while the exact method always provides the same exact p-value.

A real data example is taken from Li et al. (1979), where the investigators are interested in whether thymosin (experimental), compared to placebo (control), has any effect in the treatment of bronchogenic carcinoma patients receiving radiotherapy. Table 2 summarizes the data for three strata. The one-sided p-values are 0.1563 by the stratified Fisher's exact test and 0.0760 by the Mantel-Haenszel test. The stratified Fisher's test has a larger p-value than the Mantel-Haenszel test either because of its conservative type I error control, as demonstrated in Section 4, or because the very small numbers of failures across the strata can lead to a biased p-value for the asymptotic Mantel-Haenszel test.

Table 2

Response to thymosin in bronchogenic carcinoma patients (T=thymosin, P=placebo)

	Stratum 1			Stratum 2			Stratum 3
	T	P	Total	T	P	Total	T	P	Total
Success	10	12	22	9	11	20	8	7	15
Failure	1	1	2	0	1	1	0	3	3
Total	11	13	24	9	12	21	8	10	18
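The two p-values quoted for Table 2 can be checked with a short Python sketch. The helper names and the (x_j, z_j, m_j, n_j) tuple layout are assumptions made for this illustration; the exact distribution of S is built by convolving the strata one at a time, and the Mantel-Haenszel p-value uses the normal upper tail of (S − E)/√V.

```python
from math import comb, erfc, sqrt

# Table 2 as (x_j, z_j, m_j, n_j): successes on thymosin, total successes,
# thymosin group size, stratum size.
TABLE2 = [(10, 22, 11, 24), (9, 20, 9, 21), (8, 15, 8, 18)]


def null_dist_of_sum(strata):
    """Exact null distribution of S = sum_j x_j given the margins."""
    dist = {0: 1.0}
    for _, z, m, n in strata:
        m_bar = n - m
        new = {}
        for x in range(max(0, z - m_bar), min(z, m) + 1):
            f0 = comb(m, x) * comb(m_bar, z - x) / comb(n, z)
            for total, prob in dist.items():
                new[total + x] = new.get(total + x, 0.0) + prob * f0
        dist = new
    return dist


def exact_pvalue(strata):
    """One-sided stratified Fisher p-value P(S >= s | margins, H0)."""
    s = sum(x for x, _, _, _ in strata)
    return sum(p for total, p in null_dist_of_sum(strata).items() if total >= s)


def mantel_haenszel_pvalue(strata):
    """One-sided asymptotic p-value based on (S - E) / sqrt(V)."""
    s = sum(x for x, _, _, _ in strata)
    e = sum(z * m / n for _, z, m, n in strata)
    v = sum(z * m * (n - m) * (n - z) / (n**2 * (n - 1)) for _, z, m, n in strata)
    w = (s - e) / sqrt(v)
    return 0.5 * erfc(w / sqrt(2))  # upper tail probability of N(0, 1)


print(f"stratified Fisher: {exact_pvalue(TABLE2):.4f}")        # text reports 0.1563
print(f"Mantel-Haenszel:   {mantel_haenszel_pvalue(TABLE2):.4f}")  # text reports 0.0760
```

Because every stratum has at most three failures, the exact distribution of S has only a handful of support points, which is the setting where the normal approximation is least reliable.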

3 Power and Sample Size Calculation

Jung et al. (2007) propose a sample size calculation method for Mantel-Haenszel test. In this section, we derive a sample size formula for stratified Fisher's exact test by specifying the values of the same input parameters as those for Mantel-Haenszel test by Jung et al. (2007). Following are input parameters to be specified for a sample size calculation.

Type I and II error probabilities: (α*, β*)
Success probabilities for group 2 (control): (q₁, …, q_J)

3.1 When Group and Stratum Allocations are Random

In designing a study, N is fixed at a predetermined size corresponding to a specified power. At the moment, we assume that, given N, the strata sizes and the sample sizes for the two groups within each stratum are randomly selected according to the prevalence rate of each category in the population. Hence, given N, {(x_j, z_j, m_j, n_j), 1 ≤ j ≤ J} are random variables with the following marginal or conditional probability mass functions, which are indexed by the above input parameters.

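Conditional distribution of x_j given (z_j, m_j, n_j):

$$ f_j(x_j \mid z_j, m_j, n_j) = \frac{\binom{m_j}{x_j}\binom{\bar{m}_j}{z_j - x_j}\theta_j^{x_j}}{\sum_{i=m_{j-}}^{m_{j+}} \binom{m_j}{i}\binom{\bar{m}_j}{z_j - i}\theta_j^{i}} $$

Conditional distribution of z_j given (m_j, n_j): given (m_j, n_j), x_j ~ B(m_j, p_j) and y_j ~ B(m̄_j, q_j) are independent, so z_j = x_j + y_j has probability mass function

$$ g_j(z_j \mid m_j, n_j) = \sum_{x=m_{j-}}^{m_{j+}} \binom{m_j}{x} p_j^{x}\,\bar{p}_j^{\,m_j - x} \binom{\bar{m}_j}{z_j - x} q_j^{z_j - x}\,\bar{q}_j^{\,\bar{m}_j - z_j + x} $$

for z_j = 0, 1, …, n_j and j = 1, …, J, where B(m, p) denotes the binomial distribution with number of trials m and success probability p. Under H₀, this is simplified to

$$ g_{0j}(z_j \mid m_j, n_j) = q_j^{z_j}\,\bar{q}_j^{\,n_j - z_j} \sum_{x=m_{j-}}^{m_{j+}} \binom{m_j}{x}\binom{\bar{m}_j}{z_j - x}. $$

Note that $\binom{0}{0} p^0 (1 - p)^0 = 1$ for p ∈ (0, 1).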
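As a quick numerical check of these distributions, the sketch below codes f_j and g_j and verifies on a hypothetical stratum that g_j is a proper probability mass function. The function names and the parameter values are assumptions for illustration; the relation p = θq/(q̄ + θq) used to obtain p follows from the definition θ = p q̄/(q p̄).

```python
from math import comb


def support(z, m, n):
    """Bounds (m_-, m_+) of x given the margins (z, m, n)."""
    return max(0, z - (n - m)), min(z, m)


def f_cond(x, z, m, n, theta):
    """f_j(x | z, m, n): conditional pmf of x under odds ratio theta."""
    lo, hi = support(z, m, n)
    if not lo <= x <= hi:
        return 0.0
    m_bar = n - m
    denom = sum(comb(m, i) * comb(m_bar, z - i) * theta**i for i in range(lo, hi + 1))
    return comb(m, x) * comb(m_bar, z - x) * theta**x / denom


def g_marg(z, m, n, p, q):
    """g_j(z | m, n): pmf of z = x + y with x ~ B(m, p) and y ~ B(n - m, q)."""
    lo, hi = support(z, m, n)
    m_bar = n - m
    return sum(
        comb(m, x) * p**x * (1 - p) ** (m - x)
        * comb(m_bar, z - x) * q ** (z - x) * (1 - q) ** (m_bar - z + x)
        for x in range(lo, hi + 1)
    )


# Hypothetical stratum: n = 10, m = 4, control rate q = 0.7, odds ratio 3.
n, m, q, theta = 10, 4, 0.7, 3.0
p = theta * q / (1 - q + theta * q)  # implied by theta = p(1 - q) / {q(1 - p)}

print(sum(g_marg(z, m, n, p, q) for z in range(n + 1)))  # total mass, close to 1
print(g_marg(6, m, n, q, q), comb(n, 6) * q**6 * (1 - q) ** 4)  # H0: both agree
```

Setting theta = 1 in `f_cond` recovers the hypergeometric pmf f_0, and the product of `f_cond` and `g_marg` reproduces the joint binomial probability of (x_j, y_j), which is why only θ_j enters the conditional distribution.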

Conditional distribution of m_j given n_j: At the moment, we assume that, given the total sample size n_j of stratum j, the sample size m_j of group 1 is a binomial random variable with probability mass function

$$ h_j(m_j \mid n_j) = \binom{n_j}{m_j} b_j^{m_j} (1 - b_j)^{n_j - m_j} $$

for 0 ≤ m_j ≤ n_j and j = 1, …, J.

Conditional distribution of (n₁, …, n_J) given N: multinomial with probability mass function

$$ l_N(n_1, \ldots, n_J) = \frac{N!}{\prod_{j=1}^{J} n_j!} \prod_{j=1}^{J} a_j^{n_j} $$

for 0 ≤ n₁ ≤ N, …, 0 ≤ n_J ≤ N and ∑_{j=1}^J n_j = N.

We first derive the power function for a given sample size N using these distribution functions. Given (z, m, n) and type I error rate α*, the critical value c_α* = c_α*(z, m, n) is the smallest integer c satisfying

$$ P(S \ge c \mid z, m, n, H_0) = \sum_{i_1=m_{1-}}^{m_{1+}} \cdots \sum_{i_J=m_{J-}}^{m_{J+}} I\Big(\sum_{j=1}^{J} i_j \ge c\Big) \prod_{j=1}^{J} f_0(i_j \mid z_j, m_j, n_j) \le \alpha^*. $$

Note that s ≥ c_α*(z, m, n) if and only if pv(s|z, m, n) ≤ α*. We call α(z, m, n) = P(S ≥ c_α*|z, m, n, H₀) the conditional type I error rate given (z, m, n). Similarly, the conditional power 1 – β(z, m, n) given (z, m, n) is obtained by

$$ P(S \ge c_{\alpha^*} \mid z, m, n, H_1) = \sum_{i_1=m_{1-}}^{m_{1+}} \cdots \sum_{i_J=m_{J-}}^{m_{J+}} I\Big(\sum_{j=1}^{J} i_j \ge c_{\alpha^*}\Big) \prod_{j=1}^{J} f_j(i_j \mid z_j, m_j, n_j). $$
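The critical value, conditional type I error rate and conditional power for a fixed set of margins can be sketched as follows. This is an illustration under assumed names and hypothetical margins (z_j, m_j, n_j), not code from the paper; the distribution of S is obtained by convolving the per-stratum conditional distributions, with θ_j = 1 giving the null case.

```python
from math import comb


def cond_pmf(z, m, n, theta=1.0):
    """{x: f_j(x | z, m, n)}; theta = 1 gives the null pmf f_0."""
    m_bar = n - m
    lo, hi = max(0, z - m_bar), min(z, m)
    w = {x: comb(m, x) * comb(m_bar, z - x) * theta**x for x in range(lo, hi + 1)}
    total = sum(w.values())
    return {x: v / total for x, v in w.items()}


def dist_of_sum(margins, thetas):
    """Distribution of S = sum_j x_j given the margins, by convolution."""
    dist = {0: 1.0}
    for (z, m, n), theta in zip(margins, thetas):
        new = {}
        for x, f in cond_pmf(z, m, n, theta).items():
            for s, p in dist.items():
                new[s + x] = new.get(s + x, 0.0) + p * f
        dist = new
    return dist


def upper_tail(dist, c):
    """P(S >= c) for a distribution stored as {value: probability}."""
    return sum(p for s, p in dist.items() if s >= c)


def critical_value(margins, alpha_star):
    """Smallest integer c with P(S >= c | z, m, n, H0) <= alpha_star."""
    null = dist_of_sum(margins, [1.0] * len(margins))
    c = min(null)
    # Small tolerance guards against floating-point noise at equality.
    while upper_tail(null, c) > alpha_star + 1e-12:
        c += 1
    return c


def conditional_power(margins, thetas, alpha_star):
    """P(S >= c_alpha* | z, m, n, H1) for the given odds ratios."""
    c = critical_value(margins, alpha_star)
    return upper_tail(dist_of_sum(margins, thetas), c)


# Hypothetical margins (z_j, m_j, n_j) for two strata and odds ratios of 4.
margins = [(5, 6, 12), (4, 5, 10)]
c = critical_value(margins, 0.05)
alpha = upper_tail(dist_of_sum(margins, [1.0, 1.0]), c)
print(c, alpha, conditional_power(margins, [4.0, 4.0], 0.05))
```

The conditional type I error rate printed here falls below the nominal 0.05 because S is discrete, which is the source of the conservativeness of the exact test.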

For a chosen N, the marginal type I error rate and power are given as

$$ \alpha_N \equiv E\{\alpha(z, m, n) \mid H_0\} = E_n\big(E_m[E_z\{\alpha(z, m, n) \mid m, n, H_0\} \mid n]\big) = \sum_{n \in D_N} \sum_{m_1=0}^{n_1} \cdots \sum_{m_J=0}^{n_J} \sum_{z_1=0}^{n_1} \cdots \sum_{z_J=0}^{n_J} \alpha(z_1, \ldots, z_J; m_1, \ldots, m_J; n_1, \ldots, n_J) \times \Big\{\prod_{j=1}^{J} g_{0j}(z_j \mid m_j, n_j)\Big\} \Big\{\prod_{j=1}^{J} h_j(m_j \mid n_j)\Big\}\, l_N(n_1, \ldots, n_J) $$

and

$$ 1 - \beta_N \equiv E\{1 - \beta(z, m, n) \mid H_1\} = E_n\big(E_m[E_z\{1 - \beta(z, m, n) \mid m, n, H_1\} \mid n]\big) = \sum_{n \in D_N} \sum_{m_1=0}^{n_1} \cdots \sum_{m_J=0}^{n_J} \sum_{z_1=0}^{n_1} \cdots \sum_{z_J=0}^{n_J} \{1 - \beta(z_1, \ldots, z_J; m_1, \ldots, m_J; n_1, \ldots, n_J)\} \times \Big\{\prod_{j=1}^{J} g_j(z_j \mid m_j, n_j)\Big\} \Big\{\prod_{j=1}^{J} h_j(m_j \mid n_j)\Big\}\, l_N(n_1, \ldots, n_J), $$

respectively, where D_N = {(n₁, …, n_J) : 0 ≤ n₁ ≤ N, …, 0 ≤ n_J ≤ N, ∑_{j=1}^J n_j = N} and E_w(·) denotes the expected value with respect to a random vector w.

Since α(z, m, n) ≤ α* for all (z, m, n), we have α_N ≤ α*. Given power 1 – β*, the required sample size is chosen as the smallest integer N satisfying 1 – β_N ≥ 1 – β*. In other words, while the statistical test controls the conditional type I error rate α(z, m, n), the sample size is determined to guarantee a specified level of marginal power. In summary, a sample size is calculated as follows.
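For very small designs, the marginal type I error rate and power can be evaluated by brute-force enumeration over D_N, m and z. The Python sketch below is a minimal illustration: the function names are assumptions, a and b stand for the parameters a_j and b_j of l_N and h_j, and the numerical inputs are hypothetical. It uses p_j = θ_j q_j/(q̄_j + θ_j q_j), which follows from the definition of θ_j.

```python
from itertools import product
from math import comb, factorial, prod


def dist_of_sum(z, m, n, thetas):
    """Distribution of S given the margins, by convolving the strata."""
    dist = {0: 1.0}
    for zj, mj, nj, th in zip(z, m, n, thetas):
        mb = nj - mj
        w = {x: comb(mj, x) * comb(mb, zj - x) * th**x
             for x in range(max(0, zj - mb), min(zj, mj) + 1)}
        tot = sum(w.values())
        new = {}
        for x, v in w.items():
            for s, pr in dist.items():
                new[s + x] = new.get(s + x, 0.0) + pr * v / tot
        dist = new
    return dist


def g(zj, mj, nj, p, q):
    """g_j(z_j | m_j, n_j): pmf of a sum of B(m_j, p) and B(n_j - m_j, q)."""
    mb = nj - mj
    return sum(comb(mj, x) * p**x * (1 - p) ** (mj - x)
               * comb(mb, zj - x) * q ** (zj - x) * (1 - q) ** (mb - zj + x)
               for x in range(max(0, zj - mb), min(zj, mj) + 1))


def tail(dist, c):
    return sum(pr for s, pr in dist.items() if s >= c)


def marginal_rejection_prob(N, q, thetas, a, b, alpha_star):
    """Sum over D_N, m and z; thetas all 1 gives alpha_N, otherwise 1 - beta_N."""
    J = len(q)
    p = [t * qj / (1 - qj + t * qj) for t, qj in zip(thetas, q)]
    total = 0.0
    for n in product(range(N + 1), repeat=J):
        if sum(n) != N:
            continue  # keep only compositions in D_N
        l_n = (factorial(N) / prod(factorial(k) for k in n)
               * prod(aj**k for aj, k in zip(a, n)))
        for m in product(*(range(k + 1) for k in n)):
            h = prod(comb(k, mj) * bj**mj * (1 - bj) ** (k - mj)
                     for k, mj, bj in zip(n, m, b))
            for z in product(*(range(k + 1) for k in n)):
                gz = prod(g(zj, mj, k, pj, qj)
                          for zj, mj, k, pj, qj in zip(z, m, n, p, q))
                null = dist_of_sum(z, m, n, [1.0] * J)
                c = min(null)  # critical value: smallest c with null tail <= alpha*
                while tail(null, c) > alpha_star + 1e-12:
                    c += 1
                total += tail(dist_of_sum(z, m, n, thetas), c) * gz * h * l_n
    return total


# Hypothetical design: J = 2 strata, N = 8 subjects.
q, a, b = [0.3, 0.3], [0.5, 0.5], [0.5, 0.5]
print(marginal_rejection_prob(8, q, [1.0, 1.0], a, b, 0.05))    # alpha_N
print(marginal_rejection_prob(8, q, [10.0, 10.0], a, b, 0.05))  # 1 - beta_N
```

The number of terms grows rapidly with N and J, so this enumeration is only a check for toy designs; searching for the smallest N with 1 – β_N ≥ 1 – β* at realistic sizes would need a more economical evaluation.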

Sample Size Calculation