Introduction

The rapidly evolving evidence on genetic associations is crucial to integrating human genomics into the practice of medicine and public health (Khoury et al. 2004; Genomics Health and Society Working Group 2004). Genetic factors are likely to affect the occurrence of numerous common diseases, and therefore identifying and characterizing the associated risk (or protection) will be important in improving the understanding of etiology and potentially for developing interventions based on genetic information. The number of publications on the associations between genes and diseases has increased tremendously; with more than 34,000 published articles, the annual number has more than doubled between 2001 and 2008 (Lin et al. 2006; Yu et al. 2008). Articles on genetic associations have been published in about 1,500 journals and in several languages.

Despite many similarities between genetic association studies and “classical” observational epidemiologic studies (that is, cross-sectional, case–control, and cohort) of lifestyle and environmental factors, genetic association studies present several specific challenges, including an unprecedented volume of new data (Lawrence et al. 2005; Thomas 2006) and the likelihood of very small individual effects. Genes may operate in complex pathways with gene–environment and gene–gene interactions (Khoury et al. 2007). Moreover, the current evidence base on gene-disease associations is fraught with methodological problems (Little et al. 2003; Ioannidis et al. 2005, 2006). Inadequate reporting of results, even from well-conducted studies, hampers assessment of a study’s strengths and weaknesses, and hence the integration of evidence (von Elm and Egger 2004).

Although several commentaries on the conduct, appraisal and/or reporting of genetic association studies have so far been published (Nature Genetics 1999; Cardon and Bell 2001; Weiss 2001; Weiss et al. 2001; Cooper et al. 2002; Hegele 2002; Little et al. 2002; Romero et al. 2002; Colhoun et al. 2003; van Duijn and Porta 2003; Crossman and Watkins 2004; Huizinga et al. 2004; Little 2004; Rebbeck et al. 2004; Tan et al. 2004; Anonymous 2005; Ehm et al. 2005; Freimer and Sabatti 2005; Hattersley and McCarthy 2005; Manly 2005; Shen et al. 2005; Vitali and Randolph 2005; Wedzicha and Hall 2005; Hall and Blakey 2005; DeLisi and Faraone 2006; Saito et al. 2006; Uhlig et al. 2007; NCI-NHGRI Working Group on Replication in Association Studies et al. 2007), their recommendations differ. For example, some papers suggest that replication of findings should be part of the publication (Nature Genetics 1999; Cardon and Bell 2001; Cooper et al. 2002; Hegele 2002; Huizinga et al. 2004; Tan et al. 2004; Wedzicha and Hall 2005; Hall and Blakey 2005; DeLisi and Faraone 2006), whereas others consider this suggestion unnecessary or even unreasonable (van Duijn and Porta 2003; Begg 2005; Byrnes et al. 2005; Pharoah et al. 2005; Wacholder 2005; Whittemore 2005). In many publications, the guidance has focused on genetic association studies of specific diseases (Weiss 2001; Weiss et al. 2001; Hegele 2002; Romero et al. 2002; Crossman and Watkins 2004; Huizinga et al. 2004; Rebbeck et al. 2004; Tan et al. 2004; Manly 2005; Shen et al. 2005; Vitali and Randolph 2005; Wedzicha and Hall 2005; Hall and Blakey 2005; DeLisi and Faraone 2006; Saito et al. 2006; Uhlig et al. 2007) or the design and conduct of genetic association studies (Cardon and Bell 2001; Weiss 2001; Weiss et al. 2001; Hegele 2002; Romero et al. 2002; Colhoun et al. 2003; Crossman and Watkins 2004; Huizinga et al. 2004; Rebbeck et al. 2004; Hattersley and McCarthy 2005; Manly 2005; Shen et al. 2005; Hall and Blakey 2005; DeLisi and Faraone 2006) rather than on the quality of the reporting.

Despite increasing recognition of these problems, the quality of reporting genetic association studies needs to be improved (Bogardus et al. 1999; Peters et al. 2003; Clark and Baudouin 2006; Lee et al. 2007; Yesupriya et al. 2008). For example, an assessment of a random sample of 315 genetic association studies published from 2001 to 2003 found that most studies provided some qualitative descriptions of the study participants (for example, origin and enrollment criteria), but reporting of quantitative descriptors such as age and sex was variable (Yesupriya et al. 2008). In addition, completeness of reporting of methods that allow readers to assess potential biases (for example, number of exclusions or number of samples that could not be genotyped) varied (Yesupriya et al. 2008). Only some studies described methods to validate genotyping or mentioned whether research staff was blinded to outcome. The same problems persisted in a smaller sample of studies published in 2006 (Yesupriya et al. 2008). Lack of transparency and incomplete reporting have raised concerns in a range of health research fields (von Elm and Egger 2004; Reid et al. 1995; Brazma et al. 2001; Pocock et al. 2004; Altman and Moher 2005) and poor reporting has been associated with biased estimates of effects in clinical intervention studies (Gluud 2006).

The main goal of this article is to propose and justify a set of guiding principles for reporting results of genetic association studies. The epidemiology community has recently developed the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) Statement for cross-sectional, case–control, and cohort studies (von Elm et al. 2007; Vandenbroucke et al. 2007). Given the relevance of general epidemiologic principles for genetic association studies, we propose recommendations in an extension of the STROBE statement called the STrengthening the REporting of Genetic Association studies (STREGA) Statement. The recommendations of the STROBE Statement have a strong foundation because they are based on the empirical evidence on the reporting of observational studies, and they involved extensive consultations in the epidemiologic research community (Vandenbroucke et al. 2007). We have sought to identify gaps and areas of controversy in the evidence regarding potential biases in genetic association studies. With the recommendations, we have indicated available empirical or theoretical work that has demonstrated or suggested that a methodological feature of a study can influence the direction or magnitude of the association observed. We acknowledge that for many items, no such evidence exists. The intended audience for the reporting guideline is broad and includes epidemiologists, geneticists, statisticians, clinician scientists, and laboratory-based investigators who undertake genetic association studies. In addition, it includes “users” of such studies who wish to understand the basic premise, design, and limitations of genetic association studies in order to interpret the results. 
The field of genetic associations is evolving very rapidly with the advent of genome-wide association investigations, high-throughput platforms assessing genetic variability beyond common single-nucleotide polymorphisms (SNPs) (for example, copy number variants, rare variants), and eventually routine full sequencing of samples from large populations. Our recommendations are not intended to support or oppose the choice of any particular study design or method. Instead, they are intended to maximize the transparency, quality and completeness of reporting of what was done and found in a particular study.

Methods

A multidisciplinary group developed the STREGA Statement using literature review, workshop presentations and discussion, and iterative electronic correspondence after the workshop. Thirty-three of 74 invitees participated in the STREGA workshop in Ottawa, Ontario, Canada, in June 2006. Participants included epidemiologists, geneticists, statisticians, journal editors, and graduate students.

Before the workshop, an electronic search was performed to identify existing reporting guidance for genetic association studies. Workshop participants were also asked to identify any additional guidance. They prepared brief presentations on existing reporting guidelines, empirical evidence on reporting of genetic association studies, the development of the STROBE Statement, and several key areas for discussion that were identified on the basis of consultations before the workshop. These areas included the selection and participation of study participants, rationale for choice of genes and variants investigated, genotyping errors, methods for inferring haplotypes, population stratification, assessment of Hardy–Weinberg equilibrium (HWE), multiple testing, reporting of quantitative (continuous) outcomes, selective reporting of study results, joint effects and inference of causation in single studies. Additional resources to inform workshop participants were the HuGENet handbook (Little and Higgins 2006; Higgins et al. 2007), examples of data extraction forms from systematic reviews or meta-analyses, articles on guideline development (Altman et al. 2001; Moher et al. 2001) and the checklists developed for STROBE. To harmonize our recommendations for genetic association studies with those for observational epidemiologic studies, we communicated with the STROBE group during the development process and sought their comments on the STREGA draft documents. We also provided comments on the developing STROBE Statement and its associated explanation and elaboration document (Vandenbroucke et al. 2007).

Results

In Table 1, we present the STREGA recommendations, an extension to the STROBE checklist (von Elm et al. 2007) for genetic association studies. The resulting STREGA checklist provides additions to 12 of the 22 items on the STROBE checklist. During the workshop and subsequent consultations, we identified five main areas of special interest that are specific to, or especially relevant in, genetic association studies: genotyping errors, population stratification, modeling haplotype variation, HWE, and replication. We elaborate on each of these areas, starting each section with the corresponding STREGA recommendation, followed by a brief outline of the issue and an explanation for the recommendations. Complementary information on these areas and the rationale for additional STREGA recommendations relating to selection of participants, choice of genes and variants selected, treatment effects in studying quantitative traits, statistical methods, relatedness, reporting of descriptive and outcome data, and issues of data volume, are presented in Table 2.

Table 1 STREGA reporting recommendations, extended from STROBE Statement
Table 2 Rationale for inclusion of topics in the STREGA recommendations

Genotyping errors

Recommendation for reporting of methods (Table 1, item 8(b)): Describe laboratory methods, including source and storage of DNA, genotyping methods and platforms (including the allele calling algorithm used, and its version), error rates, and call rates. State the laboratory/center where genotyping was done. Describe comparability of laboratory methods if there is more than one group. Specify whether genotypes were assigned using all of the data from the study simultaneously or in smaller batches.

Recommendation for reporting of results (Table 1, item 13(a)): Report numbers of individuals in whom genotyping was attempted and numbers of individuals in whom genotyping was successful.

Genotyping errors can occur as a result of effects of the DNA sequence flanking the marker of interest, poor quality or quantity of the DNA extracted from biological samples, biochemical artefacts, poor equipment precision or equipment failure, or human error in sample handling, conduct of the array or handling the data obtained from the array (Pompanon et al. 2005). A commentary published in 2005 on the possible causes and consequences of genotyping errors observed that an increasing number of researchers were aware of the problem, but that the effects of such errors had largely been neglected (Pompanon et al. 2005). The magnitude of genotyping errors has been reported to vary between 0.5 and 30% (Pompanon et al. 2005; Akey et al. 2001; Dequeker et al. 2001; Mitchell et al. 2003). In high-throughput centers, an error rate of 0.5% per genotype has been observed for blind duplicates that were run on the same gel (Mitchell et al. 2003). This lower error rate reflects an explicit choice of markers for which genotyping rates have been found to be highly repeatable and whose individual polymerase chain reactions (PCR) have been optimized. Non-differential genotyping errors, that is, those that do not differ systematically according to outcome status, will usually bias associations towards the null (Rothman et al. 1993; Garcia-Closas et al. 2004), just as for other non-differential errors. The most marked bias occurs when genotyping sensitivity is poor and genotype prevalence is high (>85%) or, as the corollary, when genotyping specificity is poor and genotype prevalence is low (<15%) (Rothman et al. 1993). When measurement of the environmental exposure has substantial error, genotyping errors of the order of 3% can lead to substantial under-estimation of the magnitude of an interaction effect (Wong et al. 2004). When there are systematic differences in genotyping according to outcome status (differential error), bias in any direction may occur. 
Unblinded assessment may lead to differential misclassification. For genome-wide association studies of SNPs, differential misclassification between comparison groups (for example, cases and controls) can occur because of differences in DNA storage, collection or processing protocols, even when the genotyping itself meets the highest possible standards (Clayton et al. 2005). In this situation, using samples blinded to comparison group to determine the parameters for allele calling could still lead to differential misclassification. To minimize such differential misclassification, it would be necessary to calibrate the software separately for each group. This is one of the reasons for our recommendation to specify whether genotypes were assigned using all of the data from the study simultaneously or in smaller batches.
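
The attenuation produced by non-differential genotyping error can be illustrated with a small calculation. The following Python sketch is an illustration only and is not part of the STREGA recommendations: it assumes a binary carrier/non-carrier genotype, and the carrier prevalences, sensitivity and specificity are hypothetical values chosen for the example.

```python
def odds_ratio(p_case, p_control):
    """Odds ratio comparing carrier prevalence in cases and in controls."""
    return (p_case / (1 - p_case)) / (p_control / (1 - p_control))


def observed_prevalence(true_prev, sensitivity, specificity):
    """Apparent carrier prevalence after misclassification by an imperfect assay."""
    return true_prev * sensitivity + (1 - true_prev) * (1 - specificity)


# Hypothetical true carrier prevalences (illustrative values only).
true_cases, true_controls = 0.30, 0.20
# The same error rates apply to cases and controls (non-differential error).
sens, spec = 0.95, 0.95

true_or = odds_ratio(true_cases, true_controls)
obs_or = odds_ratio(
    observed_prevalence(true_cases, sens, spec),
    observed_prevalence(true_controls, sens, spec),
)
print(f"true OR = {true_or:.2f}, observed OR = {obs_or:.2f}")
```

Under these assumed values the observed odds ratio lies between 1 and the true odds ratio, that is, the association is biased towards the null. Using different error rates for cases and controls in the same functions shows how differential error can move the estimate in either direction.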

Population stratification

Recommendation for reporting of methods (Table 1, item 12(h)): Describe any methods used to assess or address population stratification.

Population stratification is the presence within a population of subgroups among which allele (or genotype; or haplotype) frequencies and disease risks differ. When the groups compared in the study differ in their proportions of the population subgroups, an association between the genotype and the disease being investigated may reflect the genotype being an indicator identifying a population subgroup rather than a causal variant. In this situation, population subgroup is a confounder because it is associated with both genotype frequency and disease risk. The potential implications of population stratification for the validity of genetic association studies have been debated (Knowler et al. 1988; Gelernter et al. 1993; Kittles et al. 2002; Thomas and Witte 2002; Wacholder et al. 2002; Cardon and Palmer 2003; Wacholder et al. 2000; Ardlie et al. 2002; Edland et al. 2004; Millikan 2001; Wang et al. 2004; Ioannidis et al. 2004; Marchini et al. 2004; Freedman et al. 2004; Khlat et al. 2004). Modeling the possible effect of population stratification (when no effort has been made to address it) suggests that the effect is likely to be small in most situations (Wacholder et al. 2000; Ardlie et al. 2002; Millikan 2001; Wang et al. 2004; Ioannidis et al. 2004). Meta-analyses of 43 gene-disease associations comprising 697 individual studies showed consistent associations across groups of different ethnic origin (Ioannidis et al. 2004), and thus provide evidence against a large effect of population stratification, hidden or otherwise. However, as studies of association and interaction typically address moderate or small effects and hence require large sample sizes, a small bias arising from population stratification may be important (Marchini et al. 2004). Study design (case-family control studies) and statistical methods (Balding 2006) have been proposed to address population stratification, but so far few studies have used these suggestions (Yesupriya et al. 2008). 
Most of the early genome-wide association studies used family-based designs or such methods as genomic control and principal components analysis (Wellcome Trust Case Control Consortium 2007; Ioannidis 2007) to control for stratification. These approaches are particularly appropriate for addressing bias when the identified genetic effects are very small (odds ratio < 1.20), as has been the situation in many recent genome-wide association studies (Wellcome Trust Case Control Consortium 2007; Parkes et al. 2007; Todd et al. 2007; Zeggini et al. 2007; Diabetes Genetics Initiative of Broad Institute of Harvard and MIT, Lund University, and Novartis Institutes of BioMedical Research et al. 2007; Scott et al. 2007; Helgadottir et al. 2007; McPherson et al. 2007; Easton et al. 2007; Hunter et al. 2007; Stacey et al. 2007; Gudmundsson et al. 2007; Haiman et al. 2007b; Yeager et al. 2007; Zanke et al. 2007; Tomlinson et al. 2007; Haiman et al. 2007a; Rioux et al. 2007; Libioulle et al. 2007; Duerr et al. 2006). In view of the debate about the potential implications of population stratification for the validity of genetic association studies, we recommend transparent reporting of the methods used, or stating that none was used, to address this potential problem. This reporting will enable empirical evidence to accrue about the effects of population stratification and methods to address it.
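
As an illustration of one of the methods named above, genomic control is commonly described as rescaling association test statistics by an inflation factor estimated from their median. The Python sketch below follows that general description under stated assumptions: the statistics are assumed to be 1-degree-of-freedom chi-square values, the input values are hypothetical, and the function names are our own rather than those of any particular software package.

```python
from statistics import median

# Median of the chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_inflation_factor(chi2_stats):
    """Ratio of the observed median statistic to its expected value under the null."""
    return median(chi2_stats) / CHI2_1DF_MEDIAN


def genomic_control_adjust(chi2_stats):
    """Divide each statistic by lambda; a lambda below 1 is conventionally set to 1."""
    lam = max(1.0, genomic_inflation_factor(chi2_stats))
    return [x / lam for x in chi2_stats]


# Hypothetical 1-df association statistics for a handful of SNPs.
stats = [0.10, 0.35, 0.52, 0.61, 0.90, 1.40, 2.75, 6.20]
lam = genomic_inflation_factor(stats)
adjusted = genomic_control_adjust(stats)
print(f"lambda = {lam:.2f}")
```

A report that states the estimated inflation factor, and whether test statistics were adjusted by it, would allow readers to judge how much stratification or other systematic inflation may have been present.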

Modeling haplotype variation

Recommendation for reporting of methods (Table 1, item 12(g)): Describe any methods used for inferring genotypes or haplotypes.

A haplotype is a combination of specific alleles at neighboring loci that tends to be inherited together. There has been considerable interest in modeling haplotype variation within candidate genes. Typically, the number of haplotypes observed within a gene is much smaller than the theoretical number of all possible haplotypes (Zhao et al. 2003; International HapMap Consortium et al. 2007). Motivation for utilizing haplotypes comes, in large part, from the fact that multiple SNPs may “tag” an untyped variant more effectively than a single typed variant. The SNPs in the subset used in such an approach are called “haplotype tagging” SNPs. Implicitly, an aim of haplotype tagging is to reduce the number of SNPs that have to be genotyped, while maintaining statistical power to detect an association with the phenotype. Maps of human genetic variation are becoming more complete, and large-scale genotypic analysis is becoming increasingly feasible. In consequence, it is possible that modeling haplotype variation will become more focused on rare causal variants, because these may not be included in the genotyping platforms.

In most current large-scale genetic association studies, data are collected as unphased multilocus genotypes (that is, which alleles are aligned together on particular segments of chromosome is unknown). It is common in such studies to use statistical methods to estimate haplotypes (Stephens et al. 2001; Qin et al. 2002; Scheet and Stephens 2006; Browning 2008), and their accuracy and efficiency have been discussed (Huang et al. 2003; Kamatani et al. 2004; Zhang et al. 2004; Carlson et al. 2004; van Hylckama Vlieg et al. 2004). Some methods attempt to make use of a concept called haplotype “blocks” (Greenspan and Geiger 2004; Kimmel and Shamir 2005), but the results of these methods are sensitive to the specific definitions of the “blocks” (Cardon and Abecasis 2003; Ke et al. 2004). Reporting of the methods used to infer individual haplotypes and population haplotype frequencies, along with their associated uncertainties, should enhance our understanding of the possible effects of different methods of modeling haplotype variation on study results and should enable comparison and synthesis of results from different studies.
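
To make the idea of estimating haplotypes from unphased genotypes concrete, the following Python sketch applies a generic expectation-maximization (EM) scheme to two biallelic SNPs. It is a simplified illustration under stated assumptions, not the specific algorithm of any of the methods cited above: genotypes are coded as counts of one allele, random mating is assumed, and the example data are hypothetical.

```python
from collections import Counter


def em_haplotype_frequencies(genotypes, n_iter=50):
    """Estimate two-SNP haplotype frequencies from unphased genotypes by EM.

    genotypes: list of (g1, g2), each the count (0, 1 or 2) of allele "1" at a SNP.
    Returns a dict mapping each haplotype (a, b) to its estimated frequency.
    """
    haps = [(0, 0), (0, 1), (1, 0), (1, 1)]
    freq = {h: 0.25 for h in haps}
    n_double_het = sum(1 for g in genotypes if g == (1, 1))

    # Haplotype counts that are determined without ambiguity.
    fixed = Counter()
    for g1, g2 in genotypes:
        if (g1, g2) == (1, 1):
            continue  # phase unknown; handled in the E-step
        # At most one SNP is heterozygous here, so the phase is determined.
        a = [0, 0] if g1 == 0 else [1, 1] if g1 == 2 else [0, 1]
        b = [0, 0] if g2 == 0 else [1, 1] if g2 == 2 else [0, 1]
        fixed[(a[0], b[0])] += 1
        fixed[(a[1], b[1])] += 1

    total = 2 * len(genotypes)
    for _ in range(n_iter):
        # E-step: split double heterozygotes between the two possible phases.
        cis = freq[(0, 0)] * freq[(1, 1)]
        trans = freq[(0, 1)] * freq[(1, 0)]
        p_cis = cis / (cis + trans) if cis + trans > 0 else 0.5
        expected = Counter(fixed)
        expected[(0, 0)] += n_double_het * p_cis
        expected[(1, 1)] += n_double_het * p_cis
        expected[(0, 1)] += n_double_het * (1 - p_cis)
        expected[(1, 0)] += n_double_het * (1 - p_cis)
        # M-step: update frequencies from the expected haplotype counts.
        freq = {h: expected[h] / total for h in haps}
    return freq


# Hypothetical unphased genotypes for ten individuals.
genotypes = [(0, 0)] * 4 + [(2, 2)] * 2 + [(1, 1)] * 3 + [(1, 0)]
freq = em_haplotype_frequencies(genotypes)
for hap, f in sorted(freq.items()):
    print(hap, round(f, 3))
```

The three double heterozygotes are the only individuals whose phase is ambiguous, and the estimate depends on how their haplotypes are apportioned. This is the kind of uncertainty that the recommendation asks authors to describe.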

Information on common patterns of genetic variation revealed by the International Haplotype Map (HapMap) Project (International HapMap Consortium et al. 2007) can be applied in the analysis of genome-wide association studies to infer genotypic variation at markers not typed directly in these studies (Servin and Stephens 2007; Marchini et al. 2007). Essentially, these methods perform haplotype-based tests but make use of information on variation in a set of reference samples (for example, HapMap) to guide the specific tests of association, collapsing a potentially large number of haplotypes into two classes (the allelic variation) at each marker. It is expected that these techniques will increase power in individual studies, and will aid in combining data across studies, and even across differing genotyping platforms. If imputation procedures have been used, it is useful to know the method, accuracy thresholds for acceptable imputation, how imputed genotypes were handled or weighted in the analysis, and whether any associations based on imputed genotypes were also verified on the basis of direct genotyping at a subsequent stage.

Hardy–Weinberg equilibrium

Recommendation for reporting of methods (Table 1, item 12(f)): State whether HWE was considered and, if so, how.

Hardy–Weinberg equilibrium has become widely accepted as an underlying model in population genetics after Hardy (1908) and Weinberg (1908) proposed the concept that genotype frequencies at a genetic locus are stable within one generation of random mating; the assumption of HWE is equivalent to the independence of two alleles at a locus. Views differ on whether testing for departure from HWE is a useful method to detect errors or peculiarities in the data set, and also on the method of testing (Minelli et al. 2008). In particular, it has been suggested that deviation from HWE may be a sign of genotyping errors (Xu et al. 2002; Hosking et al. 2004; Salanti et al. 2005). Testing for departure from HWE has a role in detecting gross errors of genotyping in large-scale genotyping projects, such as identifying SNPs for which the clustering algorithms used to call genotypes have broken down (Wellcome Trust Case Control Consortium 2007; Pearson and Manolio 2008). However, the statistical power to detect less important errors of genotyping by testing for departure from HWE is low (McCarthy et al. 2008) and, in hypothetical data, the presence of HWE was generally not altered by the introduction of genotyping errors (Zou and Donner 2006). Furthermore, the assumptions underlying HWE, including random mating, lack of selection according to genotype, and absence of mutation or gene flow, are rarely met in human populations (Shoemaker et al. 1998; Ayres and Balding 1998). In five of 42 gene-disease associations assessed in meta-analyses of almost 600 studies, the results of studies that violated HWE significantly differed from the results of studies that conformed to the model (Trikalinos et al. 2006). 
Moreover, the study suggested that the exclusion of HWE-violating studies may result in loss of the statistical significance of some postulated gene-disease associations and that adjustment for the magnitude of deviation from the model may also have the same consequence for some other gene-disease associations. Given the differing views about the value of testing for departure from HWE and about the test methods, transparent reporting of whether such testing was done and, if so, the method used, is important for allowing the empirical evidence to accrue.
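
Because the test method is one of the items to be reported, it may help to show what one common choice looks like. The Python sketch below implements the asymptotic Pearson chi-square test with 1 degree of freedom for a biallelic locus. It is only one of several possible tests (exact tests are an alternative, particularly when genotype counts are small), the genotype counts are hypothetical, and the function assumes that both alleles are observed.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Pearson chi-square test (1 df) for departure from Hardy-Weinberg proportions.

    Assumes both alleles are present; a monomorphic locus would divide by zero.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # estimated frequency of allele A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of the chi-square distribution with 1 df: erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical genotype counts for 1,000 individuals.
chi2, p_value = hwe_chi_square(298, 489, 213)
print(f"chi-square = {chi2:.3f}, P = {p_value:.3f}")
```

Whatever test is used, stating the test, the group in which it was applied (for example, controls only) and the threshold for exclusion lets readers compare results across studies.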

For massive-testing platforms, such as genome-wide association studies, it might be expected that many false-positive violations of HWE would occur if a lenient P value threshold were set. There is no consensus on the appropriate P value threshold for HWE-related quality control in this setting. Hence, we recommend that investigators state which threshold they have used, if any, to exclude specific polymorphisms from further consideration. For SNPs with low minor allele frequencies, substantially more significant results than expected by chance have been observed, and the distribution of alleles at these loci has often been found to show departure from HWE.

For genome-wide association studies, another approach that has been used to detect errors or peculiarities in the data set (due to population stratification, genotyping error, HWE deviations or other reasons) has been to construct quantile–quantile (Q/Q) plots whereby observed association statistics or calculated P values for each SNP are ranked in order from smallest to largest and plotted against the expected null distribution (Pearson and Manolio 2008; McCarthy et al. 2008). The shape of the curve can lend insight into whether or not systematic biases are present.
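
The construction of such a plot can be sketched in a few lines. The Python example below computes the points of a Q/Q plot on the −log10(P) scale. It is an illustration under stated assumptions: the expected quantiles use the plotting position (i − 0.5)/m, which is one common convention among several, and the P values are hypothetical.

```python
import math


def qq_points(p_values):
    """Return (expected, observed) -log10(P) pairs for a quantile-quantile plot."""
    m = len(p_values)
    observed = sorted(p_values)
    # Under the null, the i-th smallest of m uniform P values lies near (i - 0.5) / m.
    expected = [(i - 0.5) / m for i in range(1, m + 1)]
    return [(-math.log10(e), -math.log10(o)) for e, o in zip(expected, observed)]


# Hypothetical P values for eight SNPs.
p_values = [0.91, 0.02, 0.47, 0.00003, 0.33, 0.68, 0.12, 0.79]
points = qq_points(p_values)
for exp_val, obs_val in points:
    print(f"{exp_val:.2f}\t{obs_val:.2f}")
```

Points lying along the diagonal are consistent with the null distribution. Departure across the whole range points towards a systematic bias, whereas departure confined to the smallest P values is what a limited number of true associations would be expected to produce.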

Replication

Recommendation: State if the study is the first report of a genetic association, a replication effort, or both (Table 1, item 3).

Articles that present and synthesize data from several studies in a single report are becoming more common. In particular, many genome-wide association analyses describe several different study populations, sometimes with different study designs and genotyping platforms, and in various stages of discovery and replication (Pearson and Manolio 2008; McCarthy et al. 2008). When data from several studies are presented in a single original report, each of the constituent studies and the composite results should be fully described. For example, a discussion of sample size and the reason for arriving at that size would include clear differentiation between the initial group (those that were typed with the full set of SNPs) and those that were included in the replication phase only (typed with a reduced set of SNPs) (Pearson and Manolio 2008; McCarthy et al. 2008). Describing the methods and results in sufficient detail would require substantial space in print, but options for publishing additional information on the study online make this possible.

Discussion

The choices made for study design, conduct and data analysis potentially influence the magnitude and direction of results of genetic association studies. However, the empirical evidence on these effects is insufficient. Transparency of reporting is, thus, essential for developing a better evidence base (Table 2). Transparent reporting helps address gaps in empirical evidence (Bogardus et al. 1999), such as the effects of incomplete participation and genotyping errors. It will also help assess the impact of currently controversial issues such as population stratification, methods of inferring haplotypes, departure from HWE and multiple testing on effect estimates under different study conditions.

The STREGA Statement proposes a minimum checklist of items for reporting genetic association studies. The statement has several strengths. First, it is based on existing guidance on reporting observational studies (STROBE). Second, it was developed from discussions of an interdisciplinary group that included epidemiologists, geneticists, statisticians, journal editors, and graduate students, thus reflecting a broad collaborative approach in terminology accessible to scientists from diverse disciplines. Finally, it explicitly describes the rationale for the decisions (Table 2) and has a clear plan for dissemination and evaluation.

The STREGA recommendations are available at www.strega-statement.org. We welcome comments, which will be used to refine future versions of the recommendations. We note that little is known about the most effective ways to apply reporting guidelines in practice, and that therefore it has been suggested that editors and authors collect, analyze, and report their experiences in using such guidelines (Davidoff et al. 2008). We consider that the STREGA recommendations can be used by authors, peer reviewers and editors to improve the reporting of genetic association studies. We invite journals to endorse STREGA, for example by including STREGA and its Web address in their Instructions for Authors and by advising authors and peer reviewers to use the checklist as a guide. It has been suggested that reporting guidelines are most helpful if authors keep the general content of the guideline items in mind as they write their initial drafts, then refer to the details of individual items as they critically appraise what they have written during the revision process (Davidoff et al. 2008). We emphasize that the STREGA reporting guidelines should not be used for screening submitted manuscripts to determine the quality or validity of the study being reported. Adherence to the recommendations may make some manuscripts longer, and this may be seen as a drawback in an era of limited space in a print journal. However, the ability to post information on the Web should alleviate this concern. The place in which supplementary information is presented can be decided by authors and editors of the individual journal.

We hope that the recommendations stimulate transparent and improved reporting of genetic association studies. In turn, better reporting of original studies would facilitate the synthesis of available research results and the further development of study methods in genetic epidemiology with the ultimate goal of improving the understanding of the role of genetic factors in the cause of diseases.