Date: 2023-09-15


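In the latest issue of the New England Journal of Medicine, the editors published an editorial, "New Guidelines for Statistical Reporting in the Journal", announcing a reduced role for P values in multiple comparisons.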

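In recent years the P value has been widely debated. Fisher, a founder of modern biostatistics, proposed the concept in the 20th century: if, assuming the null hypothesis is true, the probability of observing the current statistic or a more extreme value is less than one in twenty, the null hypothesis is rejected. Then, almost by accident, it became mainstream. The P value and its application are now nearly a century old, and they dominate the great majority of applied statistical research.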

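Yet doubts about the P value have never stopped. Why must the threshold be exactly 0.05? Does a result of 0.06 carry no statistical significance? Is 0.05 too lenient a cutoff? The most representative recent examples are the American Statistical Association's discussion of P values and statistical significance, and its editorial in The American Statistician, "Moving to a World Beyond 'p < 0.05'", which called for abandoning the P value as a significance threshold. In addition, Daniel Benjamin and colleagues published a paper supporting a P value threshold of 0.005, and Rothman KJ, a prominent contemporary epidemiologist and author of Modern Epidemiology, recommends replacing P values with confidence intervals.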




“The new guidelines discuss many aspects of the reporting of studies in the Journal, including a requirement to replace P values with estimates of effects or association and 95% confidence intervals when neither the protocol nor the statistical analysis plan has specified methods used to adjust for multiplicity”





The Methods section of all manuscripts should contain a brief description of sample size and power considerations for the study, as well as a brief description of the methods for primary and secondary analyses.


The Methods section of all manuscripts should include a description of how missing data have been handled. Unless missingness is rare, a complete case analysis is generally not acceptable as the primary analysis and should be replaced by methods that are appropriate, given the missingness mechanism. Multiple imputation or inverse probability case weights can be used when data are missing at random; model-based methods may be more appropriate when missingness may be informative. For the Journal’s general approach to the handling of missing data in clinical trials please see Ware et al (N Engl J Med;367:1353–1354).

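In short: the Methods section of every manuscript should state how missing data were handled. Unless missingness is very rare, an analysis restricted to complete cases is not acceptable as the primary analysis. Instead, the missing data should be handled according to the missingness mechanism. Multiple imputation or inverse probability weighting can be used when data are missing at random. If the missingness follows a pattern (for example, missing not at random), model-based methods should be used. For how to handle missing data, see the Journal's methodology article by Ware et al (N Engl J Med;367:1353–1354).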
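To see why a complete-case analysis can mislead, and how inverse probability weighting can correct it when data are missing at random, here is a minimal Python sketch. The data, the `group` covariate, and the helper name `ipw_mean` are hypothetical and not part of the guideline. The probability of being observed is estimated as the observed fraction within each group.

```python
from collections import defaultdict

def ipw_mean(records):
    """Inverse-probability-weighted mean of an outcome with missing values.

    records: list of (group, outcome) pairs; outcome is None when missing.
    Assumption: missingness depends only on `group` (missing at random).
    """
    total = defaultdict(int)
    observed = defaultdict(int)
    for group, outcome in records:
        total[group] += 1
        if outcome is not None:
            observed[group] += 1

    weighted_sum = 0.0
    weight_total = 0.0
    for group, outcome in records:
        if outcome is None:
            continue
        # Weight = 1 / P(observed | group), estimated from the data.
        weight = total[group] / observed[group]
        weighted_sum += weight * outcome
        weight_total += weight
    return weighted_sum / weight_total

# Hypothetical data: half of the "old" group has a missing outcome.
records = [("young", 10.0)] * 4 + [("old", 20.0)] * 2 + [("old", None)] * 2

complete_case = [y for _, y in records if y is not None]
complete_case_mean = sum(complete_case) / len(complete_case)
weighted_mean = ipw_mean(records)
```

In this made-up example the complete-case mean underrepresents the group with more missing values, while the weighted mean recovers the value that would have been obtained had all eight outcomes been observed.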

Significance tests should be accompanied by confidence intervals for estimated effect sizes, measures of association, or other parameters of interest. The confidence intervals should be adjusted to match any adjustment made to significance levels in the corresponding test.
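To make the matching rule concrete: if a test uses a Bonferroni-adjusted significance level of α/m, the accompanying interval would be computed at confidence level 1 − α/m rather than 95%. The sketch below assumes a normal-approximation (Wald) interval and a Bonferroni adjustment. Both are illustrative choices rather than anything the guideline prescribes, and `matched_ci` is a hypothetical helper name.

```python
from statistics import NormalDist

def matched_ci(estimate, std_error, n_comparisons, alpha=0.05):
    """Wald interval whose level matches a Bonferroni-adjusted test.

    Assumption: the estimate is approximately normal with the given
    standard error, and alpha is split equally over n_comparisons.
    """
    adjusted_alpha = alpha / n_comparisons
    z = NormalDist().inv_cdf(1.0 - adjusted_alpha / 2.0)
    return estimate - z * std_error, estimate + z * std_error

# Hypothetical effect estimate 0.8 with standard error 0.3.
unadjusted = matched_ci(0.8, 0.3, n_comparisons=1)  # ordinary 95% interval
adjusted = matched_ci(0.8, 0.3, n_comparisons=5)    # 99% interval (alpha = 0.01)
```

The adjusted interval is wider than the unadjusted one, which is what keeps it consistent with the stricter test.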


Unless one-sided tests are required by study design, such as in noninferiority clinical trials, all reported P values should be two-sided. In general, P values larger than 0.01 should be reported to two decimal places, and those between 0.01 and 0.001 to three decimal places; P values smaller than 0.001 should be reported as P<0.001. Notable exceptions to this policy include P values arising from tests associated with stopping rules in clinical trials or from genome-wide association studies.
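The rounding rule above is easy to encode. The following is a minimal sketch: `format_p` is a hypothetical helper, and treating a value of exactly 0.01 as belonging to the three-decimal band is an assumption, since the guideline does not say which side the boundary falls on.

```python
def format_p(p):
    """Format a P value following the reporting rule quoted above."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("a P value must lie between 0 and 1")
    if p < 0.001:
        return "P<0.001"
    if p <= 0.01:
        # Between 0.001 and 0.01: three decimal places.
        return f"P={p:.3f}"
    # Larger than 0.01: two decimal places.
    return f"P={p:.2f}"

examples = [format_p(p) for p in (0.0004, 0.0042, 0.2345)]
```

One thing to watch: rounding to two decimals can turn a value such as 0.0499 into 0.05, so the unrounded value is what should be compared against any threshold.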


Results should be presented with no more precision than is of scientific value and is meaningful given the available sample size. For example, measures of association, such as odds ratios, should ordinarily be reported to two significant digits. Results derived from models should be limited to the appropriate number of significant digits.
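Rounding to significant digits is not the same as rounding to decimal places, so a small helper may be useful. This is only a sketch; `round_sig` is a hypothetical name and the input values are made up.

```python
from math import floor, log10

def round_sig(value, digits=2):
    """Round a number to the given count of significant digits."""
    if value == 0:
        return 0.0
    # Position of the leading digit decides how many decimals to keep.
    decimals = digits - 1 - floor(log10(abs(value)))
    return round(value, decimals)

rounded = [round_sig(x) for x in (1.2345, 0.8765, 12.34)]
```

With two significant digits, an odds ratio of 1.2345 is reported as 1.2 and one of 0.8765 as 0.88.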


For clinical trials (special requirements):

Original and final protocols and statistical analysis plans (SAPs) should be submitted along with the manuscript, as well as a table of amendments made to the protocol and SAP indicating the date of the change and its content.


The analyses of the primary outcome in manuscripts reporting results of clinical trials should match the analyses prespecified in the original protocol, except in unusual circumstances. Analyses that do not conform to the protocol should be justified in the Methods section of the manuscript. The editors may ask for additional analyses that are not specified in the protocol.


When comparing outcomes in two or more groups in confirmatory analyses, investigators should use the testing procedures specified in the protocol and SAP to control overall type I error — for example, Bonferroni adjustments or prespecified hierarchical procedures. P values adjusted for multiplicity should be reported when appropriate and labeled as such in the manuscript. In hierarchical testing procedures, P values should be reported only until the last comparison for which the P value was statistically significant. P values for the first nonsignificant comparison and for all comparisons thereafter should not be reported. For prespecified exploratory analyses, investigators should use methods for controlling false discovery rate described in the SAP — for example, Benjamini–Hochberg procedures.

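In short: when a confirmatory analysis compares several groups, investigators should use the method laid out in the protocol and statistical analysis plan to control type I error, such as Bonferroni adjustments or a prespecified hierarchical procedure (for example, sequential testing or Dunnett's test). Multiplicity-adjusted P values should be reported. With a hierarchical procedure, P values should be reported only up to the last comparison that was statistically significant. The P value for the first nonsignificant comparison, and those for every comparison after it, should not be reported. (What does this mean? In confirmatory pairwise comparisons in a clinical trial, the study design may fix the order of testing. With three groups, for example, group 1 is compared with group 2 first. If that is significant, group 1 is compared with group 3. If that is not significant, group 2 and group 3 are not compared at all. So P values are reported only up to the last significant comparison.)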
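The two procedures named above can be sketched in a few lines of Python. The function names, the P values, and the testing order are all hypothetical. Treating P ≤ alpha as significant is an assumption, and the hierarchical helper implements a simple fixed-sequence rule, which is only one kind of hierarchical procedure.

```python
def bonferroni(p_values):
    """Bonferroni-adjusted P values: multiply by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

def hierarchical_report(ordered_p_values, alpha=0.05):
    """P values that may be reported under fixed-sequence testing.

    ordered_p_values must follow the prespecified testing order.
    Reporting stops before the first nonsignificant comparison.
    """
    reportable = []
    for p in ordered_p_values:
        if p > alpha:
            break  # first nonsignificant result: stop, report nothing further
        reportable.append(p)
    return reportable

adjusted = bonferroni([0.01, 0.02, 0.5])
# Hypothetical prespecified order: the third comparison fails, so the
# fourth is not reported even though its raw P value is small.
reported = hierarchical_report([0.001, 0.03, 0.20, 0.01])
```

In the second example only the first two P values are reported: the third comparison is not significant, so neither it nor anything after it appears in the manuscript.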

When no method to adjust for multiplicity of inferences or controlling false discovery rate was specified in the protocol or SAP of a clinical trial, the report of all secondary and exploratory endpoints should be limited to point estimates of treatment effects with 95% confidence intervals. In such cases, the Methods section should note that the widths of the intervals have not been adjusted for multiplicity and that the inferences drawn may not be reproducible. No P values should be reported for these analyses.

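In short: if the statistical analysis plan of a clinical trial does not spell out which method will be used to adjust for type I error in multiple comparisons, or to control the false discovery rate, then all secondary and exploratory results may be reported only as treatment effect estimates with 95% confidence intervals. In that case the Methods section should note that the interval widths have not been adjusted for multiplicity, and no P values should be reported. (This is the key change in the latest version of the New England Journal of Medicine guidelines: reporting P values is no longer recommended for analyses whose statistical method was not prespecified.)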
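As an illustration of what "a point estimate with a 95% confidence interval and no P value" can look like for a secondary endpoint, here is a sketch for a risk difference. The Wald interval is an assumed choice, and the event counts and the name `risk_difference_ci` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def risk_difference_ci(events_treat, n_treat, events_ctrl, n_ctrl, level=0.95):
    """Risk difference (treatment minus control) with an unadjusted Wald interval."""
    risk_treat = events_treat / n_treat
    risk_ctrl = events_ctrl / n_ctrl
    diff = risk_treat - risk_ctrl
    std_error = sqrt(risk_treat * (1 - risk_treat) / n_treat
                     + risk_ctrl * (1 - risk_ctrl) / n_ctrl)
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return diff, diff - z * std_error, diff + z * std_error

# Hypothetical secondary endpoint: 30/200 events vs 50/200 events.
diff, lower, upper = risk_difference_ci(30, 200, 50, 200)
summary = (f"risk difference {diff * 100:.1f} percentage points "
           f"(95% CI, {lower * 100:.1f} to {upper * 100:.1f})")
```

The summary string carries the estimate and interval only. Under the guideline, the Methods section would add that the interval width has not been adjusted for multiplicity.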

Please see Wang et al (N Engl J Med;357:2189–2194) on recommended methods for analyzing subgroups. When the SAP prespecifies an analysis of certain subgroups, that analysis should conform to the method described in the SAP. If the study team believes a post hoc analysis of subgroups is important, the rationale for conducting that analysis should be stated. Post hoc analyses should be clearly labeled as post hoc in the manuscript.

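In short: note the subgroup analysis methods recommended by Wang et al (N Engl J Med;357:2189–2194). When the statistical analysis plan prespecifies a subgroup analysis, the analysis must follow the plan. If the study team considers an unplanned post hoc subgroup analysis necessary, it must state the rationale, and the report must make clear which results come from post hoc analyses.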

Forest plots are often used to present results from an analysis of the consistency of a treatment effect across subgroups of factors of interest. Such plots can be a useful display of estimated treatment effects across subgroups, and the editors recommend that they be included for important subgroups. If subgroups are small, however, formal inferences about the homogeneity of treatment effects may not be feasible. A list of P values for treatment by subgroup interactions is subject to the problems of multiplicity and has limited value for inference. Therefore, in most cases, no P values for interaction should be provided in the forest plots.


If significance tests of safety outcomes (when not primary outcomes) are reported along with the treatment-specific estimates, no adjustment for multiplicity is necessary. Because information contained in the safety endpoints may signal problems within specific organ classes, the editors believe that the type I error rates larger than 0.05 are acceptable. Editors may request that P values be reported for comparisons of the frequency of adverse events among treatment groups, regardless of whether such comparisons were prespecified in the SAP.


When possible, the editors prefer that absolute event counts or rates be reported before relative risks or hazard ratios. The goal is to provide the reader with both the actual event frequency and the relative frequency. Odds ratios should be avoided, as they may overestimate the relative risks in many settings and be misinterpreted.
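A small worked example shows how far an odds ratio can drift from the relative risk when the event is common. The counts and the helper name `summarize_two_groups` below are hypothetical.

```python
def summarize_two_groups(events_treat, n_treat, events_ctrl, n_ctrl):
    """Absolute risks first, then relative risk and odds ratio for comparison."""
    risk_treat = events_treat / n_treat
    risk_ctrl = events_ctrl / n_ctrl
    odds_treat = events_treat / (n_treat - events_treat)
    odds_ctrl = events_ctrl / (n_ctrl - events_ctrl)
    return {
        "risk_treat": risk_treat,
        "risk_ctrl": risk_ctrl,
        "relative_risk": risk_treat / risk_ctrl,
        "odds_ratio": odds_treat / odds_ctrl,
    }

# Hypothetical common outcome: 60/100 events vs 40/100 events.
result = summarize_two_groups(60, 100, 40, 100)
```

Here the absolute risks are 60% and 40%, the relative risk is 1.5, and the odds ratio is 2.25. A reader who took the odds ratio for a relative risk would overstate the effect considerably.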


Authors should provide a flow diagram in CONSORT format. The editors also encourage authors to submit all the relevant information included in the CONSORT checklist. Although all of this information may not be published with the manuscript, it should be provided in either the manuscript or a supplementary appendix at the time of submission. The CONSORT statement, checklist, and flow diagram are available on the CONSORT website.


For observational studies (special requirements):

The validity of findings from observational studies depends on several important assumptions, including those relating to sample selection, measured and unmeasured confounding, and the adequacy of methods used to control for confounding. The Methods section of observational studies should describe how these and other relevant issues were managed in the design and analysis.


If an observational study included a prespecified SAP with a description of hypotheses to be tested, a signed and dated version of that plan should be included with the manuscript submission. The Journal encourages authors to deposit SAPs for observational studies in one of the online repositories designed for this purpose.


When appropriate, observational studies should use prespecified accepted methods for controlling family-wise error rate or false discovery rate when multiple tests are conducted. In manuscripts reporting observational studies without a prespecified method for error control, summary statistics should be limited to point estimates and 95% confidence intervals. In such cases, the Methods section should note that the widths of the intervals have not been adjusted for multiplicity and that the inferences drawn may not be reproducible. No P values should be reported for these analyses.

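In short: where appropriate, observational studies that run multiple tests should use a prespecified method to control the family-wise error rate or the false discovery rate. If no method was prespecified and multiple comparisons are analyzed anyway, results may be reported only as point estimates with confidence intervals. Here too, P values should not be reported.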
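For the case where false discovery rate control was prespecified, the Benjamini–Hochberg step-up adjustment can be sketched as follows. The function name and the P values are hypothetical, and this plain-Python version is meant for illustration rather than as a replacement for an established statistics package.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(p_values)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted

# Hypothetical P values from four prespecified exploratory tests.
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
discoveries = [q <= 0.05 for q in adjusted]
```

With a false discovery rate of 5%, only the first test in this made-up set survives the adjustment.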

If no prespecified analysis plan exists, the Methods section should provide an outline for the planned method of analysis, including

o Eligibility criteria for the selection of cases and method of sampling from the data, with a diagram as appropriate.

o A description of the association or causal effect to be estimated and the rationale for this choice.

o The prespecified method of analysis to draw inference about treatment or exposure effect or association.





Studies reporting the effect of a treatment or exposure should show the distribution of potential confounders and other variables, stratified by exposure or intervention group. When the analysis depends on the confounders being balanced by exposure group, differences between groups should be summarized with point estimates and 95% confidence intervals when appropriate.


Complex models and their diagnostics can often be best described in a supplementary appendix. Authors are encouraged to conduct an analysis that quantifies potential sensitivity to bias from unmeasured confounding; absent that, authors must provide a discussion of potential biases induced by unmeasured confounders.
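The guideline does not name a specific sensitivity analysis. One option sometimes used for this purpose is the E-value, which asks how strongly an unmeasured confounder would have to be associated with both exposure and outcome, on the risk ratio scale, to fully explain away an observed risk ratio. The sketch below is an assumption about what such an analysis might look like, not something the Journal prescribes, and `e_value` is a hypothetical helper name.

```python
from math import sqrt

def e_value(risk_ratio):
    """E-value for an observed risk ratio: RR + sqrt(RR * (RR - 1)).

    Risk ratios below 1 are inverted first so that the formula applies.
    """
    if risk_ratio <= 0:
        raise ValueError("a risk ratio must be positive")
    rr = risk_ratio if risk_ratio >= 1 else 1.0 / risk_ratio
    return rr + sqrt(rr * (rr - 1.0))

# Hypothetical observed risk ratio of 2.0.
needed_strength = e_value(2.0)
```

For an observed risk ratio of 2.0 the E-value is about 3.41, meaning a confounder would need associations of roughly that size with both exposure and outcome to account for the whole effect.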


Authors are encouraged to retest findings in a similar but independent study or studies to assess the robustness of their findings.



