Methods
We assessed the change in CoE between the original and updated Cochrane systematic reviews that reported ratings of CoE according to the Grading of Recommendations Assessment, Development and Evaluation (GRADE) system for critical appraisal of medical evidence.6 We used GRADE because it is widely recognized as the most advanced system for operationalizing the fundamental principles of EBM and for critical evaluation of medical evidence.1,7,8 GRADE was developed in the first decade of the 21st century, after a critical appraisal of 106 systems for rating the quality of medical research evidence showed that none of them was capable of distinguishing low-quality from high-quality evidence.1,9,10
We focused on the assessment of systematic reviews, rather than on individual trials, because the second important EBM principle is that the true effects of health interventions are best assessed by evaluating the total evidence on a topic rather than a study selected to favor a particular claim.1 GRADE is also considered a suitable method to assess certainty of evidence at the level of a systematic review/meta-analysis.8 Thus, the unit of our analysis was a systematic review/meta-analysis (SR/MA).
Cochrane Reviews are regularly updated, providing a unique opportunity to assess whether and when the assessment of CoE changes between the original and updated reviews as a result of new evidence generated between the two reviews. Since 2013, Cochrane Reviews have mandated the use of GRADE Summary of Findings (SoF) tables11 to summarize CoE and the magnitude of effects of the interventions that the reviews assessed. We evaluated all Cochrane reviews published in the last 5 years in the Cochrane Database of Systematic Reviews [https://www.cochranelibrary.com/cdsr/about-cdsr].
We used SoF tables from the original and updated reviews to extract data for the primary outcome related to CoE and to assess the magnitude and direction of effect. (When a review had multiple primary outcomes, the data were extracted from the first one listed in the SoF table that contained data in both the original and updated review.) Eligible SR/MAs were divided into 5 groups; data were extracted from each group by pairs of independent reviewers. Interrater agreement (kappa) regarding CoE was calculated for each pair. We recorded CoE according to GRADE criteria (very low, low, moderate, high).1,12
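The agreement calculation for one reviewer pair can be sketched as follows. This is a minimal illustration that assumes an unweighted Cohen's kappa; the text does not state whether a weighted variant was used, and the function name and example ratings are hypothetical.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two raters scoring the same items.

    Assumption: unweighted kappa; the paper does not specify the variant.
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(ratings_a)
    # Observed proportion of items on which the two raters agree.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance, from each rater's marginal frequencies.
    count_a, count_b = Counter(ratings_a), Counter(ratings_b)
    categories = set(count_a) | set(count_b)
    expected = sum(count_a[c] * count_b[c] for c in categories) / n**2
    if expected == 1.0:
        return 1.0  # both raters used one identical category throughout
    return (observed - expected) / (1 - expected)


# Hypothetical CoE ratings from two reviewers for six SR/MAs.
reviewer_1 = ["low", "low", "moderate", "high", "very low", "moderate"]
reviewer_2 = ["low", "moderate", "moderate", "high", "very low", "moderate"]
kappa = cohen_kappa(reviewer_1, reviewer_2)
```

Because the GRADE categories are ordered, a weighted kappa that penalizes near-misses less than distant disagreements would be a reasonable alternative.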
We also extracted summary meta-analytic estimates for the primary outcome from each pair of reviews, i.e., point estimates, dispersion (e.g., 95% confidence interval), metric used (e.g., relative risk, odds ratio, hazard ratio, standardized mean difference), number of trials per meta-analysis, number of participants, type of comparator (active vs placebo/no treatment), type of treatment (pharmaceutical vs non-pharmaceutical), whether the authorship of the original and updated reviews changed (to capture potential differences in judgment of CoE by the review team), and type of studies (randomized controlled trials vs observational studies) that were meta-analyzed.
We converted all effect estimates into odds ratios (OR). We also aligned all effect sizes in the same direction, with OR<1 indicating a reduction of undesirable outcomes (i.e., a more beneficial treatment). Because GRADE separates recommendations into strong vs weak based on the CoE13, typically endorsing strong vs weak (conditional) recommendations based on moderate/high vs low/very low CoE, respectively4,14, our key analysis focused on the differences in effect sizes between these subgroups. We used McNemar’s test for paired (before vs after) data to test the null hypothesis that the probability of a rating falling in the very low/low vs moderate/high CoE group was the same in the original and updated reviews. To test for a linear trend in change of CoE over all categories, from very low to high, we employed a symmetry test with marginal homogeneity tests (which reduce to McNemar’s test for two non-independent categories of observations).
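The paired comparison of dichotomized CoE can be sketched as below. This is an illustrative sketch that assumes the exact (binomial) form of McNemar's test; the text does not say whether the exact or the chi-square version was used, and the function names and example ratings are hypothetical.

```python
from math import comb

LOW_GROUP = {"very low", "low"}  # the other group is moderate/high


def discordant_counts(original, updated):
    """Count SR/MAs whose dichotomized CoE changed between reviews."""
    pairs = list(zip(original, updated))
    upgraded = sum(o in LOW_GROUP and u not in LOW_GROUP for o, u in pairs)
    downgraded = sum(o not in LOW_GROUP and u in LOW_GROUP for o, u in pairs)
    return upgraded, downgraded


def mcnemar_exact(upgraded, downgraded):
    """Two-sided exact McNemar p-value from the two discordant counts.

    Under the null hypothesis, each discordant pair is equally likely to
    be an upgrade or a downgrade, so the smaller count follows a
    Binomial(n, 0.5) distribution.
    """
    n = upgraded + downgraded
    if n == 0:
        return 1.0  # no discordant pairs, so no evidence of change
    k = min(upgraded, downgraded)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2**n
    return min(1.0, 2 * tail)


# Hypothetical ratings for four SR/MAs (original vs updated review).
original = ["low", "high", "moderate", "very low"]
updated = ["moderate", "low", "moderate", "low"]
up, down = discordant_counts(original, updated)
p_value = mcnemar_exact(up, down)
```

Only the discordant pairs enter the test; SR/MAs whose CoE group did not change carry no information about the direction of change.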
To assess differences in the magnitude of effect size between original and updated evidence as a function of change in the assessment of CoE, we calculated the ratio of odds ratios (ROR) across meta-analytic estimates.15 ROR compares intervention effects in meta-analyses of trials with very low/low CoE vs those with moderate/high CoE (or vice versa).15 Thus, if the comparison of ORs with very low/low vs moderate/high CoE yields ROR<1, treatment effects were more beneficial in meta-analyses of trials with very low/low CoE, while ROR>1 indicates the opposite.15,16 A test of interaction was performed to assess the hypothesis of no difference between the subgroups (i.e., treatment effects in very low/low vs moderate/high CoE).17 Because the treatment effects being compared were assumed to be correlated, we calculated standard errors for ROR by accounting for the correlation between the effect sizes observed in the original vs updated reviews.17 We obtained the values of the correlation coefficients from the data. We performed sensitivity analyses by: a) assuming one correlation coefficient between effect sizes in the original vs updated reviews, and b) calculating correlation coefficients for each subgroup according to direction of treatment effects (i.e., we calculated separate correlation coefficients for the subgroups showing positive, negative and no change in direction of effects between the original vs updated review, three correlation coefficients in total). We also repeated all analyses assuming no correlation between the effect sizes. Since we observed no differences in the results regardless of the postulated assumption, we report the default analysis based on calculation with three different correlation coefficients.
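The ROR and its correlation-adjusted standard error can be illustrated with the standard variance formula for the difference of two correlated estimates on the log scale. This is a sketch under that assumption, not necessarily the exact computation used in the analysis; the function name, inputs, and the normal-approximation test are illustrative.

```python
from math import erf, exp, log, sqrt


def ratio_of_odds_ratios(or_a, se_a, or_b, se_b, r=0.0):
    """ROR = or_a / or_b, with a z-test of interaction.

    se_a and se_b are standard errors of log(OR); r is the assumed
    correlation between the two log ORs (r=0 treats them as independent).
    """
    log_ror = log(or_a) - log(or_b)
    # Var(X - Y) = Var(X) + Var(Y) - 2 * r * SD(X) * SD(Y)
    variance = se_a**2 + se_b**2 - 2 * r * se_a * se_b
    if variance <= 0:
        raise ValueError("non-positive variance; check se values and r")
    se = sqrt(variance)
    z = log_ror / se
    # Two-sided p-value from the standard normal distribution.
    p = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    ci = (exp(log_ror - 1.96 * se), exp(log_ror + 1.96 * se))
    return {"ror": exp(log_ror), "se_log_ror": se, "z": z, "p": p, "ci95": ci}


# Hypothetical inputs: OR 0.8 vs OR 1.0, each with SE 0.1 on the log scale.
independent = ratio_of_odds_ratios(0.8, 0.1, 1.0, 0.1, r=0.0)
correlated = ratio_of_odds_ratios(0.8, 0.1, 1.0, 0.1, r=0.5)
```

A positive correlation shrinks the standard error of the ROR, which is why the choice of correlation coefficient was examined in sensitivity analyses.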
Our hypothesis was that ROR would differ between the subgroups; in addition, we expected the effect size to be larger if CoE changed from moderate/high to very low/low than the other way around.
The analyses used the random-effects Sidik-Jonkman model. We assessed heterogeneity, i.e., the dispersion of effect sizes across the meta-analytic estimates, by calculating the τ (tau) statistic.16 We used the I² statistic to assess inconsistency; I² represents the estimated proportion of the observed variance that reflects variance in true effect sizes across individual meta-analyses rather than sampling error;16 it depends on both heterogeneity and total variation in the estimates between the analyses.16,18
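For orientation, the Sidik-Jonkman estimator of the between-estimate variance and a derived I² can be sketched as follows. This is an illustrative reimplementation based on the published two-step estimator, not the Stata routine used for the analyses; the I² calculation assumes the Higgins-Thompson "typical" within-study variance, and all names and data are hypothetical.

```python
from math import sqrt


def sidik_jonkman(effects, variances):
    """Random-effects pooling with the Sidik-Jonkman tau^2 estimator.

    effects: log ORs (or log RORs); variances: their sampling variances.
    Returns the pooled estimate, its SE, tau, and I^2 as a proportion.
    """
    k = len(effects)
    if k < 2 or k != len(variances) or min(variances) <= 0:
        raise ValueError("need >= 2 effects with positive variances")
    # Step 1: crude initial tau^2 around the unweighted mean.
    mean_y = sum(effects) / k
    tau2_init = sum((y - mean_y) ** 2 for y in effects) / k
    if tau2_init == 0.0:
        tau2 = 0.0  # all estimates identical
    else:
        # Step 2: weighted residual variance, weights 1 / (v_i/tau0^2 + 1).
        q = [1 / (v / tau2_init + 1) for v in variances]
        mu_q = sum(qi * y for qi, y in zip(q, effects)) / sum(q)
        tau2 = sum(qi * (y - mu_q) ** 2 for qi, y in zip(q, effects)) / (k - 1)
    # Random-effects pooled estimate with inverse-variance weights.
    w = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * y for wi, y in zip(w, effects)) / sum(w)
    se = sqrt(1 / sum(w))
    # I^2 from tau^2 and the "typical" within-study variance (assumption).
    w_fe = [1 / v for v in variances]
    s2 = (k - 1) * sum(w_fe) / (sum(w_fe) ** 2 - sum(x * x for x in w_fe))
    i2 = tau2 / (tau2 + s2)
    return {"pooled": pooled, "se": se, "tau": sqrt(tau2), "i2": i2}


# Hypothetical log-scale estimates and sampling variances.
result = sidik_jonkman([0.0, 1.0], [0.0001, 0.0001])
```

Unlike moment estimators that can be truncated at zero, the Sidik-Jonkman estimate is positive whenever the estimates are not all identical.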
We complemented the assessment of heterogeneity with calculation of the absolute deviation of treatment effects (aROR) as a function of change in CoE.19 By definition, aROR is positive and reflects the x-fold deviation of treatment effect from OR=1 on the OR scale. Thus, if ROR=0.8 or ROR=1.25, the absolute deviation is equal to aROR=1.25. aROR across all SR/MAs was expressed as (unweighted) median and interquartile range (IQR).19 We also evaluated how the precision of the estimates changed by calculating the ratio of standard errors for each subgroup, summarized as (unweighted) median and IQR.19 Values >1 indicate larger standard errors (less precision) associated with a given category (e.g., very low/low vs moderate/high) of CoE.19
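The aROR definition above amounts to folding the ROR around 1 on the log scale, which a short sketch makes concrete. The quartile method used here is an assumption (statistical packages differ in how they interpolate quantiles), and the function names and example values are hypothetical.

```python
from math import exp, log
from statistics import median, quantiles


def absolute_ror(ror):
    """Fold an ROR around 1 so that 0.8 and 1.25 both map to 1.25."""
    if ror <= 0:
        raise ValueError("ROR must be positive")
    return exp(abs(log(ror)))


def median_iqr(values):
    """Unweighted median and (Q1, Q3); quartile method is an assumption."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return median(values), (q1, q3)


# Hypothetical RORs from five SR/MAs.
rors = [0.8, 1.0, 1.5, 0.5, 0.25]
arors = [absolute_ror(r) for r in rors]
summary = median_iqr(arors)
```

The same `median_iqr` helper would apply to the ratio of standard errors, since both summaries are unweighted.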
A number of subgroup analyses, all defined a priori and published in the protocol, which provides further methodological details20, were performed. These included assessment of differences between patient-oriented (e.g., mortality, quality of life) vs disease-oriented outcomes (e.g., disease response, laboratory outcomes), the effect of a change in authorship between the original and updated reviews, the effect of comparator intervention (active treatment vs placebo/no treatment control) and the type of treatment category (pharmaceutical vs non-pharmaceutical). Finally, in some cases, the SRs included observational studies along with randomized controlled trials (RCTs), and the conversion from standardized mean differences generated implausibly large ORs. We examined these results further by performing sensitivity analyses that excluded SRs with observational studies and SRs with large ORs.
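To see why converting standardized mean differences can produce very large ORs, consider a widely used logistic-distribution approximation, ln(OR) = SMD × π/√3. The sketch below assumes this formula; the text does not state which conversion was applied, and the cut-off for "implausibly large" is not given, so the `threshold` argument is purely hypothetical.

```python
from math import exp, pi, sqrt


def smd_to_or(smd):
    """Convert an SMD to an OR, assuming ln(OR) = SMD * pi / sqrt(3)."""
    return exp(smd * pi / sqrt(3))


def is_extreme(or_value, threshold):
    """Flag ORs beyond a caller-supplied (hypothetical) cut-off.

    The check is symmetric on the ratio scale: OR > threshold or
    OR < 1 / threshold.
    """
    if threshold <= 1:
        raise ValueError("threshold must exceed 1")
    return or_value > threshold or or_value < 1 / threshold


# An SMD of 2 already corresponds to an OR above 30 under this formula.
or_from_smd_1 = smd_to_or(1.0)
or_from_smd_2 = smd_to_or(2.0)
```

Because the conversion is exponential in the SMD, moderately large standardized differences translate into ORs far outside the range typically seen for binary outcomes.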
This paper is reported per PRISMA guidelines.21 All analyses were conducted with the Stata statistical package, version 17.22