Results:

Study selection:

The PRISMA flow diagram (Figure 1) shows that 826 studies were identified after removal of duplicates, and 789 irrelevant studies were excluded after title and abstract screening. The remaining 37 full-text articles were assessed for eligibility against the prespecified inclusion criteria, and 6 studies were included in the systematic review and meta-analysis. Some studies were excluded because they had case-control designs (Klopfenstein et al., 2020; Rojas-Marte et al., 2020; Rossotti et al., 2020), while others were preprints (Rossi et al., 2020) that may be included in future updates.

Study characteristics:

All included studies enrolled patients with severe COVID-19. Disease severity criteria were well defined in each study, with minor variations. Four studies used prespecified severity criteria that had impaired oxygenation in common (Campochiaro et al., 2020; Colaneri et al., 2020; Guaraldi et al., 2020; Kewan et al., 2020), and three of these also required typical radiological findings (Campochiaro et al., 2020; Guaraldi et al., 2020; Kewan et al., 2020). The other two studies focused on critical care patients (Ip et al., 2020) and on patients who required mechanical ventilation (Somers et al., 2020). Evidence of hyperinflammation was required for tocilizumab indication in three studies (Campochiaro et al., 2020; Ip et al., 2020; Kewan et al., 2020).
The tocilizumab regimen was one or two doses of 400-800 mg each.
Routine treatment was given to all patients according to need and varied widely among the studies. It included supportive care, symptomatic treatment, steroids, antibiotics, hydroxychloroquine, anticoagulants, and antivirals (Table 1).
The median time from onset of COVID-19 symptoms to the start of treatment in the tocilizumab and control groups, respectively, was 11 and 9 days in the study by Campochiaro et al. (2020), 4 and 5 days in the study by Guaraldi et al. (2020), and 5 days in the study by Ip et al. (2020).

Risk of bias within studies:

The risk of bias is summarized in Table 2. Because cohort studies are observational, they are highly vulnerable to confounding; it is therefore necessary to show that the groups were comparable in baseline factors that can predict the outcomes. Two studies did not give enough data about the significance of baseline differences (Colaneri et al., 2020; Kewan et al., 2020), while another two showed a serious risk of bias due to confounding (Guaraldi et al., 2020; Ip et al., 2020). Selection bias and misclassification bias were low in all studies. Bias due to deviations from intended interventions was moderate in three studies because background treatment, including steroids, was unequal between the groups (Guaraldi et al., 2020; Ip et al., 2020; Kewan et al., 2020). Bias in measurement of outcomes was moderate in all studies because of the risk of recall bias inherent in the retrospective design. Bias in selection of the reported result was moderate in two studies (Kewan et al., 2020; Somers et al., 2020), as they did not report data on many clinical outcomes and adverse effects associated with tocilizumab use.

Synthesis of results:

The included studies recruited 1473 patients with confirmed severe COVID-19, of whom 472 received tocilizumab and 1001 served as controls. Male participants predominated in all studies. The odds ratio (OR) for male sex in the tocilizumab-treated group relative to the control group, pooled across the studies using a fixed-effect model, was 1.59 [1.24, 2.04].
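
As a reading aid for the pooled estimate above, the following minimal sketch back-calculates the standard error and z statistic implied by a ratio and its confidence interval. It assumes the reported interval is a 95% Wald-type interval that is symmetric on the log scale; the section does not state this, and the function name `se_and_z_from_ratio_ci` is hypothetical.

```python
import math

def se_and_z_from_ratio_ci(estimate, lower, upper, z_crit=1.959964):
    """Recover the log-scale standard error, z statistic and two-sided p-value
    implied by a ratio measure (OR or RR) and its confidence interval.

    Assumption: the interval is a 95% Wald interval, i.e.
    exp(log(estimate) +/- z_crit * SE).
    """
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(estimate) / se
    # Two-sided p-value under a standard normal reference distribution.
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return se, z, p_two_sided

# Values reported in the text for male sex: OR 1.59 [1.24, 2.04].
se, z, p = se_and_z_from_ratio_ci(1.59, 1.24, 2.04)
print(f"SE(log OR) = {se:.3f}, z = {z:.2f}, two-sided p = {p:.4f}")
```

Under these assumptions the interval excludes 1, which suggests that male sex was unevenly distributed between the groups and is therefore a baseline imbalance to keep in mind when reading the outcome estimates.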

The main outcomes:

  1. As shown in Fig. 2, the pooled mortality risk from the six included studies was significantly lower in the tocilizumab-treated group than in the control group, by 41% at 7 days, 40% at 14 days, 28% at 21 days, and 37% at 28 days, using the fixed-effect model. Heterogeneity was not significant (P > 0.05).
  2. As shown in Fig. 3, the difference in clinical improvement between the two groups was not statistically significant. The RR pooled from two studies (Campochiaro et al., 2020; Kewan et al., 2020) using a fixed-effect model was 1.21 [0.89, 1.64]. Heterogeneity was not significant (P = 0.72).
  3. There was no statistically significant difference between the two groups in the change in respiratory support. The RRs of improvement and of worsening in respiratory support, pooled from two studies (Campochiaro et al., 2020; Kewan et al., 2020) using a fixed-effect model, were 1.20 [0.91, 1.57] and 0.52 [0.26, 1.04], respectively. Heterogeneity was not significant (P > 0.05).
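
The mortality percentages in item 1 can be related to pooled risk ratios with a short sketch. It assumes that each percentage is a relative risk reduction, that is, 100 × (1 − RR); the section does not give the pooled RRs themselves, so the values printed here are implied rather than reported. The helper name `risk_ratio_from_reduction` is hypothetical.

```python
def risk_ratio_from_reduction(percent_reduction):
    """Convert a relative risk reduction (in percent) to the implied risk ratio.

    Assumption: the reduction was computed as 100 * (1 - RR).
    """
    return 1 - percent_reduction / 100

# Reported reductions in pooled mortality risk, keyed by follow-up day.
reported_reduction = {7: 41, 14: 40, 21: 28, 28: 37}

implied_rr = {
    day: round(risk_ratio_from_reduction(pct), 2)
    for day, pct in reported_reduction.items()
}
print(implied_rr)
```

Read this way, a 41% reduction at 7 days corresponds to an implied RR of 0.59, which should be checked against the forest plot in Fig. 2.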

Additional outcomes:

  1. As shown in Fig. 4, the difference in the occurrence of serious adverse events between the two groups was not significant. The risk difference (RD) pooled from three studies (Campochiaro et al., 2020; Colaneri et al., 2020; Guaraldi et al., 2020) using a fixed-effect model was 0.00 [-0.02, 0.03]. Heterogeneity was not significant (χ2 = 0.32, P = 0.85).
  2. The difference in the occurrence of bacteremia between the two groups was not significant. The RR pooled from three studies (Campochiaro et al., 2020; Guaraldi et al., 2020; Ip et al., 2020) using a fixed-effect model was 1.25 [0.80, 1.97]. Heterogeneity was not significant (χ2 = 0.16, P = 0.92).
  3. The difference in the occurrence of elevated liver function tests between the two groups was not significant. The RD pooled from three studies (Campochiaro et al., 2020; Colaneri et al., 2020; Guaraldi et al., 2020) using a fixed-effect model was -0.00 [-0.03, 0.02]. Heterogeneity was not significant (χ2 = 0.24, P = 0.89).
  4. Neutropenia was 9.23 times more likely to occur in the tocilizumab-treated group than in the control group. The RR pooled from two studies (Campochiaro et al., 2020; Guaraldi et al., 2020) using a fixed-effect model was 9.23 [1.06, 80.24]. Heterogeneity was not significant (χ2 = 0.08, P = 0.77).
  5. The difference in the occurrence of new infections between the two groups was significant. The RD pooled from six studies (Campochiaro et al., 2020; Colaneri et al., 2020; Guaraldi et al., 2020; Ip et al., 2020; Kewan et al., 2020; Somers et al., 2020) was 0.09 [0.05, 0.12] with the fixed-effect model and 0.07 [0.00, 0.14] with the random-effects model. Heterogeneity was significant (χ2 = 18.73, P = 0.002).
  6. The difference in the occurrence of infusion-related reactions between the two groups was not significant. The RD pooled from two studies (Campochiaro et al., 2020; Guaraldi et al., 2020) using a fixed-effect model was 0.01 [-0.02, 0.03]. Heterogeneity was not significant (χ2 = 0.12, P = 0.73).
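
The heterogeneity statistics quoted in the list above can be cross-checked with a minimal sketch. It assumes that each χ2 value is Cochran's Q with degrees of freedom equal to the number of pooled studies minus one, and it also derives the I² statistic, which is not reported in this section and is shown only as an illustration. The function names `chi2_sf` and `i_squared` are hypothetical.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1.

    Uses the regularized upper incomplete gamma recurrence
    Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1), with y = x / 2.
    """
    y = x / 2.0
    if df % 2 == 1:
        q, a = math.erfc(math.sqrt(y)), 0.5   # Q(1/2, y)
    else:
        q, a = math.exp(-y), 1.0              # Q(1, y)
    while a < df / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1)
        a += 1.0
    return q

def i_squared(q_stat, df):
    """I^2 in percent: share of variability attributed to heterogeneity."""
    return max(0.0, (q_stat - df) / q_stat) * 100

# New infections: chi2 = 18.73 from six studies, so df = 5 (assumed).
q_stat, n_studies = 18.73, 6
df = n_studies - 1
p_value = chi2_sf(q_stat, df)
i2 = i_squared(q_stat, df)
print(f"P = {p_value:.3f}, I^2 = {i2:.1f}%")

# Serious adverse events: chi2 = 0.32 from three studies, so df = 2 (assumed).
p_sae = chi2_sf(0.32, 2)
print(f"P = {p_sae:.2f}")
```

Under these assumptions the recomputed P values agree with those reported for new infections and for serious adverse events. The large derived I² for new infections is consistent with the authors reporting a random-effects estimate alongside the fixed-effect one for that outcome.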

Sensitivity analysis:

Based on our risk-of-bias assessment, the only two studies that showed a serious risk of confounding bias, Guaraldi et al. (2020) and Ip et al. (2020), were excluded. The same analysis of the combined outcomes was then repeated, and there was no significant difference between the two analyses.