Handling of missing data
Missing items in the data set were imputed using multiple imputation with
random forests.
A random forest is an ensemble learning method, primarily used for
classification and regression, that operates by constructing a multitude
of decision trees at training time and outputting the mode of the
classes (classification) or the mean prediction (regression) of the
individual trees [10]. When applied to data imputation,
random forests can capture non-linear relationships and interactions
between variables without these having to be specified in advance,
which allows missing values to be predicted accurately [10,11]. The imputation
process was implemented using the miceRanger package in R [12]. Details regarding the imputation process can be
found in the supplementary material.
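
The aggregation step described above can be illustrated with a minimal sketch. It is not the miceRanger implementation and is not part of the analysis code. The function names and the per-tree predictions are hypothetical, chosen only to show how the outputs of individual trees would be combined into a single imputed value for one missing entry.

```python
from collections import Counter
from statistics import mean


def aggregate_classification(tree_votes):
    """Return the most common class label among the trees' votes (the mode)."""
    return Counter(tree_votes).most_common(1)[0][0]


def aggregate_regression(tree_predictions):
    """Return the mean of the trees' numeric predictions."""
    return mean(tree_predictions)


# Hypothetical predictions from individual trees for one missing entry.
votes = ["yes", "no", "yes", "yes", "no"]   # categorical variable
estimates = [2.0, 3.0, 4.0]                 # continuous variable

imputed_label = aggregate_classification(votes)
imputed_value = aggregate_regression(estimates)
```

In multiple imputation this prediction step is repeated to produce several completed data sets, so the sketch covers only the combination of tree outputs within a single forest.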