This 3-D plot shows the average success rate and the average episode length versus the number of training time steps. As the number of training steps increases, the average success rate increases and the number of time steps needed to finish the task decreases. Our method, residual recurrent TD3 with an impedance controller, performs significantly better than the other methods.