Fig 2. Architecture of the accelerator.
In the traditional channel shuffle operation, channels are selected alternately, one at a time, from two groups of feature maps and recombined into a new output feature map, which is then transferred to BRAM as the input to the next convolution. In this paper, the channel shuffle is modified by partitioning each group of output feature maps internally into sub-groups of 4 channels. Sub-groups, rather than individual channels, are then selected alternately from the two groups and recombined into a new output feature map, as illustrated in Fig 3. This approach retains the benefit of increased inter-channel information exchange. Because each memory read/write operation now moves 4 channels instead of 1, the number of such operations is reduced by 75%, which significantly reduces memory access time.
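The Python sketch below contrasts the two channel orderings and shows where a 75% figure can come from. It is an illustration under stated assumptions, not the accelerator's data path. Channels are represented only by their indices, and both groups must have the same channel count. One "transfer" is counted for each contiguous run of channels copied from a group. The function `shuffle_channels` and this transfer-counting model are hypothetical.

```python
def shuffle_channels(group_a, group_b, block=1):
    """Interleave two channel groups, taking `block` consecutive channels from each in turn.

    block=1 gives the traditional per-channel shuffle for two groups;
    block=4 gives the sub-group variant described in the text.
    Assumption (hypothetical model): one transfer = one contiguous copy of
    `block` channels from one group into the output feature map.
    Returns (shuffled channel order, number of transfers).
    """
    if len(group_a) != len(group_b):
        raise ValueError("both groups must have the same number of channels")
    if len(group_a) % block != 0:
        raise ValueError("channel count must be divisible by block")

    order = []
    transfers = 0
    for start in range(0, len(group_a), block):
        # Copy one block from group A, then the matching block from group B.
        order.extend(group_a[start:start + block])
        order.extend(group_b[start:start + block])
        transfers += 2
    return order, transfers


if __name__ == "__main__":
    a, b = list(range(8)), list(range(8, 16))  # two groups of 8 channels each
    order1, t1 = shuffle_channels(a, b, block=1)
    order4, t4 = shuffle_channels(a, b, block=4)
    print(order1)  # [0, 8, 1, 9, 2, 10, 3, 11, 4, 12, 5, 13, 6, 14, 7, 15]
    print(order4)  # [0, 1, 2, 3, 8, 9, 10, 11, 4, 5, 6, 7, 12, 13, 14, 15]
    print(t1, t4, 1 - t4 / t1)  # 16 4 0.75
```

In this counting model, the per-channel shuffle needs one transfer per channel. Moving 4 channels per transfer divides that count by 4, so the reduction is 1 - 1/4 = 75%. Whether real BRAM traffic scales the same way depends on how the hardware batches accesses.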