Demand for computation-intensive applications such as convolutional neural networks (CNNs) has been increasing in recent years. In these applications, the valid data for a computation are located at non-contiguous addresses. The common burst memory access pattern therefore yields low spatial locality of valid data per access. As a result, the throughput of data-parallel execution units degrades, because computation resources are wasted on invalid data. This is especially problematic in embedded systems, where power constraints demand high computation efficiency. In this paper, we introduce the Data Rearrange Unit (DRU), a hardware unit that rearranges computation data to increase the spatial locality of valid data. The DRU drastically reduces the main memory access rate; the reduction in memory accesses increases computation efficiency and reduces power consumption. We demonstrate the effectiveness of the DRU by implementing it on the RMTP SoC, improving convolution throughput on a data-parallel execution unit by up to 94 times while increasing the total cell area by only about 13%.