TY - GEN
T1 - A domain specific language and toolchain for OpenCV Runtime Binary Acceleration using GPU
AU - Miyajima, Takaaki
AU - Thomas, David
AU - Amano, Hideharu
PY - 2012/12/1
Y1 - 2012/12/1
N2 - Computationally intensive applications, such as OpenCV, can be off-loaded to accelerators to reduce execution time. However, developing an accelerated system requires a significant amount of time: the developer must first choose an accelerator and which parts to off-load, then port the off-loaded kernels to the accelerator using many accelerator-specific tools. In addition to the low-level parallelism of the accelerator, the developer also needs to extract and utilize system-level parallelism found within the application, while making sure that the application still executes correctly. This paper presents Courier, a toolchain and a domain specific language for Runtime Binary Acceleration, designed to simplify many of the steps involved in accelerating an application. The Courier toolchain can extract dataflow from a running software binary file, explore the off-loaded execution time on an accelerator, and then actually accelerate the original binary. By utilizing Courier, both expert and non-expert users can easily extract system-level parallelism and decide which part should be off-loaded to accelerators in a mixed software-hardware environment, without special knowledge of the target application source code and accelerator architecture. In a case study, an OpenCV application is accelerated by 2.06 times using Courier, without requiring the application source code or any re-compilation of the application.
AB - Computationally intensive applications, such as OpenCV, can be off-loaded to accelerators to reduce execution time. However, developing an accelerated system requires a significant amount of time: the developer must first choose an accelerator and which parts to off-load, then port the off-loaded kernels to the accelerator using many accelerator-specific tools. In addition to the low-level parallelism of the accelerator, the developer also needs to extract and utilize system-level parallelism found within the application, while making sure that the application still executes correctly. This paper presents Courier, a toolchain and a domain specific language for Runtime Binary Acceleration, designed to simplify many of the steps involved in accelerating an application. The Courier toolchain can extract dataflow from a running software binary file, explore the off-loaded execution time on an accelerator, and then actually accelerate the original binary. By utilizing Courier, both expert and non-expert users can easily extract system-level parallelism and decide which part should be off-loaded to accelerators in a mixed software-hardware environment, without special knowledge of the target application source code and accelerator architecture. In a case study, an OpenCV application is accelerated by 2.06 times using Courier, without requiring the application source code or any re-compilation of the application.
UR - http://www.scopus.com/inward/record.url?scp=84874272435&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84874272435&partnerID=8YFLogxK
U2 - 10.1109/ICNC.2012.34
DO - 10.1109/ICNC.2012.34
M3 - Conference contribution
AN - SCOPUS:84874272435
SN - 9780769548937
T3 - Proceedings of the 2012 3rd International Conference on Networking and Computing, ICNC 2012
SP - 175
EP - 181
BT - Proceedings of the 2012 3rd International Conference on Networking and Computing, ICNC 2012
T2 - 2012 3rd International Conference on Networking and Computing, ICNC 2012
Y2 - 5 December 2012 through 7 December 2012
ER -