TY - GEN
T1 - GPUvm: Why Not Virtualizing GPUs at the Hypervisor?
T2 - 2014 USENIX Annual Technical Conference, USENIX ATC 2014
AU - Suzuki, Yusuke
AU - Kato, Shinpei
AU - Yamada, Hiroshi
AU - Kono, Kenji
N1 - Publisher Copyright:
© Proceedings of the 2014 USENIX Annual Technical Conference, USENIX ATC 2014. All rights reserved.
PY - 2014
Y1 - 2014
N2 - Graphics processing units (GPUs) provide orders-of-magnitude speedups for compute-intensive data-parallel applications. However, enterprise and cloud computing domains, where resource isolation among multiple clients is required, have poor access to GPU technology. This is due to the lack of operating system (OS) support for virtualizing GPUs in a reliable manner. To make GPUs more mature system citizens, we present an open architecture for GPU virtualization with a particular emphasis on the Xen hypervisor. We provide the design and implementation of full- and para-virtualization, including optimization techniques to reduce the overhead of GPU virtualization. Our detailed experiments using a relevant commodity GPU show that even the optimized performance of GPU para-virtualization is two to three times slower than that of the pass-through and native approaches, whereas full-virtualization exhibits overhead of a different scale due to an increased number of memory-mapped I/O operations. We also demonstrate that coarse-grained fairness on GPU resources among multiple virtual machines can be achieved by GPU scheduling; finer-grained fairness requires further architectural support, owing to the non-preemptive nature of GPU workloads.
AB - Graphics processing units (GPUs) provide orders-of-magnitude speedups for compute-intensive data-parallel applications. However, enterprise and cloud computing domains, where resource isolation among multiple clients is required, have poor access to GPU technology. This is due to the lack of operating system (OS) support for virtualizing GPUs in a reliable manner. To make GPUs more mature system citizens, we present an open architecture for GPU virtualization with a particular emphasis on the Xen hypervisor. We provide the design and implementation of full- and para-virtualization, including optimization techniques to reduce the overhead of GPU virtualization. Our detailed experiments using a relevant commodity GPU show that even the optimized performance of GPU para-virtualization is two to three times slower than that of the pass-through and native approaches, whereas full-virtualization exhibits overhead of a different scale due to an increased number of memory-mapped I/O operations. We also demonstrate that coarse-grained fairness on GPU resources among multiple virtual machines can be achieved by GPU scheduling; finer-grained fairness requires further architectural support, owing to the non-preemptive nature of GPU workloads.
UR - http://www.scopus.com/inward/record.url?scp=85077458357&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85077458357&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85077458357
T3 - Proceedings of the 2014 USENIX Annual Technical Conference, USENIX ATC 2014
SP - 109
EP - 120
BT - Proceedings of the 2014 USENIX Annual Technical Conference, USENIX ATC 2014
PB - USENIX Association
Y2 - 19 June 2014 through 20 June 2014
ER -