TY - GEN
T1 - An 8Tb/s 1pJ/b 0.8mm2/Tb/s QDR inductive-coupling interface between 65nm CMOS GPU and 0.1μm DRAM
AU - Miura, Noriyuki
AU - Kasuga, Kazutaka
AU - Saito, Mitsuko
AU - Kuroda, Tadahiro
PY - 2010
Y1 - 2010
N2 - This paper presents an 8Tb/s 1pJ/b 0.8mm2/Tb/s quad data rate (QDR) inductive-coupling interface between 65nm CMOS GPU and 0.1μ m DRAM. The interface consists of 1024-bit parallel inductive-coupling transceivers operating at 8Gb/s/link. In the DRAM transceiver, data are multiplexed and demultiplexed by using a quadrature clock and XOR operation. This circuit technique for QDR compensates for transistor performance gap between the GPU and the DRAM to achieve 8Gb/s bandwidth. The clock for data retiming is recovered from the received data by using an injection-lock VCO. Clock links and clock distribution circuits are not needed, resulting in small layout area of 0.8mm2/Tb/s. Frontend of the transceiver is implemented using NMOS CML circuits with adaptive bias control. The transceiver's sensitivity to PVT variations is small, enabling all the 1024 parallel transceivers to operate at BER<10-16. It also reduces the design margin required of the transceiver, resulting in power reduction to 1pJ/b. Compared to the latest wired 40nm DRAM interface [1], the bandwidth is 32x higher, while the energy consumption is 1/8 and the layout area is 1/22.
AB - This paper presents an 8Tb/s 1pJ/b 0.8mm2/Tb/s quad data rate (QDR) inductive-coupling interface between 65nm CMOS GPU and 0.1μ m DRAM. The interface consists of 1024-bit parallel inductive-coupling transceivers operating at 8Gb/s/link. In the DRAM transceiver, data are multiplexed and demultiplexed by using a quadrature clock and XOR operation. This circuit technique for QDR compensates for transistor performance gap between the GPU and the DRAM to achieve 8Gb/s bandwidth. The clock for data retiming is recovered from the received data by using an injection-lock VCO. Clock links and clock distribution circuits are not needed, resulting in small layout area of 0.8mm2/Tb/s. Frontend of the transceiver is implemented using NMOS CML circuits with adaptive bias control. The transceiver's sensitivity to PVT variations is small, enabling all the 1024 parallel transceivers to operate at BER<10-16. It also reduces the design margin required of the transceiver, resulting in power reduction to 1pJ/b. Compared to the latest wired 40nm DRAM interface [1], the bandwidth is 32x higher, while the energy consumption is 1/8 and the layout area is 1/22.
UR - http://www.scopus.com/inward/record.url?scp=77952196817&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=77952196817&partnerID=8YFLogxK
U2 - 10.1109/ISSCC.2010.5433909
DO - 10.1109/ISSCC.2010.5433909
M3 - Conference contribution
AN - SCOPUS:77952196817
SN - 9781424460342
T3 - Digest of Technical Papers - IEEE International Solid-State Circuits Conference
SP - 436
EP - 437
BT - 2010 IEEE International Solid-State Circuits Conference, ISSCC 2010 - Digest of Technical Papers
T2 - 2010 IEEE International Solid-State Circuits Conference, ISSCC 2010
Y2 - 7 February 2010 through 11 February 2010
ER -