How did the road to new adventure in Superconductor Neural Network Accelerator happen?

1: What is Superconductor SFQ?

Moore’s Law, the doubling of the number of transistors on a chip every two years, has so far driven the evolution of computer systems. Unfortunately, we can no longer expect transistor shrinking to continue, which marks the beginning of the so-called post-Moore era. It has therefore become essential to explore emerging devices, and superconductor single-flux-quantum (SFQ) logic, which operates in a 4.2-kelvin environment, is a promising candidate. As shown in Figure 1, Josephson junctions (JJs) are used as switching elements in SFQ logic to compose a superconductor ring (SFQ ring) that can store (or trap) and transfer a single magnetic flux quantum. SFQ logic is fundamentally driven by voltage pulses, which makes it possible to achieve extremely low-latency (~10⁻¹² s) and low-energy (~10⁻¹⁹ J) JJ switching.
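To see where these orders of magnitude come from, here is a rough back-of-the-envelope sketch in Python. It uses the flux quantum Φ₀ = h/2e and the common approximation that one JJ switching event dissipates about I_c·Φ₀. The critical current of 100 µA and the 1 mV pulse height are illustrative assumptions, not values taken from our designs, and the function names are hypothetical.

```python
# Back-of-the-envelope estimate of SFQ switching energy and pulse width.
# Assumptions (not from the article): critical current of 100 uA, and a
# rectangular voltage pulse of 1 mV whose time integral equals one flux quantum.

PLANCK_H = 6.62607015e-34            # Planck constant in J*s (exact SI value)
ELEMENTARY_CHARGE = 1.602176634e-19  # elementary charge in C (exact SI value)


def flux_quantum() -> float:
    """Magnetic flux quantum Phi_0 = h / (2e), in webers (equivalently V*s)."""
    return PLANCK_H / (2 * ELEMENTARY_CHARGE)


def jj_switching_energy(critical_current_a: float) -> float:
    """Approximate energy of one JJ switching event, E ~ I_c * Phi_0, in joules."""
    return critical_current_a * flux_quantum()


def sfq_pulse_width(pulse_height_v: float) -> float:
    """Width in seconds of a rectangular pulse whose area is one flux quantum."""
    return flux_quantum() / pulse_height_v


if __name__ == "__main__":
    print(f"Phi_0            = {flux_quantum():.3e} Wb")
    print(f"switching energy = {jj_switching_energy(100e-6):.3e} J")
    print(f"pulse width      = {sfq_pulse_width(1e-3):.3e} s")
```

Under these assumptions the switching energy comes out at roughly 2×10⁻¹⁹ J and the pulse width at roughly 2×10⁻¹² s, consistent with the orders of magnitude quoted above.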

2: History of our SFQ Research

Although several researchers had successfully demonstrated ultra-high-speed SFQ designs running at over 100 GHz, their primary purpose was to prove the potential of SFQ circuits [SFQ BS]. From the viewpoint of computer architecture, we found several issues in such traditional designs: (1) their bit-serial nature, adopted to localize the circuits and reduce wire length, causes a critical performance issue for the multi-bit operations that are usually required, (2) the unique characteristic of SFQ logic, i.e., that each SFQ logic gate has a latch function inherent in the SFQ ring, had not been considered, and (3) there was no discussion of effective performance that takes memory impacts into account. These were the starting points of our SFQ research from the viewpoint of computer architecture. At the beginning of 2000, SFQ was not so familiar in our community, so my friends joked, “Hey Koji, you should chill your head before dipping your chip in liquid helium at 4 kelvins!”

Based on deep cross-layer discussions, we decided to pursue “bit-parallel gate-level pipelining (BP-GLP)” to solve issues (1) and (2). Some claimed that ultra-fast operation cannot be maintained under a bit-parallel scheme because complex, long wires make the timing constraints severe. However, in 2017, we successfully demonstrated our first BP-GLP ALU design, which operates at over 50 GHz with 1.6 mW (see a demo video [SFQ Youtube]).

To tackle issue (3), we started the current international collaboration among Kyushu University (KU), Seoul National University (SNU), and Nagoya University (NU). We targeted neural network acceleration because its stream-style computing is well suited to BP-GLP, and this finally led us to propose SuperNPU [MICRO][TopPicks]. The collaboration worked very well, driven by two key Ph.D. students, Koki Ishida from KU and Ilkwon Byun from SNU. For the kick-off, NU and KU members visited SNU in 2019, and Koki stayed at SNU for three months to accelerate our collaborative work.
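The performance gap behind issue (1) can be illustrated with a deliberately simple toy model in Python. It assumes that a bit-serial datapath handles one bit per clock cycle and does not overlap consecutive operations, while a bit-parallel gate-level pipeline delivers one full-width result per cycle once it is filled. The pipeline depth of 20 stages and the function names are hypothetical. Only the 50 GHz and 1.6 mW figures come from the ALU demonstration mentioned above, and dividing one by the other gives only a rough energy-per-cycle estimate.

```python
# Toy cycle-count model: bit-serial versus bit-parallel gate-level pipelining.
# Assumptions (not from the article): one bit per cycle for bit-serial with no
# overlap between operations; a hypothetical 20-stage gate-level pipeline.


def bit_serial_cycles(num_ops: int, word_bits: int) -> int:
    """Cycles to finish num_ops operations when each takes word_bits cycles."""
    return num_ops * word_bits


def bp_glp_cycles(num_ops: int, pipeline_depth: int) -> int:
    """Cycles for a full pipeline: fill latency, then one result per cycle."""
    if num_ops == 0:
        return 0
    return pipeline_depth + num_ops - 1


def energy_per_cycle(power_w: float, clock_hz: float) -> float:
    """Average energy spent per clock cycle, in joules."""
    return power_w / clock_hz


if __name__ == "__main__":
    ops, bits, depth = 1000, 8, 20
    serial = bit_serial_cycles(ops, bits)
    parallel = bp_glp_cycles(ops, depth)
    print(f"bit-serial: {serial} cycles, BP-GLP: {parallel} cycles")
    print(f"speedup at equal clock: {serial / parallel:.2f}x")
    print(f"energy per cycle: {energy_per_cycle(1.6e-3, 50e9):.2e} J")
```

For a long stream of operations the speedup at equal clock frequency approaches the word width, which is why stream-style workloads such as neural network inference are a natural fit for BP-GLP.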

3: SuperNPU was born in cross-layer computer architecture research

SuperNPU, as shown in Figure 2 (a), is our design for an SFQ-based neural processing unit (NPU) [MICRO]. The key was to achieve cross-layer interaction and optimization so that we could define and explore the architectural design space efficiently and practically. NU and KU have accumulated experience in the design and prototyping of BP-GLP circuits, as shown in Figure 3. All chips in this figure were fabricated with 1.0 µm process technology, and measurements in a 4-kelvin environment confirmed correct operation. Based on these actual designs, we extracted device characteristics. SNU and KU then developed the simulation framework presented in Figure 2 (b) and performed architectural exploration and optimization. We believe that our team is the first (and best) to explore SFQ technology for cross-layer computer architecture research.

4: What have we learned from SuperNPU?

Through the SuperNPU research, we have learned a lot. First, bridging device/circuit-level considerations and architecture-level optimization is essential for exploring emerging device computing such as SFQ. In particular, fabricating and measuring real chips is important for building accurate power/performance/area models. Because SFQ is a new device technology, some things cannot be understood until a chip is manufactured, e.g., the impact of wiring on large-scale SFQ circuits operating at over 50 GHz, the effects of process variation, etc. Second, new device features change many tradeoffs in computer systems, so revisiting microarchitecture is a critical challenge. And last but not least, we should accelerate wild/crazy challenges that, of course, carry a lot of risk but are so exciting. We chilled our heads (though not to 4 kelvins) and decided to go in this direction because it is truly promising! The SFQ process/fabrication technology is still immature due to a lack of investment, e.g., the most advanced feature size currently available to us is 1.0 µm, which is several generations behind CMOS. With significant advances in device technology, we can expect scaling benefits, which would open a new door to extremely high-speed, power-efficient computing. It is a vital role of computer architects to fully exploit the potential of such emerging devices.

[SFQ BS] https://doi.org/10.1587/transele.E97.C.157

[SFQ Youtube] https://www.youtube.com/watch?v=jZP7sXWHyZs

[MICRO] https://ieeexplore.ieee.org/document/9251979

[TopPicks] https://ieeexplore.ieee.org/document/9395193

About the authors:

Koji Inoue:

He is a professor in the department of advanced information technology, and the director of the system LSI center, at Kyushu University, Japan. His research interests include power-aware computing, IoT system designs, supercomputing, and emerging device computing.

Jangwoo Kim:

He is a professor in the department of electrical and computer engineering at Seoul National University, Korea. His research interests include server and datacenter architectures, cryogenic computing, and system modeling methodologies.

Masamitsu Tanaka:

He is an assistant professor in the department of electronics at Nagoya University, Japan. His research interests include subterahertz-clock-frequency LSI design methodologies and classical and quantum computing using superconductor-based cryogenic electronics.
