
Results Report Details
Report ID: 20190000000410
Title: FY2018 Interim Annual Report — Project for Innovative AI Chips and Next-Generation Computing Technology Development / Development of Innovative AI Edge Computing Technologies / Research and Development of an SoC for Ultra-Low-Power Edge-Heavy Computing Combining an FPGA IP and Variable-Precision Arithmetic Operation Cores
Publication date: 2019/6/22
Report fiscal year: 2018 - 2018
Contractors: Preferred Networks, Inc.; Kobe University (National University Corporation)
Project number: P16007
Department: IoT Promotion Department
Japanese abstract:
English abstract
Title: Project for Innovative AI Chips and Next-Generation Computing Technology Development / Development of Innovative AI Edge Computing Technologies / Research and Development of an SoC for Ultra-Low-Power Edge-Heavy Computing Combining an FPGA IP and Variable-Precision Arithmetic Operation Cores (FY2018-FY2020) FY2018 Annual Report

(1) Research and development of reconfigurable, low latency and low power AI computing architecture
To explore power-efficient circuit designs, we considered the 8-bit fixed-point format (int8) and logarithmic number systems (LNS). We concluded that int8 offers little advantage over bfloat16: the width of the multiplication unit is the same for both formats, and the dynamic range of int8 is insufficient for training deep neural networks. In LNS, multiplication is cheap because it is implemented by an addition circuit. Although addition becomes correspondingly expensive, we found the cost acceptable because the precision required by deep learning is relatively low. We also found that we need to investigate how to support variable-precision operations in LNS.
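The LNS trade-off described above, multiplication reduced to addition while addition itself needs a correction term, can be sketched in a few lines of Python. The helper names below are illustrative only and do not reflect the project's actual circuit design:

```python
import math

# Sketch of arithmetic in a logarithmic number system (LNS):
# a positive value v is stored as its base-2 logarithm x = log2(v).

def to_lns(v):
    return math.log2(v)

def from_lns(x):
    return 2.0 ** x

def lns_mul(x, y):
    # Multiplication is just an addition of the logs -- the cheap operation.
    return x + y

def lns_add(x, y):
    # Addition needs a correction term, log2(1 + 2^(lo - hi)),
    # which is the expensive part (in hardware, typically a lookup
    # table or polynomial approximation).
    hi, lo = (x, y) if x >= y else (y, x)
    return hi + math.log2(1.0 + 2.0 ** (lo - hi))

# 3 * 4 = 12 and 3 + 4 = 7, computed entirely in the log domain.
product = from_lns(lns_mul(to_lns(3.0), to_lns(4.0)))
total   = from_lns(lns_add(to_lns(3.0), to_lns(4.0)))
```

Because deep learning tolerates low precision, the correction term's approximation can be coarse, which is why the addition cost can stay acceptable.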

(2) Research and development of reconfigurable, low latency and low power AI processor chip
We selected an ASIC vendor after collecting quotes on manufacturing cost from four vendors. After careful consideration, we tentatively selected a 12 nm process node and HBM2 memory. We also evaluated vendor tools for eFPGA (an FPGA embedded in an ASIC) and for a PCIe IP. For the logic design of the arithmetic operation core, we examined designs that reduce circuit area to achieve high processing speed and low power consumption.

(3) Development of a software framework
We investigated existing deep learning frameworks and related software, focusing especially on designs for performance optimization. We concluded that two execution styles are needed, running hand-optimized code and running code automatically optimized by a compiler, and that we should first develop hand-optimized routines and their interface from Chainer. We also concluded that we need to develop the software common to both styles, such as an assembler, a runtime, and a device driver.
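As a rough illustration of the two execution styles, the sketch below registers hand-optimized routines per operation and falls back to a (stubbed) compiler path otherwise. The registry mechanism and all names are assumptions for illustration, not the project's actual interface from Chainer:

```python
# Registry of hand-optimized kernels, keyed by operation name.
HAND_OPTIMIZED = {}

def register(op_name):
    def wrap(fn):
        HAND_OPTIMIZED[op_name] = fn
        return fn
    return wrap

@register("matmul")
def matmul_tuned(a, b):
    # Stand-in for a hand-tuned matrix-multiply kernel.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def compile_and_run(op_name, *args):
    # Stand-in for the compiler-optimized execution path.
    raise NotImplementedError(f"no compiled kernel for {op_name}")

def execute(op_name, *args):
    # Prefer a hand-optimized routine; otherwise fall back to the compiler.
    fn = HAND_OPTIMIZED.get(op_name)
    if fn is not None:
        return fn(*args)
    return compile_and_run(op_name, *args)

result = execute("matmul", [[1, 2]], [[3], [4]])  # [[11]]
```

An assembler, runtime, and device driver would sit beneath both paths, which is why the report identifies them as software common to the two styles.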

(4) Development of software for robotics application
As the application examined for robotics, we focused on object detection methods, particularly the Path Aggregation Network and CornerNet neural network models. We evaluated their performance on PFN's in-house deep learning accelerator chip MN-Core, focusing in particular on local memory size, and proposed architecture modifications to research item (1).
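A back-of-the-envelope check of whether feature-map activations fit in an accelerator's local memory, of the kind this evaluation involves, might look like the following sketch. The layer shapes and the 8 MiB budget are illustrative assumptions, not MN-Core's actual parameters:

```python
def activation_bytes(channels, height, width, bytes_per_elem=2):
    # Feature-map footprint in bytes (e.g. 2 bytes per element for bfloat16).
    return channels * height * width * bytes_per_elem

LOCAL_MEMORY_BYTES = 8 * 1024 * 1024  # assumed 8 MiB of local memory

# Illustrative backbone layers: (name, channels, height, width).
layers = [
    ("stem", 64, 256, 256),
    ("c2",  256, 200, 200),
    ("c3",  512,  64,  64),
]

report = []
for name, c, h, w in layers:
    size = activation_bytes(c, h, w)
    report.append((name, size, size <= LOCAL_MEMORY_BYTES))
    print(f"{name}: {size / 1024 / 1024:.1f} MiB, "
          f"fits={size <= LOCAL_MEMORY_BYTES}")
```

Layers whose activations exceed the budget would need tiling or architectural changes, which is the kind of finding that feeds back into research item (1).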
Download: Please download from the Results Report Database (user registration required).
