AI Compute Fundamentals: Why Systolic Architecture
Reading Summary
Author: H. T. Kung
Year: 1982
Title: "Why Systolic Architectures?"
Keywords:
cost-effectiveness
concurrency
decompose (decomposition)
massive parallelism
Why Systolic Architectures? H. T. Kung, 1982
Systolic architectures, which permit multiple computations for each memory access, can speed execution of compute-bound problems without increasing I/O requirements.
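In short: a systolic architecture speeds up compute-bound problems without increasing I/O requirements, because each datum fetched from memory is used for many computations.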
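Key architectural issues in designing special-purpose systems
① Simple and regular design: lowers design cost, and modularity lets cost scale in proportion to performance;
② Concurrency and communication: since the speed of individual devices is limited, higher speed must come from massive parallelism combined with low routing (communication) cost;
③ Balancing computation with I/O: I/O limits the maximum computation rate, so a computation must be decomposed to reduce I/O; the design has to balance the I/O requirement against system size and memory size, and examine how I/O bandwidth affects the achievable speed.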
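Point ③ can be illustrated with a back-of-the-envelope bound: sustained throughput is limited both by how fast the PEs compute and by how fast memory can feed them, where the latter is bandwidth times the number of operations performed per word moved. This is a minimal sketch; the function, its parameters, and the numbers in the usage lines are hypothetical assumptions, not figures from the paper.

```python
def max_throughput(num_pes: int, pe_ops_per_sec: float,
                   mem_words_per_sec: float, ops_per_word: float) -> float:
    """Upper bound on sustained operations per second.

    compute_limit: what the PEs could deliver if they were never starved.
    io_limit: what memory can feed, given how many operations are performed
    on each word moved between memory and the array (the reuse factor).
    """
    compute_limit = num_pes * pe_ops_per_sec
    io_limit = mem_words_per_sec * ops_per_word
    return min(compute_limit, io_limit)


# Hypothetical numbers: memory moves 10^7 words/s, each PE can do 10^7 ops/s.
print(max_throughput(1, 1e7, 1e7, 0.5))  # 5000000.0  (one PE: a read and a write per op)
print(max_throughput(6, 1e7, 1e7, 3.0))  # 30000000.0 (6 PEs share one read and one write)
```

With the same memory bandwidth, the only way past the I/O limit is to raise `ops_per_word`, i.e. to do more work on each datum before it goes back to memory, which is exactly what a systolic array is built for.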
Systolic architectures: the basic principle
Basic definition
A systolic system consists of a set of interconnected cells, each capable of performing some simple operation.
Cells in a systolic system are typically interconnected to form a systolic array or a systolic tree. Information in a systolic system flows between cells in a pipelined fashion, and communication with the outside world occurs only at the “boundary cells.” For example, in a systolic array, only those cells on the array boundaries may be I/O ports for the system.
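The definition above can be illustrated with a toy clocked simulation of a one-dimensional systolic array: each cell applies one simple operation and latches the result in a register, all registers update on the same clock edge, and only the two boundary cells exchange data with the outside. This is a minimal sketch; the `Cell` class, `run_linear_array`, and the per-cell operations are hypothetical.

```python
from typing import Callable, List, Optional


class Cell:
    """One systolic cell: a simple operation plus an output register."""

    def __init__(self, op: Callable[[int], int]) -> None:
        self.op = op
        self.reg: Optional[int] = None  # None marks an empty (bubble) slot


def run_linear_array(cells: List[Cell], stream: List[int]) -> List[int]:
    """Clock data through a 1-D array: it enters at cells[0] and leaves at cells[-1].

    Only the two boundary cells talk to the outside world; interior cells
    read only their left neighbour's register.
    """
    outputs: List[int] = []
    # Feed the stream, then len(cells) empty cycles so the pipeline drains.
    feed: List[Optional[int]] = list(stream) + [None] * len(cells)
    for x in feed:
        # Right boundary cell: emit whatever its register held before this edge.
        if cells[-1].reg is not None:
            outputs.append(cells[-1].reg)
        # Update right-to-left so every cell sees its neighbour's *previous* value,
        # which mimics all registers latching on the same clock edge.
        for i in range(len(cells) - 1, 0, -1):
            prev = cells[i - 1].reg
            cells[i].reg = None if prev is None else cells[i].op(prev)
        # Left boundary cell: accept the next input from outside.
        cells[0].reg = None if x is None else cells[0].op(x)
    return outputs


cells = [Cell(lambda v: v + 1), Cell(lambda v: v * 2), Cell(lambda v: v + 3)]
print(run_linear_array(cells, [1, 2, 3]))  # [7, 9, 11], i.e. (x + 1) * 2 + 3
```

Once the pipeline is full, every cell is busy on a different datum in the same cycle, while the outside world still sees only one input and one output per cycle.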
Classification of computational tasks
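Computational tasks can be conceptually classified into two families: compute-bound computations and I/O-bound computations. A computation is compute-bound if its total number of operations is larger than the total number of input and output elements; otherwise it is I/O-bound.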
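Counting operations against input/output elements makes the classification concrete; matrix multiplication and matrix addition are the contrasting examples Kung uses. This is a minimal sketch, assuming one multiply-add counts as one operation and counting both input matrices plus the output matrix as I/O; the helper names are hypothetical.

```python
from typing import Tuple


def classify(num_ops: int, num_io: int) -> str:
    """Compute-bound if operations outnumber input + output elements, else I/O-bound."""
    return "compute-bound" if num_ops > num_io else "I/O-bound"


def matmul_counts(n: int) -> Tuple[int, int]:
    """C = A @ B for n x n matrices: n^3 multiply-adds, 3n^2 elements read or written."""
    return n ** 3, 3 * n ** 2


def matadd_counts(n: int) -> Tuple[int, int]:
    """C = A + B for n x n matrices: n^2 additions, 3n^2 elements read or written."""
    return n ** 2, 3 * n ** 2


for n in (2, 4, 64):
    print(n, classify(*matmul_counts(n)), classify(*matadd_counts(n)))
```

Multiplication performs n^3 operations on 3n^2 elements, so each element is reused about n/3 times and the task is compute-bound once n > 3; addition performs only n^2 operations on 3n^2 elements and is I/O-bound for every n, so no amount of extra hardware can hide its I/O cost.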
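As shown in the figure, the conventional single processing element (PE) is replaced by an array of PEs: data flows out of memory and passes through every PE in the array, so each datum is reused many times before it returns to memory.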
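The reuse described above can be quantified by counting memory accesses. In this minimal sketch (hypothetical function names; each "operation" is an arbitrary unary function applied in sequence), a single PE must write every intermediate result back and read it again for the next operation, while a chain of PEs, one per operation, reads each datum once at the left boundary and writes it once at the right boundary.

```python
from typing import Callable, List, Tuple

Op = Callable[[float], float]


def single_pe(memory: List[float], ops: List[Op]) -> Tuple[List[float], int]:
    """One PE applies the ops one after another; every intermediate goes back to memory."""
    mem = list(memory)
    accesses = 0
    for op in ops:
        for i in range(len(mem)):
            value = mem[i]        # read operand from memory
            accesses += 1
            mem[i] = op(value)    # write intermediate result back
            accesses += 1
    return mem, accesses


def pe_array(memory: List[float], ops: List[Op]) -> Tuple[List[float], int]:
    """One PE per op: each datum is read once, flows PE to PE, and is written once.

    Modelled sequentially; in hardware the PEs would process different data
    in the same clock cycle.
    """
    out: List[float] = []
    accesses = 0
    for value in memory:
        accesses += 1             # read at the left boundary
        for op in ops:            # passes through every PE with no memory traffic
            value = op(value)
        out.append(value)         # write at the right boundary
        accesses += 1
    return out, accesses


ops: List[Op] = [lambda v: v + 1, lambda v: v * 2, lambda v: v - 3]
data = [1.0, 2.0, 3.0, 4.0]
print(single_pe(data, ops))  # ([1.0, 3.0, 5.0, 7.0], 24)
print(pe_array(data, ops))   # ([1.0, 3.0, 5.0, 7.0], 8)
```

Both versions compute the same results, but with k operations per datum the single PE needs 2k accesses per datum while the array needs only 2, which is the "multiple computations per memory access" property.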
A family of systolic designs for the convolution computation
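Take 1-D convolution as the example: given the weights w_1, ..., w_k and the inputs x_1, ..., x_n, compute y_i = w_1 x_i + w_2 x_{i+1} + ... + w_k x_{i+k-1} for i = 1, ..., n+1-k.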
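To have something to check the systolic designs against, here is a plain (non-systolic) reference implementation of the convolution. It is a minimal sketch using 0-based indices, so y[i] is the sum of w[j] * x[i + j] for j = 0, ..., k-1; the function name `convolve_ref` is hypothetical.

```python
from typing import List


def convolve_ref(w: List[int], x: List[int]) -> List[int]:
    """Direct (non-systolic) evaluation of y[i] = sum_j w[j] * x[i + j], 0-based."""
    k, n = len(w), len(x)
    return [sum(w[j] * x[i + j] for j in range(k)) for i in range(n - k + 1)]


print(convolve_ref([1, 2, 3], [1, 2, 3, 4, 5]))  # [14, 20, 26]
```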
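Design B1 (broadcast inputs, move results, weights stay): each w_i stays in its cell, the y_i move systolically from cell to cell (one cell per cycle), and each x_i is broadcast to all cells, i.e. to every weight, at once.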
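Design B1 can be checked with a cycle-level sketch (0-based indices; the function name and the left-to-right direction of the results are arbitrary choices, not taken from the paper). Result y_i enters cell 0 with value 0 in cycle i and moves one cell per cycle; in cycle t every cell holding a result adds its stationary weight times the broadcast x[t]. Result y_i therefore sits in cell j exactly when x[i + j] is broadcast, so it collects w[j] * x[i + j] for every j.

```python
from typing import List, Optional, Tuple


def design_b1(w: List[int], x: List[int]) -> List[int]:
    """Weights stay, x is broadcast, partial results y move systolically."""
    k, n = len(w), len(x)
    m = n - k + 1                                    # number of outputs
    cells: List[Optional[Tuple[int, int]]] = [None] * k  # (i, partial y_i) in cell j
    y = [0] * m
    for t in range(n + 1):                           # one extra cycle drains the last result
        done = cells[-1]                             # result in the last cell is complete
        if done is not None:
            i, acc = done
            y[i] = acc                               # it leaves the array at the boundary
        cells = [None] + cells[:-1]                  # every other result moves one cell right
        if t < m:
            cells[0] = (t, 0)                        # y_t enters the left boundary cell as 0
        if t < n:
            xt = x[t]                                # x[t] is broadcast to all cells
            for j, slot in enumerate(cells):
                if slot is not None:
                    i, acc = slot
                    cells[j] = (i, acc + w[j] * xt)  # w[j] never moves
    return y


print(design_b1([1, 2, 3], [1, 2, 3, 4, 5]))  # [14, 20, 26]
```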
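Design B2 (broadcast inputs, move weights, results stay): each y_i stays in one cell while it accumulates, the x_i are broadcast, and the w_i move systolically (one cell per cycle).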
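A similar sketch for Design B2, again 0-based and hypothetical: result y_i stays in cell i mod k from cycle i until cycle i + k - 1, while the weights circulate around the k cells as a ring (one simple way to realise "weights move"). The ring starts so that cell c sees w[(t - c) mod k] in cycle t; for the cell holding y_i this is exactly w[t - i] when x[t] is broadcast.

```python
from typing import List, Optional


def design_b2(w: List[int], x: List[int]) -> List[int]:
    """Results stay, x is broadcast, weights move systolically around a ring."""
    k, n = len(w), len(x)
    m = n - k + 1
    ring = [w[(-c) % k] for c in range(k)]          # weight sitting in cell c at cycle 0
    acc = [0] * k                                    # stationary partial sums
    owner: List[Optional[int]] = [None] * k          # index i of the y_i cell c is building
    y = [0] * m
    for t in range(n):
        if t < m:
            c = t % k
            owner[c], acc[c] = t, 0                  # y_t starts in cell t mod k
        for c in range(k):                           # x[t] is broadcast to every cell
            if owner[c] is not None:
                acc[c] += ring[c] * x[t]
        i = t - k + 1                                # y_i has just received its last term
        if 0 <= i < m:
            c = i % k
            y[i] = acc[c]                            # output y_i; the cell becomes free
            owner[c] = None
        ring = ring[-1:] + ring[:-1]                 # weights move one cell right, wrapping
    return y


print(design_b2([1, 2, 3], [1, 2, 3, 4, 5]))  # [14, 20, 26]
```

Because each result stays put, every cell can be a plain multiplier-accumulator; only the weights and the broadcast input travel.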
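Design F (fan-in results, move inputs, weights stay): each w_i stays in its cell and the x_i move systolically from cell to cell; in each cycle the products from all cells are fanned in and summed using an adder to form a new y_i.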
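Design F can be sketched the same way (0-based, hypothetical names): cell j keeps w[j], the inputs shift one cell to the left per cycle with x[t] entering at the right boundary, and once the array is full the k products are summed by a single fan-in adder. In cycle t the cells hold x[t-k+1], ..., x[t], so the adder output is y[t-k+1].

```python
from typing import List


def design_f(w: List[int], x: List[int]) -> List[int]:
    """Weights stay, x moves systolically, products are fanned in to one adder."""
    k = len(w)
    cells = [0] * k                  # x value held by each cell (0 = empty, never summed)
    y: List[int] = []
    for t, xt in enumerate(x):
        cells = cells[1:] + [xt]     # inputs shift one cell left; x[t] enters at the right
        if t >= k - 1:               # array is full: every cell holds a valid input
            products = [w[j] * cells[j] for j in range(k)]  # w[j] never moves
            y.append(sum(products))  # fan-in adder forms y[t - k + 1]
    return y


print(design_f([1, 2, 3], [1, 2, 3, 4, 5]))  # [14, 20, 26]
```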