Microsemi UG0943 CNN Accelerator for PolarFire FPGA User Guide
- June 12, 2024
- Microsemi
Product Information
The product is the CNN Accelerator for PolarFire FPGA, developed by Microsemi. It is a hardware implementation that provides accelerated processing for Convolutional Neural Networks (CNNs). The CNN Accelerator IP block diagram shows the internal structure of the accelerator, including components such as Activations Read FIFO, Weights Read FIFO, Matrix Framer, Convolution ReLU, Maxpooling, FC Accumulator, Output Framer, and Output Write.
Product Usage Instructions
Hardware Implementation
To implement the CNN Accelerator IP, follow these steps:
- Ensure you have the necessary hardware components to support the CNN Accelerator for PolarFire FPGA.
- Connect the necessary inputs and outputs to the accelerator, as described in the Inputs and Outputs section of the user manual.
- Configure the memory components by specifying the appropriate configuration parameters. Refer to the Configuration Parameters section for details.
- Refer to the Timing Diagrams section to understand the timing requirements of the accelerator.
- Review the resource utilization of the accelerator in the Resource Utilizations section.
Supported Layers
The CNN engine of the accelerator supports the following types of layers:
- Convolutional Layers
- ReLU Layers
- Max pooling Layers
- Fully Connected (FC) Layers
For detailed information on each layer type and how to configure them, refer to the Design Description section of the user manual.
Revision History
The revision history describes the changes that were implemented in the document. The changes are listed by revision, starting with the most current publication.
Revision 1.0
The first publication of this document.
Introduction
The CNN Accelerator IP provides hardware acceleration for inferencing Convolutional Neural Networks (CNN) on PolarFire® FPGA. The CNN accelerator performs several DSP operations in a single clock cycle to achieve acceleration. A CNN consists of several types of layers connected in sequence, such as convolution, maxpool, ReLU, and fully connected layers. A convolution layer uses kernels with coefficients called weights. The IP executes some of these layers sequentially and some of the layers simultaneously. The output of each layer, called activations, is stored in DDR and used as the input to the next layer. The weights of the CNN are stored in DDR and are read along with the input corresponding to a convolution layer. The scheduler inside the CNN IP manages the sequencing of a frame start and the execution of the different layers until the final output is computed.
The CNN accelerator IP interfaces to a DDR arbiter that enables multiple reads and writes. The IP uses two read channels, one to read the layer inputs and the other to read the network weights. One write channel is used by the IP to write the activations to DDR. The IP expects the input image to be scaled as required by the network input and stored in DDR. The scheduler that sequences the different layers is configured through the input pins. Typically, a processor subsystem or UART is used to generate the data for configuring the scheduler. The status output represents the number of the layer that the CNN IP is currently running.
Hardware Implementation
This section describes the implementation of the CNN Accelerator IP.
Design Description
The two DDR read channels, Image Read and Weights Read, read the image data and the weights data stored in DDR at the clock frequency of the DDR interface. A CDC FIFO transfers the data from the DDR interface clock domain to the CNN system clock domain. The matrix framer frames the 3×3 matrix from the image data that is used for convolution, and implements the zero padding and the convolution stride. The weight framer loads the weight values of the filters used for convolution. The output framer arranges the convolution output into activation maps and stores them in LSRAM. A 3×3 matrix framer frames the matrix with zero padding and stride according to the network layer. The maxpool module finds the maximum of the 3×3 matrix and generates the final output. If a network layer does not use the maxpool operation, the output can be selected directly from LSRAM through the multiplexer at the output.
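The framing, padding, and stride behavior described above can be pictured in software terms. The following is a minimal reference model, in C, of a single-channel 3×3 convolution with zero padding and a configurable stride, followed by ReLU. It only illustrates the arithmetic performed by the matrix framer and convolution stages; the function and parameter names are illustrative and do not describe the IP's internal implementation.

```c
/* Reference model of one output pixel of a 3x3 convolution with zero padding.
 * in     : input activation map (h x w), stored row-major
 * k      : 3x3 kernel weights
 * oy, ox : output coordinates; the stride selects every Nth input position
 */
static int conv3x3_pixel(const int *in, int h, int w,
                         const int k[3][3], int oy, int ox, int stride)
{
    int acc = 0;
    for (int ky = 0; ky < 3; ky++) {
        for (int kx = 0; kx < 3; kx++) {
            int iy = oy * stride + ky - 1;   /* -1 centres the 3x3 window    */
            int ix = ox * stride + kx - 1;
            int px = 0;                      /* zero padding outside the map */
            if (iy >= 0 && iy < h && ix >= 0 && ix < w)
                px = in[iy * w + ix];
            acc += px * k[ky][kx];
        }
    }
    return acc;
}

/* ReLU applied to the convolution output, as done after the convolution stage. */
static int relu(int x) { return x > 0 ? x : 0; }
```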
The scheduler module controls the sequence of execution of each layer. For every layer, the scheduler provides the DDR addresses to read the image and the weights, and the address to write the final output of the engine. It also configures the matrix framer for zero padding and stride, and the selection of the final output through the multiplexer. The convolution type (2D convolution, depth-wise convolution, or point-wise convolution) is configured through the scheduler. The scheduler data is loaded through the inputs of the IP corresponding to the scheduler. The types of layers supported by the CNN engine are as follows; a behavioral sketch of the pooling and activation operations follows the list:
- Convolution – stride 1/stride 2, zero padding (5,5,5,5) or no zero padding
- Kernel size – 3×3, 5×5, 7×7, 9×9
- 3×3 Max pooling – stride 1/stride 2 after convolution
- Leaky ReLU after 3×3 convolution
- ReLU and ReLU Max
- 3×3 Depth-wise convolution – stride 1/stride 2 with zero padding
- Point-wise convolution
- Fully connected
- Global average pooling – 7×7
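For reference, the sketch below models the 3×3 max pooling and the activation variants listed above in plain C. It is only a behavioral illustration assuming integer activations and ignoring padding; the fixed-point formats, the leaky-ReLU slope, and the exact definition of ReLU Max used by the IP are not specified here, so the values below are placeholders.

```c
/* Behavioral models of the pooling and activation layer types listed above. */

/* 3x3 max pooling at output position (oy, ox) with stride 1 or 2. */
static int maxpool3x3(const int *in, int h, int w, int oy, int ox, int stride)
{
    int best = in[(oy * stride) * w + (ox * stride)];
    for (int ky = 0; ky < 3; ky++)
        for (int kx = 0; kx < 3; kx++) {
            int iy = oy * stride + ky, ix = ox * stride + kx;
            if (iy < h && ix < w && in[iy * w + ix] > best)
                best = in[iy * w + ix];
        }
    return best;
}

/* Leaky ReLU: negative inputs are scaled instead of clamped.
 * The 1/8 slope is a placeholder; the slope used by the IP is not given here. */
static int leaky_relu(int x) { return x >= 0 ? x : x / 8; }

/* ReLU Max, interpreted here as clamping to [0, max] (ReLU6-style saturation). */
static int relu_max(int x, int max) { return x < 0 ? 0 : (x > max ? max : x); }

/* 7x7 global average pooling: mean over one 7x7 activation map. */
static int global_avg_pool7x7(const int in[7][7])
{
    int sum = 0;
    for (int y = 0; y < 7; y++)
        for (int x = 0; x < 7; x++)
            sum += in[y][x];
    return sum / 49;
}
```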
Memory Components
The CNN Accelerator IP requires the following components to run a network:
- Network Data: This defines the structure of the CNN and the DDR memory map of the network weights and activations.
- Weights Data: This contains the data of the weights, biases, scale factors, and so on, for all the layers of the network.
- Weights Info: This contains the details of mapping the SPI content of the network weights to the DDR memory.
The above three components are generated as a single hex file from the SDK tool flow, which can be loaded into the SPI flash.
Inputs and Outputs
The following table shows the input and output ports of the CNN accelerator IP.
Table 1: Input and Output Ports of the CNN Accelerator IP
Signal Name | Direction | Width | Description
---|---|---|---
RESETN_SYS_CLK_I | Input | – | Active low synchronous reset signal to the design with respect to SYS_CLK_I
SYS_CLK_I | Input | – | System clock
DDR_CLK_I | Input | – | DDR clock
MiV_CLK_I | Input | – | Mi-V clock
CTRL_DATA_I | Input | 32 bits | Control data input for scheduler
CTRL_DATA_VALID_I | Input | – | Valid signal for data input to scheduler
START_CNN_I | Input | – | Start signal to run CNN Accelerator for one frame
DDR_READ_CHANNEL1 | Bus | – | Read channel 1 bus to be connected to video arbiter for DDR read operation
DDR_READ_CHANNEL2 | Bus | – | Read channel 2 bus to be connected to video arbiter for DDR read operation
STATUS_O | Output | 7 bits | Status register representing the number of the layer currently running in the CNN Accelerator. The rising edge of STATUS_O(7) denotes completion of one frame by the CNN Accelerator.
DDR_WRITE_CHANNEL_O | Bus | – | Write channel bus to be connected to video arbiter for DDR write operation
The interface of the CNN IP with the Video arbiter is shown in Figure 3.
Figure 3: CNN Accelerator IP interface with Video arbiter
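In a typical design, the scheduler data is written to CTRL_DATA_I from a Mi-V processor or over UART. The sketch below is a minimal firmware-side illustration assuming a hypothetical memory-mapped register block; the CNN_* addresses are placeholders for whatever GPIO or fabric registers drive these ports in a given design. It loads the scheduler words, pulses START_CNN_I, and polls STATUS_O for the frame-done bit.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical register addresses: placeholders for the GPIO/fabric registers
 * that drive the IP's ports in a given design. */
#define CNN_CTRL_DATA        (*(volatile uint32_t *)0x60000000u) /* -> CTRL_DATA_I       */
#define CNN_CTRL_DATA_VALID  (*(volatile uint32_t *)0x60000004u) /* -> CTRL_DATA_VALID_I */
#define CNN_START            (*(volatile uint32_t *)0x60000008u) /* -> START_CNN_I       */
#define CNN_STATUS           (*(volatile uint32_t *)0x6000000Cu) /* <- STATUS_O          */

#define CNN_FRAME_DONE_BIT   (1u << 7)  /* rising edge of STATUS_O(7) = frame complete */

/* Load one 32-bit scheduler word through the control interface. */
static void cnn_write_ctrl_word(uint32_t word)
{
    CNN_CTRL_DATA = word;
    CNN_CTRL_DATA_VALID = 1;   /* pulse the valid signal for one write */
    CNN_CTRL_DATA_VALID = 0;
}

/* Configure the scheduler, start one frame, and wait for completion.
 * The polling loop is a simplification: it waits for the frame-done bit
 * to be set rather than strictly detecting a rising edge. */
static void cnn_run_frame(const uint32_t *sched_data, size_t n_words)
{
    for (size_t i = 0; i < n_words; i++)
        cnn_write_ctrl_word(sched_data[i]);

    CNN_START = 1;             /* pulse START_CNN_I to run one frame */
    CNN_START = 0;

    while ((CNN_STATUS & CNN_FRAME_DONE_BIT) == 0)
        ;                      /* poll STATUS_O until the frame completes */
}
```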
Configuration Parameters
The following table describes the configuration parameters used in the hardware implementation of the CNN accelerator. These are generic parameters and can be varied as per the requirements of the application.
Table 2: Configuration Parameters
Name | Description
---|---
G_PW | Product width or convolution output bit width
G_DWC | Enable to support depth-wise convolution operation
G_MXP_EN | Enable to support Maxpool operation
G_GAVG_POOLING_EN | Enable to support Global average pooling operation
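The generics are fixed at synthesis time; software only needs to respect the resulting data widths. As a rough illustration of what a product/output bit width such as G_PW implies, the sketch below saturates a wide accumulator to a signed G_PW-bit range. This is only an interpretation of the parameter, not a description of the IP's internal arithmetic.

```c
#include <stdint.h>

/* Saturate a wide convolution accumulator to a signed `pw`-bit result,
 * illustrating how a product/output width such as G_PW bounds the data path.
 * This is an interpretation of the parameter, not the IP's internal logic. */
static int64_t saturate_to_pw(int64_t acc, unsigned pw)
{
    int64_t max = ((int64_t)1 << (pw - 1)) - 1;  /* e.g. pw = 30 -> 2^29 - 1 */
    int64_t min = -max - 1;
    if (acc > max) return max;
    if (acc < min) return min;
    return acc;
}
```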
Timing Diagrams
The following figures show the timing diagrams of read and write channels.
Figure 4: Timing Diagram of Read Channel
Figure 5: Timing Diagram of Write Channel
Resource Utilizations
The CNN accelerator IP is implemented on a PolarFire FPGA (MPF300T – 1FCG1152E package). The following tables show the resource utilization of the CNN Accelerator IP.
Table 3: G_PW = 30, G_DWC = 1, G_MXP_EN = 1, G_GAVG_POOLING_EN = 1
Resource | Utilization
---|---
LUT | 37840
DFF | 34832
MATH | 152
LSRAM | 116
uSRAM | 45
Table 4: G_PW = 25, G_DWC = 1, G_MXP_EN = 1, G_GAVG_POOLING_EN = 1
Resource | Utilization
---|---
LUT | 36059
DFF | 34434
MATH | 152
LSRAM | 114
uSRAM | 45
Table 5: G_PW = 30, G_DWC = 0, G_MXP_EN = 1, G_GAVG_POOLING_EN = 1
Resource | Utilization
---|---
LUT | 30497
DFF | 29856
MATH | 152
LSRAM | 116
uSRAM | 45
Table 6: G_PW = 30, G_DWC = 1, G_MXP_EN = 0, G_GAVG_POOLING_EN = 1
Resource | Utilization
---|---
LUT | 34260
DFF | 32338
MATH | 152
LSRAM | 95
uSRAM | 45
Table 7: G_PW = 30, G_DWC = 1, G_MXP_EN = 1, G_GAVG_POOLING_EN = 0
Resource | Utilization
---|---
LUT | 36438
DFF | 34262
MATH | 152
LSRAM | 116
uSRAM | 0
Table 8: Performance and Resource Utilization of the IP for Example Networks
 | Tiny YOLO v2 COCO | Mobilenet v1 | Resnet50
---|---|---|---
Frames/sec @ 200 MHz | 15.5 FPS | 54 FPS | 7 FPS
LUT | 28642 | 32330 | 36059
DFF | 29128 | 31791 | 34434
MATH | 152 | 152 | 152
LSRAM | 114 | 93 | 114
uSRAM | 0 | 45 | 45
Note: The variation in resource utilization is achieved by choosing the optimal settings of the CNN IP for a particular network. Network latency is 1/FPS; networks are run with a batch size of 1.
Microsemi makes no warranty, representation, or guarantee regarding the information contained herein or the suitability of its products and services for any particular purpose, nor does Microsemi assume any liability whatsoever arising out of the application or use of any product or circuit. The products sold hereunder and any other products sold by Microsemi have been subject to limited testing and should not be used in conjunction with mission-critical equipment or applications. Any performance specifications are believed to be reliable but are not verified, and Buyer must conduct and complete all performance and other testing of the products, alone and together with, or installed in, any end-products. Buyer shall not rely on any data and performance specifications or parameters provided by Microsemi. It is the Buyer’s responsibility to independently determine suitability of any products and to test and verify the same. The information provided by Microsemi hereunder is provided “as is, where is” and with all faults, and the entire risk associated with such information is entirely with the Buyer. Microsemi does not grant, explicitly or implicitly, to any party any patent rights, licenses, or any other IP rights, whether with regard to such information itself or anything described by such information. Information provided in this document is proprietary to Microsemi, and Microsemi reserves the right to make any changes to the information in this document or to any products and services at any time without notice.
About Microsemi
Microsemi, a wholly owned subsidiary of Microchip Technology Inc. (Nasdaq:
MCHP), offers a comprehensive portfolio of semiconductor and system solutions
for aerospace & defense, communications, data center and industrial markets.
Products include high-performance and radiation-hardened analog mixed-signal integrated circuits, FPGAs, SoCs and ASICs; power management products; timing and synchronization devices and precise time solutions, setting the world's standard for time; voice processing devices; RF solutions; discrete components; enterprise storage and communication solutions, security technologies and scalable anti-tamper products; Ethernet solutions; Power-over-Ethernet ICs and midspans; as well as custom design capabilities and services. Learn more at www.microsemi.com.
Microsemi Headquarters One Enterprise, Aliso Viejo, CA 92656 USA
- Within the USA: +1 800-713-4113
- Outside the USA: +1 949-380-6100
- Sales: +1 949-380-6136
- Fax: +1 949-215-4996
- Email: sales.support@microsemi.com
- Web: www.microsemi.com
©2020 Microsemi, a wholly owned subsidiary of Microchip Technology Inc. All rights reserved. Microsemi and the Microsemi logo are registered trademarks of Microsemi Corporation. All other trademarks and service marks are the property of their respective owners.
50200943. 1.0 12/20
Microsemi Proprietary UG0943 Revision 1.0