MICROCHIP v8.0 CoreFFT Fourier Transform User Guide

June 16, 2024

Specifications

  • Transform sizes, points: 32, 64, 128, 256, 512, 1024, 2048,
    4096, 8192, and 16384.

  • In-Place FFT: Forward and inverse FFT

  • Streaming FFT: Forward and inverse FFT

  • Input data bit width: 8 to 32

  • Twiddle factor bit width: 8 to 32

  • Input/output data format: Two’s complement

  • Natural output sample order (optional for the Streaming FFT)

  • Conditional block floating point scaling (In-Place FFT)

  • Pre-defined scaling schedule or no scaling (Streaming FFT)

  • Optional minimal or buffered memory configurations

  • Embedded RAM-block based twiddle Look-up Table (LUT)

  • Support for refreshing twiddle LUT

  • Handshake signals to facilitate easy interface to the user
    circuitry

  • AXI4 Streaming interface: Streaming FFT only

  • Run-time forward/inverse transform configuration: Streaming FFT only

Product Usage Instructions

In-Place FFT

The In-Place FFT implementation supports the Radix-2
decimation-in-time transform. To use the In-Place FFT, follow these
steps:

  1. Initialize the input sequence X(0), X(1),…, X(N-1).
  2. Configure the transform size (number of points).
  3. Perform the forward or inverse FFT operation as required.
  4. Retrieve the transformed data from the output sequence.

Streaming FFT

The Streaming FFT implementation supports the Radix-2²
decimation-in-frequency transform. To use the Streaming FFT, follow
these steps:

  1. Initialize the input sequence X(0), X(1),…, X(N-1).
  2. Configure the transform size (number of points).
  3. Perform the forward or inverse FFT operation as required.
  4. Retrieve the transformed data from the output sequence.

FAQ

Q: What transform sizes are supported?

A: The CoreFFT supports transform sizes of 32, 64, 128, 256,
512, 1024, 2048, 4096, 8192, and 16384.

Q: What is the input data format?

A: The input data format is two’s complement.

Q: Does CoreFFT support forward and inverse FFT operations?

A: Yes, CoreFFT supports both forward and inverse FFT
operations.

Introduction
The Fast Fourier transform (FFT) core implements the efficient Cooley-Tukey algorithm for computing the discrete Fourier transform. CoreFFT is used in a broad range of applications such as digital communications, audio, measurements, control, and biomedical. CoreFFT provides a highly parameterizable, area-efficient, and high-performance MACC-based FFT. The core is available as Register Transfer Level (RTL) code in the Verilog and VHDL languages.
Equation 1. N-point forward FFT (N is a power of 2) of a sequence x(0), x(1),…, x(N-1)
$$X(k) = \sum_{n=0}^{N-1} x(n)\, e^{-j 2\pi n k / N}$$
where k = 0, 1,…, N-1
Equation 2. N-point inverse FFT (N is a power of 2) of a sequence X(0), X(1),…, X(N-1)
$$x(n) = \frac{1}{N} \sum_{k=0}^{N-1} X(k)\, e^{j 2\pi n k / N}$$
where n = 0, 1,…, N-1
Important: While performing an inverse FFT, the core does not apply the division by N of Equation 2 (as the division by a power of two is trivial).
The following figure illustrates an FFT-based system that consists of a data source, the FFT module, and a data sink, which is the transformed data recipient.
Figure 1. FFT-Based System Example
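As a point of reference for Equation 1 and Equation 2, the following Python sketch evaluates the discrete Fourier transform directly. It is a software model for checking results only and says nothing about how the core is built; the function name dft and its arguments are hypothetical. The inverse call leaves out the division by N, mirroring the note above, so the caller divides by N afterwards.

```python
import cmath

def dft(x, inverse=False):
    """Direct O(N^2) evaluation of the forward or inverse transform sum.

    The 1/N factor of the inverse transform is deliberately NOT applied,
    to mirror the behavior described for the core (assumption: this is a
    reference model, not the core's algorithm).
    """
    n_points = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n_points):
        acc = 0j
        for n in range(n_points):
            acc += x[n] * cmath.exp(sign * 2j * cmath.pi * n * k / n_points)
        out.append(acc)
    return out

x = [1 + 0j, 2 - 1j, 0 + 3j, -1 + 0j, 0.5 + 0.5j, 0j, -2 + 1j, 1 - 1j]
X = dft(x)                       # forward transform
y = dft(X, inverse=True)         # inverse transform, still scaled by N
recovered = [v / len(x) for v in y]  # the caller applies the division by N
```

Because N is a power of two, the final division reduces to a right shift by log2(N) bits in fixed-point hardware.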

Features
CoreFFT supports the Radix-2 decimation-in-time in-place FFT and Radix-2² decimation-in-frequency streaming FFT transform implementations. The following table lists the key features for each implementation.

© 2022 Microchip Technology Inc.
and its subsidiaries

User Guide

DS50003348C-page 1

Table 1. Key Features Support

Feature | In-Place | Streaming
Transform sizes, points | 32, 64, 128, 256, 512, 1024, 2048, 4096, 8192, and 16384 (see Note) | 16, 32, 64, 128, 256, 512, 1024, 2048, and 4096
Forward and inverse FFT | Yes | Yes
Input data bit width | 8-32 | 8-32
Twiddle factor bit width | 8-32 | 8-32
Input/output data format | Two’s complement | Two’s complement
Natural output sample order | Yes | Optional
Conditional block floating point scaling | Yes | No
Pre-defined scaling schedule or no scaling | No | Yes
Optional minimal or buffered memory configurations | Yes | No
Embedded RAM-block based twiddle Look-up Table (LUT) | Yes | Yes
Support for refreshing twiddle LUT | Yes | No
Handshake signals to facilitate easy interface to the user circuitry | Yes | Yes
AXI4 Streaming interface | No | Yes
Run-time forward/inverse transform configuration | No | Yes

Note: The 16384-pt FFT is supported on the RTG4™, PolarFire®, and PolarFire SoC parts only.

Supported Families
CoreFFT supports the following FPGA families.
· PolarFire®
· PolarFire SoC
· SmartFusion® 2
· IGLOO® 2
· RTG4™
Device Utilization and Performance
CoreFFT has been implemented in the SmartFusion2 M2S050 device using speed grade -1 and PolarFire MPF300 using speed grade -1. A summary of the implementation data is provided in 6. Appendix A: In-Place FFT Device Utilization and Performance and 7. Appendix B: Streaming FFT Device Utilization and Performance.

Table of Contents
Introduction
  Features
  Supported Families
  Device Utilization and Performance
1. Functional Description
  1.1. Architecture Options
  1.2. In-Place FFT
  1.3. In-Place Memory Buffers
  1.4. Streaming FFT
2. Interface
  2.1. In-Place FFT
  2.2. Streaming FFT
3. Timing Diagrams
  3.1. In-Place FFT
  3.2. Streaming FFT
4. Tool Flow
  4.1. License
  4.2. Configuring CoreFFT in SmartDesign
  4.3. Simulation Flows
  4.4. Design Constraints
  4.5. Synthesis in Libero SoC
  4.6. Place-and-Route in Libero SoC
5. System Integration
  5.1. In-Place FFT
  5.2. Streaming FFT
6. Appendix A: In-Place FFT Device Utilization and Performance
7. Appendix B: Streaming FFT Device Utilization and Performance
8. Revision History
Microchip FPGA Support
Microchip Information
  The Microchip Website
  Product Change Notification Service
  Customer Support
  Microchip Devices Code Protection Feature
  Legal Notice
  Trademarks
  Quality Management System
  Worldwide Sales and Service

1. Functional Description
This section describes how the CoreFFT operates.
1.1 Architecture Options
Depending on user configuration, CoreFFT generates one of the following transformation implementations:
· In-place FFT
· Streaming FFT
1.2 In-Place FFT
This architecture option loads a frame of N complex data samples into its in-place RAM and processes them sequentially, using a single Radix-2 processor. It stores the results of each stage in the in-place RAM. The in-place FFT takes fewer chip resources than the streaming FFT, but the transformation time is longer. The following figure shows a functional diagram of the in-place transform.
Figure 1-1. In-Place Radix-2 FFT Functional Block Diagram (Minimal Configuration)

The input and output data are represented as 2 * WIDTH-bit words comprised of real and imaginary parts. Both parts are two’s complement numbers of WIDTH bits each. The module processes frames (bursts) of data with a frame size of N complex words. The frame to be processed is loaded in the in-place memory. The memory contains two identical RAM blocks, each capable of storing N/2 complex words. The in-place memory supports double bandwidth: it can read and write two complex words at the same time. Once the N complex data samples are loaded in the memory, the FFT computation starts automatically, and the in-place memory is used for the computations.
The in-place FFT computational process occurs in a sequence of stages with the number of stages equal to log2(N). At every stage of the FFT data processing, the Radix-2 butterfly reads all the data stored in the in-place memory, two complex words at a time. The read switch along with a read address generator (not shown in Figure 1-1) helps the butterfly to obtain stored data in the order required by the FFT algorithm. In addition to the data, the butterfly obtains twiddle factors (sine/cosine coefficients) from the twiddle LUT. The butterfly writes intermediate results to the in-place memory through the write switch.
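The stage-by-stage processing described above can be illustrated with a textbook in-place Radix-2 decimation-in-time FFT in Python. This is a floating-point sketch under stated assumptions: the function names are hypothetical, the input is reordered up front (a common software arrangement, not necessarily the core's addressing scheme), and no scaling is modeled.

```python
import cmath

def bit_reverse(index, bits):
    """Reverse the lowest `bits` bits of `index`."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result

def fft_radix2_inplace(data):
    """Textbook in-place Radix-2 DIT FFT over a list of complex numbers."""
    n = len(data)
    stages = n.bit_length() - 1
    if n < 2 or n != 1 << stages:
        raise ValueError("frame size must be a power of two")
    # Reorder the input so the butterflies can work in place.
    for i in range(n):
        j = bit_reverse(i, stages)
        if i < j:
            data[i], data[j] = data[j], data[i]
    # Twiddle look-up table: N/2 complex coefficients.
    twiddle = [cmath.exp(-2j * cmath.pi * k / n) for k in range(n // 2)]
    # log2(N) stages, each running N/2 butterflies on two words at a time.
    for stage in range(stages):
        half = 1 << stage
        step = n // (2 * half)
        for start in range(0, n, 2 * half):
            for k in range(half):
                a = data[start + k]
                b = data[start + k + half] * twiddle[k * step]
                data[start + k] = a + b          # results go back to the
                data[start + k + half] = a - b   # same two memory locations
    return data

frame = [complex(i, -i) for i in range(8)]
result = fft_radix2_inplace(list(frame))
```

Each butterfly reads two words and writes two words back to the same locations, which is why a memory that serves two complex words per access keeps a single butterfly busy.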

After the last computational stage, the in-place memory stores the fully transformed data. The module puts out an N-word transformed data frame, one word at a time, provided the signal READ_OUTP is active. CoreFFT calculates the twiddle factors required by the FFT algorithm and writes them to the twiddle LUT. This happens automatically on power-on when asynchronous global reset NGRST is asserted.

1.3 In-Place Memory Buffers
This section describes the In-Place Memory Buffers of the CoreFFT.
1.3.1 Minimal Configuration
The minimal configuration, as shown in Figure 1-1, is sufficient to accomplish the FFT because it has the in-place RAM required by the FFT algorithm. However, the minimal configuration does not utilize the processing engine all the time: while data is loaded into the in-place memory, or the transformed data is read out, the butterfly stays idle. The following figure shows the FFT cycle timeline. The cycle consists of the following three phases:
· Download a fresh input data frame into the in-place RAM
· Perform the actual transformation
· Upload the transformation result to free up the in-place RAM
Figure 1-2. Minimal Configuration In-Place FFT Cycle

In the minimal configuration, the butterfly runs only during the computation phase. When the data burst rate permits, the minimal configuration provides the best device resource utilization. In particular, it saves a significant number of RAM blocks.
1.3.2 Buffered Configuration
To improve the butterfly utilization and consequently reduce the average transformation time, additional memory buffers can be used. The following figure shows the buffered FFT block diagram.
Figure 1-3. Buffered FFT Block Diagram

The buffered option has two identical in-place memory banks implementing a ping-pong buffer and one output buffer. Each bank is capable of storing N complex words and reading two complex words at a time. The core state machine controls the ping-pong switching, so that a data source sees only a buffer that is ready to accept new data. The buffer that is not accepting new data is used as an in-place RAM by the FFT engine.

The ping-pong buffering architecture increases the efficiency of the FFT engine. While one of the two input banks is involved in the current FFT computation, the other is available for downloading the next input data frame. As a result, the FFT engine does not sit idle waiting for fresh data to fill the input buffer. From the data source perspective, the core can receive a data burst anywhere within the FFT computation period. When the engine has finished processing the current data frame and the input buffer bank has been filled with another data frame, the state machine swaps the ping-pong banks, and the data load and computation continues on the alternate memory banks.
The last stage of the FFT computation uses an out-of-place scheme. The FFT engine reads intermediate data from the in-place memory but writes the final result in the output data buffer. The final results remain in the output buffer until the FFT engine replaces them with the results of the next data frame. From the data recipient perspective, the output data are available for reading any time, except for the last FFT stage.
The buffered configuration FFT cycle is shown in the following figure.
Figure 1-4. Buffered Configuration FFT Cycles

1.3.3 Finite Word Length Considerations
At every stage of the in-place FFT algorithm, the butterfly takes two samples out of the in-place memory and returns two processed samples to the same memory locations. The butterfly calculation involves complex multiplication, addition, and subtraction. The returning samples may have a larger data width than the samples picked from the memory. Precautions must be taken to ensure that there are no data overflows.
To avoid the risk of overflow, the core employs one of the following three methods:
· Input data scaling
· Unconditional block floating-point scaling
· Conditional block floating-point scaling
Input Data Scaling: Input data scaling requires prepending the input data samples with enough extra sign bits, called guard bits. The number of guard bits necessary to compensate for the maximum possible bit growth for an N-point FFT is log2(N) + 1. For example, every input sample of a 256-point FFT must contain nine guard bits. Such a technique greatly reduces the effective FFT bit resolution.
Unconditional Block Floating-Point Scaling: The second way to compensate for the FFT bit growth is to scale the data down by a factor of two at every stage. Consequently, the final FFT results are scaled down by a factor of 1/N. This approach is called unconditional block floating-point scaling.
The input data need to be scaled down by a factor of two to prevent overflow at the first stage. To prevent overflow in successive stages, the core scales down the results of every previous stage by a factor of two by shifting the entire block of data (all results of the current stage) one bit to the right. The total number of bits the data loses because of the bit shifting in the FFT calculation is log2(N).
Unconditional block floating-point scaling results in the same number of lost bits as input data scaling. However, it produces more precise results, as the FFT engine starts with more precise input data.
Conditional Block Floating-Point Scaling: In conditional block floating-point scaling, data is shifted only if bit growth actually occurs. If one or more butterfly outputs grow, the entire block of data is shifted to the right. The conditional block floating-point monitor checks every butterfly output for growth. If shifting is necessary, it is performed after the entire stage is complete, at the input of the next stage butterfly. This technique provides the least amount of distortion (quantization noise) caused by finite word length.
In Conditional Block Floating-Point mode, the core can optionally calculate the actual scaling factor. It does so if the parameter SCALE_EXP_ON is set to 1. The calculated factor then appears on the SCALE_EXP port. The factor represents the number of right shifts the FFT engine applied to the results. For example, a SCALE_EXP value of 4 (binary 100) means that the FFT results were shifted right (downscaled) by 4 bits; that is, divided by 2^SCALE_EXP = 16. The signal accompanies the FFT results and is valid while OUTP_READY is asserted. To scale back the actual CoreFFT results, that is, to make them comparable to floating-point transformed bins, every FFT output sample needs to be multiplied by 2^SCALE_EXP:
· FFT Result (Real) = DATAO_RE x 2^SCALE_EXP
· FFT Result (Imaginary) = DATAO_IM x 2^SCALE_EXP
Important: The scale exponent calculator can be enabled in conditional block floating-point mode only.
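The scale-back step can be sketched in Python as post-processing on captured output words. The helper names to_signed and rescale are hypothetical, and the sketch assumes the DATAO_RE and DATAO_IM words were captured as raw unsigned bit patterns of WIDTH bits together with the SCALE_EXP value valid for that frame.

```python
def to_signed(raw, width):
    """Interpret a raw `width`-bit pattern as a two's complement integer."""
    raw &= (1 << width) - 1
    return raw - (1 << width) if raw & (1 << (width - 1)) else raw

def rescale(datao_re, datao_im, scale_exp):
    """Undo the block floating-point downscaling: multiply by 2^SCALE_EXP."""
    factor = 1 << scale_exp
    return datao_re * factor, datao_im * factor

# Example: a 16-bit output word pair captured with SCALE_EXP = 4.
re = to_signed(0xFFFD, 16)   # raw pattern for -3
im = to_signed(0x0005, 16)   # raw pattern for +5
full_re, full_im = rescale(re, im, 4)
```

The rescaled values need a wider container than WIDTH bits, which Python integers provide automatically; in a fixed-width language the destination type would have to be sized for WIDTH + SCALE_EXP bits.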

The CoreFFT, by default, is configured to apply the conditional block floating-point scaling. In conditional block floating-point mode, the input data is checked and downscaled by a factor of two, if necessary, prior to the first stage.
1.3.4 Transformation Time
The FFT computation takes (N/2 + L) x log2(N) + 2 clock cycles, where L is an implementation-specific parameter representing the aggregate latency of a memory bank, switches, and the butterfly. L does not depend on the transform size N. It only depends on the FFT bit resolution. L is equal to 10 at bit resolutions of 8 to 18, and L is equal to 16 at bit resolutions of 19 to 32. For example,
· For a 256-point 16-bit FFT
Computation Time = (256/2 + 10) x log2(256) + 2 = 1106 clock periods.
· For a 4096-point 24-bit FFT
Computation Time = (4096/2 + 16) x log2(4096) + 2 = 24770 clock periods.
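The computation time formula can be wrapped in a small Python helper for budgeting. The function name inplace_fft_cycles is hypothetical; the formula and the two values of L are the ones given above.

```python
def inplace_fft_cycles(points, width):
    """Clock cycles for one in-place transform: (N/2 + L) * log2(N) + 2."""
    if not 8 <= width <= 32:
        raise ValueError("bit resolution must be 8 to 32")
    stages = points.bit_length() - 1
    if points < 2 or points != 1 << stages:
        raise ValueError("transform size must be a power of two")
    latency = 10 if width <= 18 else 16   # L depends only on bit resolution
    return (points // 2 + latency) * stages + 2

cycles_256_16 = inplace_fft_cycles(256, 16)
cycles_4096_24 = inplace_fft_cycles(4096, 24)
```

Dividing the cycle count by the clock frequency gives the computation time; the figure excludes the time spent loading the input frame and reading the results out.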

1.3.5 Memory Implementation
The core uses hard RAM blocks to implement the in-place memory, other memory buffers, and a twiddle LUT. The FPGAs carry two hard RAM types: large SRAM (LSRAM) and micro-RAMs. The memory implementation can be controlled by setting the URAM_MAXDEPTH parameter. CoreFFT uses micro-RAMs if the required depth does not exceed the parameter value. For example, setting the URAM_MAXDEPTH parameter to 64 utilizes micro-RAMs in any FFT size up to 128 points, as the required depth is POINTS/2. Setting the parameter value to 0 prevents the core from using the micro-RAMs at all, so that they can be used elsewhere.
The parameter URAM_MAXDEPTH is accessible through the core user interface.
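The selection rule can be expressed as a one-line check. This Python sketch only restates the rule given above for the in-place architecture (required depth POINTS/2 compared against URAM_MAXDEPTH); the function name is hypothetical, and the actual mapping is decided by the core and the tools.

```python
def inplace_uses_micro_ram(points, uram_maxdepth):
    """True if the in-place memories fit the micro-RAM depth limit."""
    required_depth = points // 2   # each RAM block stores N/2 complex words
    return required_depth <= uram_maxdepth

fits_128 = inplace_uses_micro_ram(128, 64)    # depth 64 fits the limit
fits_256 = inplace_uses_micro_ram(256, 64)    # depth 128 exceeds the limit
disabled = inplace_uses_micro_ram(32, 0)      # 0 rules out micro-RAMs
```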

1.4 Streaming FFT
Streaming FFT supports continuous complex data processing, one complex input data sample per clock period. The streaming architecture has as many Radix-2² processors, RAM blocks, and LUTs as necessary to support streaming data transformation. The following figure shows a functional diagram of the 256-point streaming transform.

Figure 1-5. Streaming Radix-2² 256-pt FFT Functional Block Diagram

The input and output data are represented as (2 x DATA_BITS)-bit words comprised of real and imaginary parts. Both parts are two’s complement numbers of DATA_BITS bits each. The module processes frames of data with a frame size equal to the transform size of N complex words. The frame to be processed comes to the x(n) input as a sequence of complex data words, one (2 x DATA_BITS)-bit word per clock interval. The next frame can start immediately after the last data word of the current frame or at any time later on.
The following figure shows an example of frame i+1 immediately following frame i, and frame i+2 coming after an arbitrary gap. The input data samples within a frame must come at every clock interval, so a frame lasts exactly N clock intervals. There is a substantial latency associated with the streaming algorithm. The output data frames appear in the same order, at the same clock rate, and with the same gaps (if any) between the output frames as those between the input frames.
Figure 1-6. Streaming FFT Input Data Frames

The number of FFT butterflies equals log2(N), so every stage is processed by a separate butterfly. As a result, all stages are processed in parallel.
CoreFFT calculates the twiddle factors required by the FFT algorithm. At power-up, the core automatically uploads the twiddle factors into on-chip RAMs that become the twiddle LUTs. No user action is required. Upon completion of the uploading, the core activates the RFS signal, letting a data source know that the core is ready to start FFT processing. The LUT contents can be refreshed at any time by issuing a one-clock-wide signal, REFRESH.
1.4.1 Streaming FFT Latency
The streaming FFT latency is primarily defined by the transform size, N. The implementation adds a number of pipeline delays that depend on the FFT size and data path bit width. In other words, the FFT results are delayed relative to the input data by not less than N data intervals for the bit-reversed outputs. The ordered output latency is about two times larger.
1.4.2 Streaming FFT Memory Implementation
Similarly to the in-place architecture, the streaming FFT uses hard RAM blocks to implement the required memories, LUTs, and delay lines. The memory implementation can be controlled by setting the URAM_MAXDEPTH parameter. CoreFFT uses micro-RAMs if the depth of the memory does not exceed the parameter value. For example, setting the URAM_MAXDEPTH parameter to 128 utilizes micro-RAMs to create memories of depth 128 and less. Setting the parameter value to 0 prevents the core from using the micro-RAMs at all, so that they can be used elsewhere.

1.4.3 Streaming FFT Output Data Words Order
The output results obtained from the Radix-2 and the Radix-2² FFT algorithms are in the bit-reversed order.
However, the in-place implementation internally performs the sample ordering. Therefore, the core puts the results out in a natural order. The Streaming FFT supports both bit-reversed and natural output orders. The bit-reversed option utilizes fewer chip resources and provides smaller latency.
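When the bit-reversed output option is selected, the data recipient has to map each output position back to its frequency bin. The following Python sketch shows the standard bit-reversal permutation; the function names are hypothetical, and it assumes output position p carries bin number bit-reverse(p) over log2(N) bits.

```python
def bit_reversed_order(n):
    """Bin index delivered at each output position of an n-point frame."""
    bits = n.bit_length() - 1
    if n < 2 or n != 1 << bits:
        raise ValueError("frame size must be a power of two")
    return [int(format(i, f"0{bits}b")[::-1], 2) for i in range(n)]

def to_natural_order(frame):
    """Reorder a bit-reversed output frame into natural bin order."""
    order = bit_reversed_order(len(frame))
    natural = [None] * len(frame)
    for position, bin_index in enumerate(order):
        natural[bin_index] = frame[position]
    return natural

order8 = bit_reversed_order(8)
received = [f"X({k})" for k in order8]   # frame as it would arrive
sorted_bins = to_natural_order(received)
```

Reordering in software or in a downstream buffer costs one frame of storage, which is consistent with the natural-order option using more chip resources and adding latency.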

1.4.4 Finite Word Length Considerations
This section describes the finite word length considerations of the CoreFFT.
1.4.4.1 Unscaled and Scale Schedule Modes
The butterfly calculation involves addition and subtraction. These operations can cause the butterfly data width to grow from input to output. Every butterfly, BF2I or BF2II (see Figure 1-5), can introduce an extra bit to the data width. In addition, the multiplications can add one bit to the result. The overall potential bit growth is log2(N) + 1 bits. Precautions must be taken to ensure that there are no data overflows.

To avoid or reduce the risk of overflow, the core employs one of two techniques:
· Unscaled mode builds a data path wide enough to accommodate the bit growth. The data path width grows from stage to stage to fully accommodate the algorithm bit growth, so that data overflow never happens. The real or imaginary output bit width is log2(N) + 1 bits wider than the input one. The design is entirely safe from the overflow point of view.
· The configurable scale schedule technique gives the user control over scaling down (truncation of) every intermediate result that can cause overflow. The output bit width equals the input bit width. The technique is overflow-safe only when the scaling schedule matches the actual bit growth, which is not easy to achieve. A cautious approach to configurable scaling often leads to extra downscaling. But if the nature of the transformed signal is known to be overflow-safe with some or all stages omitting the extensive downscaling, the technique is beneficial from both the signal-to-noise ratio and chip resource utilization standpoints. When configured for the scale schedule technique, the core generates an overflow flag if an overflow happens.
The Radix-2² butterfly can introduce 3-bit growth: butterflies BF2I, BF2II, and a multiplier can each add a bit. But only one multiplication out of all FFT stages can add the bit. As the stage at which the multiplier induces the extra bit, if any, is not known upfront, the FFT engine in the unscaled mode extends the data path by that bit starting at the first stage.
In the scale schedule technique, every Radix-2² stage can introduce 3-bit growth. The data path within the stage grows accordingly, that is, the stage output is three bits wider than the stage input. The engine cuts out the three extra bits after the stage result is calculated, that is, the stage output gets truncated by three bits before it goes to the next stage. Such an approach eliminates the need to guess the sub-stage at which downscaling needs to be applied.
The following table explains the three bits that get cut out in the scale schedule mode depending on the 2-bit schedule value for a particular stage.

Table 1-1. Cutting Out Three Extra Bits in Scale Schedule Mode

Scale Schedule for a Given Radix-2² Stage | Bits the Core Cuts Out
00 | Cut out three MSBs
01 | Cut out two MSBs and round one LSB
10 | Cut out one MSB and round two LSBs
11 | Round three LSBs

FFT/IFFT sizes that are not a power of four, such as 32, 128, or 512, utilize a single Radix-2 butterfly in addition to the Radix-2² butterflies. This butterfly applies to the last processing stage and cuts out a single extra bit.
The core automatically invokes overflow detection in the scale schedule mode. The overflow flag (OVFLOW_FLAG) appears as soon as the core detects the actual overflow. The flag stays active until the end of an output frame where the overflow is detected.

1.4.4.2 Unscaled Mode Input Bit Width Limitations
The Unscaled mode limits the maximal input sample bit width handled by the core. The following table lists the maximum bit widths for every FFT size.
Table 1-2. Streaming Unscaled FFT Max Input Data Bit Width

FFT Size | Maximum Input Width
16 | 32
32 | 30
64 | 30
128 | 28
256 | 28
512 | 26
1024 | 26
2048 | 24
4096 | 24

1.4.4.3 Entering Scale Schedule
The scale schedule identifies the downscaling factor for every streaming FFT stage. Every Radix-2² stage scaling factor is controlled by two dedicated bits of the scale schedule, and the Radix-2 stage used in the non-power-of-four FFTs is controlled by a single bit. The following figure depicts an example of a scale schedule user interface for a 1024-pt FFT. A pair of checkboxes corresponds to a specific Radix-2² stage and presents two bits of the downscaling factor. The actual downscaling factor at a particular stage is calculated as 2^(2*Bit1+Bit0) and takes one of the following values: 1, 2, 4, 8. The checkboxes shown in the following figure correspond to the binary scale schedule value of 10 10 10 10 11. This value presents a conservative scale schedule that does not cause overflow.
Figure 1-7. Scale Schedule User Interface
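The per-stage downscaling factors implied by a schedule value can be worked out with a short Python sketch. It covers only power-of-four sizes, where every stage is a Radix-2² stage with two schedule bits; the function name is hypothetical, and the single-bit Radix-2 stage of the other sizes is left out on purpose.

```python
def stage_downscale_factors(schedule):
    """Decode a string of 2-bit stage fields into downscaling factors.

    Each field is (Bit1, Bit0) and maps to 2^(2*Bit1 + Bit0).
    Spaces between fields are ignored.
    """
    bits = schedule.replace(" ", "")
    if len(bits) % 2 or set(bits) - {"0", "1"}:
        raise ValueError("expected an even number of binary digits")
    factors = []
    for i in range(0, len(bits), 2):
        bit1, bit0 = int(bits[i]), int(bits[i + 1])
        factors.append(2 ** (2 * bit1 + bit0))
    return factors

# Conservative schedule quoted above for the 1024-pt FFT.
factors = stage_downscale_factors("10 10 10 10 11")
total_shift = sum(f.bit_length() - 1 for f in factors)   # bits removed overall
```

For this schedule the factors are 4, 4, 4, 4, and 8, a total of 11 bits, which equals log2(1024) + 1, the overall potential bit growth stated earlier.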

The following table lists the conservative scale schedules for every FFT size that are completely overflow-safe. Each Radix-2² stage takes two schedule bits; x marks a bit that is not used for the given FFT size.

Table 1-3. Conservative Scale Schedules for Various FFT Sizes

FFT Size | Stage 5 | Stage 4 | Stage 3 | Stage 2 | Stage 1 | Stage 0
4096 | 1 0 | 1 0 | 1 0 | 1 0 | 1 0 | 1 1
2048 | x 1 | 1 0 | 1 0 | 1 0 | 1 0 | 1 1
1024 | x x | 1 0 | 1 0 | 1 0 | 1 0 | 1 1
512 | x x | x 1 | 1 0 | 1 0 | 1 0 | 1 1
256 | x x | x x | 1 0 | 1 0 | 1 0 | 1 1
128 | x x | x x | x 1 | 1 0 | 1 0 | 1 1
64 | x x | x x | x x | 1 0 | 1 0 | 1 1
32 | x x | x x | x x | x 1 | 1 0 | 1 1
16 | x x | x x | x x | x x | 1 0 | 1 1


2. Interface
This section describes the interface of the CoreFFT.

2.1 In-Place FFT
This section describes the In-Place FFT of the CoreFFT.
2.1.1 Configuration Parameters
CoreFFT has parameters (Verilog) or generics (VHDL) for configuring the RTL code. The following table describes the parameters and generics. All parameters and generics are integer types.
Table 2-1. In-Place CoreFFT Parameter Descriptions

Parameter | Valid Range | Default | Description
INVERSE | 0-1 | 0 | 0: Forward Fourier transform. 1: Inverse Fourier transform.
SCALE | 0-1 | 0 | 0: Conditional block floating point scaling. 1: Unconditional block floating point scaling. To apply the input data scaling, set the SCALE parameter to 0 and prepend the proper number of guard bits to the input data. Then the conditional block floating point has no effect.
POINTS | 32, 64, 128, 256, 512, 1024, 2048, 4096, 8192, 16384 | 256 | Transform size. Note: The 16384-pt FFT is supported on RTG4, PolarFire, and PolarFire SoC parts only.
WIDTH | 8-32 | 18 | Data and twiddle factor bit width.
MEMBUF | 0-1 | 0 | 0: Minimal (no buffer) configuration. 1: Buffered configuration.
SCALE_EXP_ON | 0-1 | 0 | 0: Does not build the conditional block floating-point exponent calculator. 1: Builds the calculator.
URAM_MAXDEPTH | 0, 4, 8, 16, 32, 64, 128, 256, 512 | | The largest RAM depth to be implemented with the micro-RAM available on the SmartFusion2, IGLOO2, RTG4, PolarFire, and PolarFire SoC parts. When the RAM depth required for a user-selected transform size POINTS exceeds the URAM_MAXDEPTH, large LSRAM blocks are used.

2.1.2 Ports
The following table lists the port signals for the in-place CoreFFT architecture.
Table 2-2. In-Place CoreFFT Port Descriptions

Port Name | In/Out | Port Width, Bits | Description
DATAI_IM | In | WIDTH | Imaginary input data to be transformed.
DATAI_RE | In | WIDTH | Real input data to be transformed.
DATAI_VALID | In | 1 | Input complex word valid. The signal accompanies valid input complex words present on inputs DATAI_IM, DATAI_RE. When the signal is active, the input complex word is loaded into the core memory provided the BUF_READY signal has been asserted.
READ_OUTP | In | 1 | Read transformed data. Normally the module puts out FFT results, once they are ready, in a single burst of N complex words. The transformed data recipient can insert arbitrary breaks in the burst by deasserting the READ_OUTP signal.
DATAO_IM | Out | WIDTH | Imaginary output data.
DATAO_RE | Out | WIDTH | Real output data.
DATAO_VALID | Out | 1 | Output complex word valid. The signal accompanies valid output complex words present on DATAO_IM and DATAO_RE outputs.
BUF_READY | Out | 1 | FFT accepts fresh data. The core asserts the signal when it is ready to accept data. The signal stays active until the core memory is full. In other words, the signal stays active until POINTS complex input samples are loaded.
OUTP_READY | Out | 1 | FFT results ready. The core asserts the signal when the FFT results are ready for the transformed data recipient to read. The signal stays active while the transformed data frame is being read. Normally it lasts for POINTS clock intervals unless the READ_OUTP signal is deasserted.
SCALE_EXP | Out | floor[log2(ceil(log2(POINTS)))] + 1 | Conditional block floating-point scaling exponent. This optional output can be enabled by setting the SCALE_EXP_ON parameter. The output can be enabled only when the core is in conditional block floating-point scaling mode (the parameter SCALE = 0).
PONG | Out | 1 | Pong bank of the input memory buffer is being used by the FFT engine as a working in-place memory. This optional signal is valid only in the buffered configuration.
CLK | In | 1 | Clock. Rising edge active. The core master clock.
SLOWCLK | In | 1 | Low-frequency, rising-edge clock signal for twiddle LUT initialization. Its frequency should be the CLK frequency divided by at least eight.
NGRST | In | 1 | Asynchronous reset. Active-Low.

Important:All signals are active-high (logic 1) unless otherwise specified.
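The SCALE_EXP port width expression from the port table, floor[log2(ceil(log2(POINTS)))] + 1, can be evaluated with a short Python helper when sizing the signal that captures it. The function name scale_exp_width is hypothetical; the expression itself is taken from the table.

```python
import math

def scale_exp_width(points):
    """SCALE_EXP port width in bits for a given transform size."""
    stages = math.ceil(math.log2(points))       # number of FFT stages
    return math.floor(math.log2(stages)) + 1    # bits needed to count them

width_32 = scale_exp_width(32)        # 5 stages
width_256 = scale_exp_width(256)      # 8 stages
width_16384 = scale_exp_width(16384)  # 14 stages
```

The result is simply the number of bits needed to hold a shift count as large as the number of stages.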

© 2022 Microchip Technology Inc.
and its subsidiaries

User Guide

DS50003348C-page 13

CoreFFT v8.0
Interface

2.2 Streaming FFT
Streaming FFT is available with a GUI-configurable native interface or an AXI4 streaming interface.

2.2.1 Configuration Parameters
CoreFFT has parameters (Verilog) or generics (VHDL) for configuring the RTL code. The following table describes these parameters and generics. All parameters and generics are integer types.
Table 2-3. CoreFFT Streaming Architecture Parameter Descriptions

Parameter: FFT_SIZE
Valid range: 16, 32, 64, 128, 256, 512, 1024, 2048, and 4096
Default: 256
Description: Transform size, points. The core processes frames of complex data with every frame containing FFT_SIZE complex samples. The transformed data frames are of the same size.

Parameter: NATIV_AXI4
Valid range: 0 – 1
Default: 0
Description: Interface selection of the IP. It is available only for the streaming architecture.
· 0 – Native interface
· 1 – AXI4 streaming interface

Parameter: SCALE_ON
Valid range: 0 – 1
Default: 1
Description:
· 1 – Enable configurable scale schedule. When the option is enabled, the core applies the configurable scale factor, SCALE_SCH, after every butterfly.
· 0 – Unscaled mode

Parameter: SCALE_SCH
Default: 0
Description: Scale schedule. If the SCALE_ON parameter equals 1, SCALE_SCH is used to define the scaling factor for every processing stage.

Parameter: DATA_BITS
Valid range: 8 – 32
Default: 18
Description: Input data bit width of real or imaginary parts.

Parameter: TWID_BITS
Valid range: 8 – 32
Default: 18
Description: Twiddle factor bit width of its real or imaginary parts.

Parameter: ORDER
Valid range: 0 – 1
Default: 0
Description:
· 0 – Output data in bit-reversed order
· 1 – Output data in normal order

Parameter: URAM_MAXDEPTH
Valid range: 0, 4, 8, 16, 32, 64, 128, 256, 512
Default: 0
Description: The largest RAM depth to be implemented with micro-RAM available on the SmartFusion2, IGLOO2, RTG4, PolarFire, or PolarFire SoC parts. When the RAM depth required for a user-selected transform size POINTS exceeds the URAM_MAXDEPTH, large LSRAM blocks are used.

Parameter: AXI4S_IN_DATA
Valid range: 8, 16, 24, 32
Default: 24
Description: Internally generated parameter that is not accessible to the user. It expresses the input data sample width in terms of byte boundaries to facilitate the AXI4 streaming interface, and it determines the 0's padding of the real and imaginary input data samples when NATIV_AXI4 = 1. The AXI4S_IN_DATA size is defined as follows:
1. If DATA_BITS = 8, then AXI4S_IN_DATA = 8 and no padding is required for the input data samples.
2. If 8 < DATA_BITS ≤ 16, then AXI4S_IN_DATA = 16. Both the real and the imaginary input data samples must be padded with 16 – DATA_BITS 0's at the MSB positions before sending.
3. If 16 < DATA_BITS ≤ 24, then AXI4S_IN_DATA = 24. Both the real and the imaginary input data samples must be padded with 24 – DATA_BITS 0's at the MSB positions before sending.
4. If 24 < DATA_BITS ≤ 32, then AXI4S_IN_DATA = 32. Both the real and the imaginary input data samples must be padded with 32 – DATA_BITS 0's at the MSB positions before sending.
Note: Padding starts from the MSB.


Parameter: AXI4S_OUT_DATA
Valid range: 8, 16, 24, 32, 40
Default: 24
Description: Internally generated parameter that is not accessible to the user. It expresses the output data sample width in terms of byte boundaries to facilitate the AXI4 streaming interface, and it determines the 0's padding of the real and imaginary output data samples when NATIV_AXI4 = 1. The AXI4S_OUT_DATA size is defined as follows:
When SCALE_ON = 0, the output sample size is STREAM_DATAO_BITS = DATA_BITS + ceil_log2(FFT_SIZE) + 1.
When SCALE_ON = 1, the output sample size is STREAM_DATAO_BITS = DATA_BITS.
1. If STREAM_DATAO_BITS = 8, then AXI4S_OUT_DATA = 8 and no padding is added to the output data samples.
2. If 8 < STREAM_DATAO_BITS ≤ 16, then AXI4S_OUT_DATA = 16. Both the real and the imaginary output data samples are padded with 16 – STREAM_DATAO_BITS 0's at the MSB positions before framing.
3. If 16 < STREAM_DATAO_BITS ≤ 24, then AXI4S_OUT_DATA = 24. Both the real and the imaginary output data samples are padded with 24 – STREAM_DATAO_BITS 0's at the MSB positions before framing.
4. If 24 < STREAM_DATAO_BITS ≤ 32, then AXI4S_OUT_DATA = 32. Both the real and the imaginary output data samples are padded with 32 – STREAM_DATAO_BITS 0's at the MSB positions before framing.
5. If 32 < STREAM_DATAO_BITS ≤ 40, then AXI4S_OUT_DATA = 40. Both the real and the imaginary output data samples are padded with 40 – STREAM_DATAO_BITS 0's at the MSB positions before framing.
Note: Padding starts from the MSB.
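The byte-boundary rules for AXI4S_IN_DATA and AXI4S_OUT_DATA amount to rounding the sample width up to the next multiple of 8 bits. The following is a minimal sketch of that arithmetic; the function names are illustrative, and it assumes that a width already on a byte boundary (16, 24, 32, or 40) needs no padding.

```python
def ceil_log2(n: int) -> int:
    """Smallest e such that 2**e >= n, for n >= 1."""
    return (n - 1).bit_length()


def byte_aligned(bits: int) -> int:
    """Round a bit width up to the next multiple of 8."""
    return 8 * ((bits + 7) // 8)


def axi4s_in_data(data_bits: int) -> tuple[int, int]:
    """Return (AXI4S_IN_DATA, number of padding zeros) for an input sample."""
    if not 8 <= data_bits <= 32:
        raise ValueError("DATA_BITS must be in the range 8 to 32")
    width = byte_aligned(data_bits)
    return width, width - data_bits


def axi4s_out_data(data_bits: int, fft_size: int, scale_on: bool) -> tuple[int, int]:
    """Return (AXI4S_OUT_DATA, number of padding zeros) for an output sample."""
    if scale_on:
        stream_datao_bits = data_bits
    else:
        stream_datao_bits = data_bits + ceil_log2(fft_size) + 1
    width = byte_aligned(stream_datao_bits)
    if width > 40:
        # The parameter description lists widths up to 40 only.
        raise ValueError("output width is not covered by the listed ranges")
    return width, width - stream_datao_bits


print(axi4s_in_data(26))
print(axi4s_out_data(18, 256, scale_on=True))
print(axi4s_out_data(18, 256, scale_on=False))
```

With the default DATA_BITS of 18 and scaling enabled, both helpers return a 24-bit field, which agrees with the default value of 24 listed for the two parameters.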

2.2.2 Ports
The following table describes the port signals for the Streaming CoreFFT macro.
Table 2-4. Streaming FFT I/O Signal Descriptions

Each entry lists the port name, direction (In/Out), and port width in bits, followed by the description.

CLK  In  1
Rising-edge clock signal

SLOWCLK  In  1
Low-frequency rising-edge clock signal for twiddle LUT initialization. Its frequency should be the CLK frequency divided by at least four.

CLKEN  In  1
Optional clock enable signal. After the signal is deasserted, the core stops generating valid results.

NGRST  In  1
Asynchronous reset signal, active-low.

RST  In  1
Optional synchronous reset signal, active-high.

Ports available when NATIV_AXI4 = 1


AXI4_S_DATAI_TVALID  In  1
AXI4 Stream data valid input to the core from the external source. Indicates the data availability. It acts as START of the core.
Note: Read the START port description for more information.

AXI4_S_DATAI_TREADY  Out  1
AXI4 Stream data ready to the external source. Indicates the core's readiness to accept the data.

AXI4_S_TDATAI  In  2 * AXI4S_IN_DATA
AXI4 Stream data input from the source to the core. Contains real data (DATAI_RE) padded with 0's and imaginary data (DATAI_IM) padded with 0's accordingly.

AXI4_S_TLASTI  In  1
Indicates the transmission of the last data sample from the external source.

AXI4_M_DATAO_TVALID  Out  1
AXI4 Stream data valid output to the receiver. Indicates that the core is ready to send transformed data. It acts as DATAO_VALID of the core.
Note: Read the DATAO_VALID port description for more information.

AXI4_M_DATAO_TREADY  In  1
AXI4 Stream data ready from the receiver. Indicates the external receiver's readiness. It must always be 1 for core functionality.

AXI4_M_TDATAO  Out  2 * AXI4S_OUT_DATA
AXI4 Stream data out to the receiver. Contains transformed real data (DATAO_RE) padded with 0's and imaginary data (DATAO_IM) padded with 0's accordingly.

AXI4_M_TLASTO  Out  1
Indicates the transmission of the last transformed data sample from the IP.

AXI4_S_CONFIGI_TVALID  In  1
Valid input to the core from the external source. Indicates the configuration data availability.

AXI4_S_CONFIGI_TREADY  Out  1
Ready to the external source. Indicates the core's readiness to accept the configuration data.

AXI4_S_CONFIGI  In  8
Configuration data input from the source to the core. The source should configure the IP before transmitting the data samples. It contains the following configuration information:
· Bit0 – INVERSE (When the bit is high, the core computes the inverse FFT of the following data frame, otherwise the forward FFT)
· Bit1 – REFRESH (Reload the twiddle coefficient LUTs in the corresponding RAM blocks)

AXI4_M_CONFIGO_TVALID  Out  1
Status data valid output to the receiver. Indicates that the core is ready to send transformed data.

AXI4_M_CONFIGO_TREADY  In  1
Status data ready from the receiver. Indicates the external receiver's readiness. It must always be 1 for core functionality.


AXI4_M_CONFIGO  Out  8
Status data out to the receiver. It contains the following status information:
· Bit0 – OVFLOW_FLAG (Arithmetic overflow flag. CoreFFT asserts the flag if the FFT/IFFT computation overflows. The flag starts as soon as the core detects overflow. The flag ends when the current output data frame ends)

Ports available when NATIV_AXI4 = 0

DATAI_IM  In  DATA_BITS
Imaginary input data to be transformed.

DATAI_RE  In  DATA_BITS
Real input data to be transformed.

START  In  1
Transformation start signal. Signifies the moment the first sample of an input data frame of N complex samples enters the core. If the START comes when the previous input data frame has not been completed, the signal shall be ignored.

INVERSE  In  1
Inverse transformation. When the signal is asserted, the core computes the inverse FFT of the following data frame, otherwise the forward FFT.

REFRESH  In  1
Reloads the twiddle coefficient LUTs in the corresponding RAM blocks.

DATAO_IM  Out  DATA_BITS
Imaginary output data

DATAO_RE  Out  DATA_BITS
Real output data

OUTP_READY  Out  1
FFT results are ready. The core asserts the signal when it is about to output a frame of N FFT'ed data. The width of the signal is one clock interval.

DATAO_VALID  Out  1
Output frame is valid. Accompanies a valid output data frame. Once started, the signal lasts N clock cycles. If the input data are coming continuously with no gaps in between frames, the DATAO_VALID once started will last indefinitely.

OVFLOW_FLAG  Out  1
Arithmetic overflow flag. CoreFFT asserts the flag if the FFT/IFFT computation overflows. The flag starts as soon as the core detects overflow. The flag ends when the current output data frame ends.

RFS  Out  1
Request for start. The core asserts the signal when it is ready for the next input data frame. The signal starts as soon as the core is ready for the next frame. The signal ends when the core gets the requested START signal.

Important: All signals are active-high (logic 1) unless otherwise specified.
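The bit assignments of the 8-bit configuration word (Bit0 INVERSE, Bit1 REFRESH) and the 8-bit status word (Bit0 OVFLOW_FLAG) can be illustrated with a small helper. This is a sketch only: the function names are hypothetical, and driving the unlisted bits as 0 is an assumption, since the table does not describe them.

```python
INVERSE_BIT = 0       # configuration word, Bit0
REFRESH_BIT = 1       # configuration word, Bit1
OVFLOW_FLAG_BIT = 0   # status word, Bit0


def pack_config(inverse: bool, refresh: bool = False) -> int:
    """Build the 8-bit configuration word; unlisted bits are assumed to be 0."""
    return (int(inverse) << INVERSE_BIT) | (int(refresh) << REFRESH_BIT)


def overflow_flag(status: int) -> bool:
    """Extract OVFLOW_FLAG from the 8-bit status word."""
    return bool((status >> OVFLOW_FLAG_BIT) & 1)


print(f"{pack_config(inverse=True):08b}")
print(f"{pack_config(inverse=False, refresh=True):08b}")
print(overflow_flag(0x01))
```

A testbench model could send pack_config(...) on the configuration channel before the first data sample of a frame, because the source is expected to configure the IP before transmitting data.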

2.2.3 Input/Output Data Frame Format for AXI4 Streaming Interface
When the AXI4 streaming interface is selected, each input and output data word carries the real and imaginary data cascaded together. Each data sample is first padded with zeros up to the next byte boundary to facilitate AXI4 streaming.
For example, for DATA_BITS of 26, the nearest byte boundary is 32, so the real and the imaginary data samples must each be padded with six 0's at the MSB positions before they are cascaded to form the AXI4 streaming I/O data.
Table 2-5. AXI4 Streaming Interface I/O Data Frame Format

Bits 63…58: 0's padding
Bits 57…32: Imaginary data
Bits 31…26: 0's padding
Bits 25…0: Real data

Tip: See the AXI4S_IN_DATA and AXI4S_OUT_DATA parameter descriptions in Table 2-3 for the zero padding.
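The frame layout in Table 2-5 can be reproduced in software, for example when preparing stimulus for a testbench. The following is a minimal sketch, assuming two's complement samples and the zero padding described above (not sign extension); pack_sample and unpack_sample are illustrative names, not part of the core deliverables.

```python
def field_width(data_bits: int) -> int:
    """Sample width rounded up to the next byte boundary."""
    return 8 * ((data_bits + 7) // 8)


def pack_sample(re: int, im: int, data_bits: int) -> int:
    """Cascade zero-padded imaginary (upper field) and real (lower field) data."""
    lo, hi = -(1 << (data_bits - 1)), (1 << (data_bits - 1)) - 1
    if not (lo <= re <= hi and lo <= im <= hi):
        raise ValueError("sample does not fit in DATA_BITS")
    mask = (1 << data_bits) - 1
    # Masking keeps the two's complement bit pattern and leaves the pad bits 0.
    return ((im & mask) << field_width(data_bits)) | (re & mask)


def unpack_sample(word: int, data_bits: int) -> tuple[int, int]:
    """Recover signed (re, im) from a packed word, ignoring the pad bits."""
    mask = (1 << data_bits) - 1

    def to_signed(value: int) -> int:
        return value - (1 << data_bits) if value >> (data_bits - 1) else value

    re = to_signed(word & mask)
    im = to_signed((word >> field_width(data_bits)) & mask)
    return re, im


word = pack_sample(re=-1, im=1, data_bits=26)
print(f"{word:016x}")
print(unpack_sample(word, 26))
```

For DATA_BITS of 26, the real sample occupies bits 25…0 and the imaginary sample bits 57…32, so bits 31…26 and 63…58 of the packed word stay at 0.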

3. Timing Diagrams
This section describes the timing diagram of CoreFFT.
3.1 In-Place FFT
When the in-place FFT asserts the BUF_READY signal, a data source starts supplying the data samples to be transformed. Imaginary and real halves of the input data sample must be supplied simultaneously and accompanied with the validity bit DATAI_VALID. The data source can supply the sample at every clock cycle or at an arbitrary slower rate (refer to Figure 3-1). Once the FFT module receives N input samples, it lowers the BUF_READY signal. The FFT engine starts processing the data automatically after it is ready. In the minimal memory configuration, the processing phase starts immediately after data loading is complete. In the buffered configuration, the FFT engine can wait until the previous data burst is processed. Then, the engine starts automatically. The following figure shows the loading of input data.
Figure 3-1. Loading Input Data
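The loading handshake can be sketched as a simplified behavioral model. This is an illustration under stated assumptions, not a cycle-accurate description of the core: the function name and the per-clock tuple format are hypothetical, and BUF_READY is assumed to drop right after the Nth valid sample is loaded.

```python
def load_frame(points, stimulus):
    """Model loading of one frame while BUF_READY is asserted.

    stimulus is an iterable of (datai_valid, re, im), one tuple per clock.
    Returns the loaded samples and the number of clocks BUF_READY was high.
    """
    memory = []
    buf_ready = True
    clocks_ready = 0
    for datai_valid, re, im in stimulus:
        if not buf_ready:
            break                      # core no longer accepts data
        clocks_ready += 1
        if datai_valid:                # sample qualified by DATAI_VALID
            memory.append((re, im))
            if len(memory) == points:  # N samples loaded: lower BUF_READY
                buf_ready = False
    return memory, clocks_ready


# A source that skips some clocks (DATAI_VALID low) while loading 4 points.
stimulus = [(1, 1, 0), (0, 9, 9), (1, 2, 0), (1, 3, 0),
            (0, 0, 0), (1, 4, 0), (1, 5, 0)]
print(load_frame(4, stimulus))
```

Samples presented while DATAI_VALID is low are not loaded, which is how the source can supply data at a rate slower than one sample per clock.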
Upon completing the transformation, the FFT module asserts the OUTP_READY signal and starts generating the FFT results. The imaginary and real halves of the output samples appear simultaneously on the DATAO_IM and DATAO_RE multibit outputs. Every output sample is accompanied by the DATAO_VALID bit. The data receiver accepts the transformed data either at every clock cycle or at an arbitrary slower rate. The FFT module keeps providing data output while the READ_OUTP signal is asserted. To control the output sample rate, the receiver must deassert the READ_OUTP signal as and when needed. The following figure shows the receiving of the transformed data.
Figure 3-2. Receiving Transformed Data

When using the READ_OUTP signal to control the reading rate, possible FFT cycle growth needs to be considered. In the minimal memory configuration, any prolongation of the read (upload) time extends the FFT cycle (see Figure 1-2). In the buffered configuration, the FFT cycle grows when the actual upload time exceeds the dedicated interval shown in Figure 1-3 as “Available for reading results of cycle i”. Also, in the buffered configuration, the output buffer starts accepting the fresh FFT results even if the older results have not been read out, thus overwriting the older ones. In this case, the core deasserts the OUTP_READY and the DATAO_VALID signals when they are no longer valid.


3.2 Streaming FFT
For the AXI4S interface, the operation of the AXI4S interface ports is mapped to that of the native interface. For the one-to-one mapping, see Table 2-4 in 2.2.2. Ports.

3.2.1 RFS and START
The core generates the RFS signal to let a data source know that it is ready for the next frame of the input data samples. After it is asserted, the RFS stays active until the data source responds with the START signal.
Once the core gets the START, it deasserts the RFS signal and starts receiving the input data frame. After N clock intervals, the data frame reception is completed, and the RFS signal goes active again. The following figure shows an example when the FFT engine waits for the data source to supply the START signal.
Figure 3-3. RFS Waits for START

If the START signal is held permanently active, the core starts receiving another input frame right after the end of the previous frame. It is optional for the data source to watch for the RFS signal. It can assert the START signal at any time, and the core starts accepting another input frame as soon as it can. In the situation shown in Figure 3-3, a new frame loading starts immediately after the START signal. If the START signal comes when a previous input frame is being loaded, the core waits until the frame ends and then starts loading another frame. The following figure shows another example where the input data come indefinitely without gaps between the frames.
Figure 3-4. Transforming Streaming Data
The following figure shows that the START signal leads the actual input frame by one clock interval.
Figure 3-5. START Leads the Data

3.2.2 OUTP_READY and DATAO_VALID
These two signals serve to notify a data receiver when the FFT results are ready. OUTP_READY is a one-clock-wide pulse that the core asserts when the output data frame is about to be output. The core asserts the DATAO_VALID signal while generating the output frame. The DATAO_VALID signal trails the OUTP_READY signal by one clock interval. The following figure shows the timing relations between the two signals and the FFTed data frame.


Figure 3-6. Output Data and Handshake Signals
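The relation between the two output handshake signals for a single frame can be sketched as per-clock traces. This is a simplified illustration based only on the statements above (a one-clock OUTP_READY pulse, DATAO_VALID trailing it by one clock and lasting N clocks); the function names and clock indexing are assumptions.

```python
def output_handshake(n, ready_clock, total_clocks):
    """Return per-clock OUTP_READY and DATAO_VALID traces for one frame."""
    outp_ready = [clk == ready_clock for clk in range(total_clocks)]
    # DATAO_VALID starts one clock after the OUTP_READY pulse and lasts N clocks.
    datao_valid = [ready_clock < clk <= ready_clock + n
                   for clk in range(total_clocks)]
    return outp_ready, datao_valid


def capture_frame(datao_valid, datao):
    """Receiver side: keep only the words qualified by DATAO_VALID."""
    return [word for valid, word in zip(datao_valid, datao) if valid]


def as_bits(trace):
    return "".join("1" if level else "0" for level in trace)


ready, valid = output_handshake(n=4, ready_clock=2, total_clocks=9)
print("OUTP_READY ", as_bits(ready))
print("DATAO_VALID", as_bits(valid))
print(capture_frame(valid, list("abcdefghi")))
```

A receiver can use OUTP_READY as an advance notice to reset its sample counter and then store one word per clock while DATAO_VALID is high.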


The following figure shows a scenario where the DATAO_VALID signal is permanently active when the streaming data has no gaps between the frames.
Figure 3-7. Streaming Output Data without Gaps

4. Tool Flow
This section describes the tool flow of CoreFFT.
4.1 License
CoreFFT is license locked.
4.2 Configuring CoreFFT in SmartDesign
CoreFFT is available for download in the Libero® IP catalog through the web repository. After it is listed in the catalog, the core can be instantiated using the SmartDesign flow. To learn how to create a SmartDesign project, see the SmartDesign User Guide. After configuring and generating the core instance, the basic functionality can be simulated using the testbench supplied with CoreFFT. The testbench parameters automatically adjust to the CoreFFT configuration. CoreFFT can be instantiated as a component of a larger design.
Important: CoreFFT is compatible with both Libero integrated design environment (IDE) and Libero SoC. Unless specified otherwise, this document uses the name Libero to identify both Libero IDE and Libero SoC.
Figure 4-1. SmartDesign CoreFFT Instance View
The core can be configured using the configuration Graphical User Interface (GUI) within SmartDesign. An example of the GUI for the SmartFusion2 family is shown in the following figure.


Figure 4-2. Configuring CoreFFT in SmartDesign


4.3 Simulation Flows
The user testbench for CoreFFT is included in the release. To run it, perform the following steps:
1. Set the Design Root to the CoreFFT instantiation in the Libero SoC design hierarchy pane.
2. Under Verify Pre-Synthesized Design, in the Libero SoC Design Flow window, right-click Simulate, and then select Open Interactively. This invokes ModelSim and automatically runs the simulation.
Important: When simulating the VHDL version of the core, you might want to suppress the IEEE.NUMERIC_STD library warnings. To do so, add the following two lines to the automatically generated run.do file:
· set NumericStdNoWarnings -1
· set StdArithNoWarnings -1

4.3.1 Testbench
The unified testbench used to verify and test CoreFFT is called the user testbench.

4.3.1.1 User Testbench
The following figure shows the block diagram for the testbench. The golden behavioral FFT implements the finite-precision calculations shown in Equation 1 or Equation 2 in Introduction, for example the forward transform:

$$x(k) = \sum_{n=0}^{N-1} X(n)\, e^{-j 2\pi nk/N}$$

Both the golden FFT and CoreFFT are configured identically and receive the same test signal. The testbench compares the output signals of the golden module and the actual CoreFFT.
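For orientation, the forward transform sum x(k) = Σ X(n) e^(-j2πnk/N), taken over n = 0 … N-1, can be evaluated directly in floating point. The sketch below is not the golden behavioral model shipped with the testbench, which uses finite-precision arithmetic; it is a plain double-precision reference with an illustrative function name, useful only for sanity-checking small frames.

```python
import cmath


def dft(x_in):
    """Direct O(N^2) evaluation of x(k) = sum_n X(n) * exp(-j*2*pi*n*k/N)."""
    n_points = len(x_in)
    return [
        sum(x_in[n] * cmath.exp(-2j * cmath.pi * n * k / n_points)
            for n in range(n_points))
        for k in range(n_points)
    ]


# A unit impulse transforms to a flat spectrum; a constant transforms to a
# single non-zero bin at k = 0.
print(dft([1, 0, 0, 0]))
print(dft([1, 1, 1, 1]))
```

Because the core works with scaled fixed-point data, any comparison against such a floating-point reference needs a tolerance and must account for the selected scaling mode.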


Figure 4-3. CoreFFT User Testbench


The testbench provides examples of how to use the generated FFT module. The testbench can be modified according to the requirements.
4.4 Design Constraints
The core requires timing exceptions (that is, false path and multicycle path constraints) across the clock boundaries. For the constraints that must be added, see CoreFFT.sdc at the following path:

/component/Actel/DirectCores/CoreFFT//constraints/CoreFFT.sdc

4.5 Synthesis in Libero SoC
To run the synthesis of the selected configuration, perform the following steps:
1. Set the design root appropriately in the configuration GUI.
2. Under Implement Design, in the Design Flow tab, right-click Synthesize and select Run.
4.6 Place-and-Route in Libero SoC
After setting the design root appropriately and running synthesis, under Implement Design in the Design Flow tab, right-click Place and Route and click Run.

5. System Integration
This section provides an example that shows the integration of CoreFFT.
5.1 In-Place FFT
The following figure shows an example of using the core. When the in-place FFT asserts the BUF_READY signal, a data source starts supplying the data samples to be transformed. Imaginary and real halves of the input data sample must be supplied simultaneously and accompanied with the validity bit DATAI_VALID. The data source can supply the sample at every clock cycle or at an arbitrary slower rate (see Figure 3-1). After the FFT module receives N input samples, it lowers the BUF_READY signal.
Figure 5-1. Example of the In-Place FFT System

The FFT engine starts processing the data automatically after it is ready. In the minimal memory configuration, the processing phase starts immediately after data loading is complete. In the buffered configuration, the FFT engine can wait until a previous data burst is processed. Then the engine starts automatically.
5.2 Streaming FFT
The core performs forward FFT over the data coming at every clock cycle. The data source keeps supplying the data while the data receiver continuously receives the FFT-ed results and monitors the overflow flag if necessary. The optional input START signal and the output RFS signal can be used if processing of the data frames is required. The data source generates the START signal to mark the beginning of another frame, and the data receiver uses the RFS signal to mark the beginning of the output frame. Streaming CoreFFT can process infinite complex data streams, as shown in the following figure.


Figure 5-2. Example of a Streaming FFT System
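Because the streaming core accepts one complex sample at every clock cycle, the time needed to feed one frame follows directly from the transform size and the clock rate. The sketch below only illustrates that arithmetic; it does not model the processing latency of the core, and the function names are illustrative.

```python
def frame_input_time_us(fft_size: int, clock_mhz: float) -> float:
    """Time in microseconds to stream FFT_SIZE samples at one sample per clock."""
    return fft_size / clock_mhz


def sample_rate_msps(clock_mhz: float) -> float:
    """Sustained complex sample rate in Msamples/s at one sample per clock."""
    return clock_mhz


# Example figures: a 256-point frame with a 229 MHz clock.
print(frame_input_time_us(256, 229.0))
print(sample_rate_msps(229.0))
```

The same relation can be used to check whether a given clock rate sustains the sample rate of the data source before choosing a transform size.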


6. Appendix A: In-Place FFT Device Utilization and Performance
Table 6-1 through Table 6-4 show utilization and performance for a variety of in-place FFT sizes and data widths. The numbers were obtained from the configuration listed in Table 6-5.
Table 6-1. In-Place FFT SmartFusion2 M2S050 Device Utilization and Performance (Minimal Memory Configuration)

Column groups: core parameters (Points, Width); fabric resource usage (DFF, 4LUT, Total); blocks (LSRAM, MACC); performance (Clock Rate, FFT Time).

Points  Width  DFF   4LUT  Total  LSRAM  MACC  Clock Rate (MHz)  FFT Time (µs)
256     18     1227  1245  2472   3      4     328               3.3
512     18     1262  1521  2783   3      4     321               7.4
1024    18     1299  2029  3328   3      4     310               16.8
4096    18     1685  4190  5875   12     4     288               85.7

Table 6-2. In-Place FFT SmartFusion2 M2S050 Device Utilization and Performance (Buffered Configuration)

Column groups: core parameters (POINTS, WIDTH); fabric resource usage (DFF, 4LUT, Total); blocks (LSRAM, MACC); performance (Clock Rate, FFT Time).

POINTS  WIDTH  DFF   4LUT  Total  LSRAM  MACC  Clock Rate (MHz)  FFT Time (µs)
256     18     1487  1558  3045   7      4     328               3.3
512     18     1527  1820  3347   7      4     321               7.4
1024    18     1579  2346  3925   7      4     310               16.8
4096    18     2418  4955  7372   28     4     281               87.8

Tip:
· Data in Table 6-1 and Table 6-2 were obtained using typical synthesis settings. The Synplify frequency (MHz) was set to 500
· The utilization numbers are obtained using Libero v12.4 and there can be potential area and performance improvement with newer revisions
· In synthesis settings, ROM components are mapped to logic and RAM optimization mapped for High Speed
· Layout settings were as follows:
  - Designer block creation enabled
  - High Effort Layout enabled
· The FFT time shown reflects the transformation time only. It does not account for data downloading or result uploading times

Table 6-3. In-Place FFT PolarFire MPF300 Devices Utilization and Performance (Minimal Memory Configuration)

Column groups: core parameters (POINTS, WIDTH, uRAM Depth); fabric resource usage (4LUT, DFF, uRAM, LSRAM, MACC); Max Clock Frequency; Transform Time.

POINTS  WIDTH  uRAM Depth  4LUT   DFF   uRAM  LSRAM  MACC  Max Clock Frequency (MHz)  Transform Time (µs)
64      18     512         939    1189  9     0      4     415                        0.6
128     18     512         1087   1254  9     0      4     415                        1.2
256     18     512         1501   1470  18    0      4     415                        2.6
512     18     0           1519   1275  0     3      4     386                        6.2
512     25     0           2494   2841  0     6      16    364                        6.7
1024    25     0           3088   2859  0     6      16    369                        14.3
4096    18     0           4161   1679  0     12     4     352                        70.1
4096    25     0           6426   3237  0     15     16    339                        73
16384   18     0           9667   3234  0     54     4     296                        387
16384   25     0           17285  5483  0     75     16    325                        353.5

Table 6-4. In-Place FFT PolarFire MPF300 Device Utilization and Performance (Buffered Configuration)

Column groups: core parameters (POINTS, WIDTH, uRAM Depth); fabric resource usage (4LUT, DFF, uRAM, LSRAM, MACC); Max Clock Frequency; Transform Time.

POINTS  WIDTH  uRAM Depth  4LUT   DFF   uRAM  LSRAM  MACC  Max Clock Frequency (MHz)  Transform Time (µs)
64      18     512         1294   1543  21    0      4     351                        0.7
256     18     512         2099   2050  42    0      4     351                        3.1
512     18     512         2858   2858  84    0      4     351                        6.8
1024    18     512         4962   4488  168   0      4     278                        18.7
16384   18     0           12346  6219  0     126    4     335                        342

Tip: · Data in Table 6-3 and Table 6-4 were obtained using typical Libero SoC tool settings. The Timing constraint was set to 400 MHz
· The utilization numbers are obtained using Libero v12.4 and there can be potential area and performance improvement with newer revisions
· In synthesis settings, ROM components are mapped to logic and RAM optimization mapped for High Speed
· Place and Route was set for Timing-driven High Effort Layout
· The FFT time reflects the transformation time only. It does not account for data downloading or result uploading times

Important:FPGA resources and performance data for the PolarFire SoC family is similar to the PolarFire family.

Table 6-5. In-Place FFT Utilization and Performance Configuration

Parameter     Value
INVERSE       0
SCALE         0
SCALE_EXP_ON  0
HDL type      Verilog


7. Appendix B: Streaming FFT Device Utilization and Performance
The following tables list the utilization and performance for a variety of streaming FFT configurations.
Table 7-1. Streaming FFT SmartFusion2 M2S050T Speed Grade -1

Column groups: core parameters (FFT_SIZE, DATA_BITS, TWID_BITS, Order); resource usage (DFF, 4LUT, Total); blocks (LSRAM, uRAM, MACC); Clock Rate.

FFT_SIZE  DATA_BITS  TWID_BITS  Order    DFF    4LUT   Total  LSRAM  uRAM  MACC  Clock Rate (MHz)
16        18         18         Reverse  2198   1886   4084   0      11    8     241
16        18         18         Normal   1963   1600   3563   0      5     8     241
32        18         18         Reverse  3268   2739   6007   0      16    16    225
64        18         18         Reverse  3867   3355   7222   0      19    16    217
128       18         18         Reverse  4892   4355   9247   5      16    24    216
256       18         18         Reverse  5510   5302   10812  7      16    24    229
256       18         18         Normal   5330   5067   10406  3      16    24    229
256       24         25         Reverse  8642   7558   16200  8      21    48    223
512       18         18         Reverse  6634   6861   13495  10     16    32    228
512       18         24         Reverse  9302   8862   18164  12     18    64    228
1024      24         24         Reverse  10847  11748  22595  17     18    64    225
1024      24         25         Reverse  11643  12425  24068  19     22    64    221

Tip: · uRAM maximum depth was set at 64
· The utilization numbers are obtained using Libero v12.4, and there can be potential area and performance improvement with newer revisions
· In synthesis settings, ROM components are mapped to logic and RAM optimization mapped for High Speed. The Synplify frequency was set to 500
· Layout high effort mode was set

Table 7-2. Streaming FFT PolarFire MPF300 Speed Grade -1

Column groups: core parameters (FFT_SIZE, DATA_BITS, TWID_BITS, SCALE, uRAM Depth, Order); resource usage (4LUT, DFF, uRAM, LSRAM, MACC); Clock Rate.

FFT_SIZE  DATA_BITS  TWID_BITS  SCALE  uRAM Depth  Order    4LUT   DFF    uRAM  LSRAM  MACC  Clock Rate (MHz)
16        16         18         On     256         Reverse  1306   1593   6     0      4     319
16        16         18         On     256         Normal   1421   1700   12    0      4     319
32        16         18         On     256         Reverse  1967   2268   18    0      8     319
64        16         18         On     256         Reverse  2459   2692   15    0      8     319
128       20         18         On     256         Normal   4633   4911   44    0      24    310
256       22         18         Off    256         Normal   6596   6922   94    0      24    307
256       24         25         On     0           Reverse  8124   8064   0     14     48    304
512       18         18         On     0           Reverse  6686   5691   0     9      32    293
1024      24         25         On     0           Reverse  13974  10569  0     21     64    304
1024      18         18         On     0           Normal   14289  10816  0     27     64    307
2048      18         18         On     0           Normal   12852  7640   0     24     40    304
2048      18         18         On     0           Reverse  12469  7319   0     16     40    315
4096      24         25         On     0           Normal   29977  14288  0     59     80    305
4096      28         28         On     512         Normal   34448  17097  120   48     80    301

Tip: · Data in the preceding table were obtained using the typical Libero SoC tool settings. The Timing constraint was set to 400 MHz
· Device utilization numbers of the streaming architecture are nearly same for both AXI4S interface and native interface
· The utilization numbers are obtained using Libero v12.4, and there can be potential area and performance improvement with newer revisions
· In synthesis settings, ROM components are mapped to logic and RAM optimization mapped for High Speed
· Place and Route was set for the Timing-driven High Effort Layout
· FPGA resources and performance data for the PolarFire SoC family is similar to the PolarFire family


8. Revision History
The revision history describes the changes that were implemented in the document. The changes are listed by revision, starting with the most current publication.
Table 8-1. Revision History

Revision C, 08/2022
In revision C of the document, updated Table 6-1, Table 6-2, Table 6-3, Table 6-4, Table 7-1, and Table 7-2.

Revision B, 07/2022
The following is the list of changes in revision B of the document:
· Updated: Table 2-2 in 2.1.2. Ports.
· Updated: Table 2-4 in 2.2.2. Ports.
· Updated: 4.4. Design Constraints.
· Removed: “Configuring Timing Constraints” section.

Revision A, 07/2022
The following is the list of changes in revision A of the document:
· The document was migrated to the Microchip template.
· The document number was updated to DS50003348A from 50200267.
· Following sections are updated:
  - Table 1 in Features.
  - Device Utilization and Performance.
  - Table 1-2 in 1.4.4.2. Unscaled Mode Input Bit Width Limitations.
  - Figure 1-7 in 1.4.4.3. Entering Scale Schedule.
  - Table 1-3 in 1.4.4.3. Entering Scale Schedule.
  - Table 2-3 in 2.2.1. Configuration Parameters.
  - Table 2-4 in 2.2.2. Ports.
  - Table 2-2 in 2.1.2. Ports.
  - Figure 4-2 in 4.2. Configuring CoreFFT in SmartDesign.
· Following sections are added:
  - 1.4.3. Streaming FFT Output Data Words Order.
  - 2.2.3. Input/Output Data frame format for AXI4 Streaming Interface.
  - 4.3. Simulation Flows.
  - 4.4. Design Constraints.
  - 4.5. Synthesis in Libero SoC.
  - 4.6. Place-and-Route in Libero SoC.
· Following sections are removed:
  - “Supported Version.”
  - “Natural Output Order.”

Revision 10
Added PolarFire® SoC support.

Revision 9
“Product Support”: Removed.

Revision 8
Updated changes related to CoreFFT v7.0.

Revision 7
Updated changes related to CoreFFT v6.4.

Revision 6
Updated changes related to CoreFFT v6.3.

Revision 5
Updated changes related to Supported Families (SAR 47942).

Revision 4
Updated changes related to CoreFFT v6.1.

Revision 3
The following is the list of changes in revision 3.0 of the document:
· Updated changes related to CoreFFT v6.0.
· The release adds support for SmartFusion2 family (In-Place architecture only).

Revision 2
The following is the list of changes in revision 2.0 of the document:
· Updated changes related to CoreFFT v5.0.
· This release adds a new architecture to the existing In-place CoreFFT v4.0.
· The new architecture supports Streaming Forward and Inverse FFT that transforms high speed stream of data.

Revision 1
Initial release.

Microchip FPGA Support
Microchip FPGA products group backs its products with various support services, including Customer Service, Customer Technical Support Center, a website, and worldwide sales offices. Customers are suggested to visit Microchip online resources prior to contacting support as it is very likely that their queries have been already answered. Contact Technical Support Center through the website at www.microchip.com/support. Mention the FPGA Device Part number, select appropriate case category, and upload design files while creating a technical support case. Contact Customer Service for non- technical product support, such as product pricing, product upgrades, update information, order status, and authorization.
· From North America, call 800.262.1060
· From the rest of the world, call 650.318.4460
· Fax, from anywhere in the world, 650.318.8044
Microchip Information
The Microchip Website
Microchip provides online support via our website at www.microchip.com/. This website is used to make files and information easily available to customers. Some of the content available includes:
· Product Support: Data sheets and errata, application notes and sample programs, design resources, user’s guides and hardware support documents, latest software releases and archived software
· General Technical Support: Frequently Asked Questions (FAQs), technical support requests, online discussion groups, Microchip design partner program member listing
· Business of Microchip: Product selector and ordering guides, latest Microchip press releases, listing of seminars and events, listings of Microchip sales offices, distributors and factory representatives
Product Change Notification Service
Microchip’s product change notification service helps keep customers current on Microchip products. Subscribers will receive email notification whenever there are changes, updates, revisions or errata related to a specified product family or development tool of interest. To register, go to www.microchip.com/pcn and follow the registration instructions.
Customer Support
Users of Microchip products can receive assistance through several channels:
· Distributor or Representative
· Local Sales Office
· Embedded Solutions Engineer (ESE)
· Technical Support
Customers should contact their distributor, representative or ESE for support. Local sales offices are also available to help customers. A listing of sales offices and locations is included in this document. Technical support is available through the website at: www.microchip.com/support
Microchip Devices Code Protection Feature
Note the following details of the code protection feature on Microchip products:

· Microchip products meet the specifications contained in their particular Microchip Data Sheet.
· Microchip believes that its family of products is secure when used in the intended manner, within operating specifications, and under normal conditions.
· Microchip values and aggressively protects its intellectual property rights. Attempts to breach the code protection features of Microchip product is strictly prohibited and may violate the Digital Millennium Copyright Act.
· Neither Microchip nor any other semiconductor manufacturer can guarantee the security of its code. Code protection does not mean that we are guaranteeing the product is “unbreakable”. Code protection is constantly evolving. Microchip is committed to continuously improving the code protection features of our products.
Legal Notice
This publication and the information herein may be used only with Microchip products, including to design, test, and integrate Microchip products with your application. Use of this information in any other manner violates these terms. Information regarding device applications is provided only for your convenience and may be superseded by updates. It is your responsibility to ensure that your application meets with your specifications. Contact your local Microchip sales office for additional support or, obtain additional support at www.microchip.com/en-us/support/design-help/client-support-services.
THIS INFORMATION IS PROVIDED BY MICROCHIP “AS IS”. MICROCHIP MAKES NO REPRESENTATIONS OR WARRANTIES OF ANY KIND WHETHER EXPRESS OR IMPLIED, WRITTEN OR ORAL, STATUTORY OR OTHERWISE, RELATED TO THE INFORMATION INCLUDING BUT NOT LIMITED TO ANY IMPLIED WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY, AND FITNESS FOR A PARTICULAR PURPOSE, OR WARRANTIES RELATED TO ITS CONDITION, QUALITY, OR PERFORMANCE.
IN NO EVENT WILL MICROCHIP BE LIABLE FOR ANY INDIRECT, SPECIAL, PUNITIVE, INCIDENTAL, OR CONSEQUENTIAL LOSS, DAMAGE, COST, OR EXPENSE OF ANY KIND WHATSOEVER RELATED TO THE INFORMATION OR ITS USE, HOWEVER CAUSED, EVEN IF MICROCHIP HAS BEEN ADVISED OF THE POSSIBILITY OR THE DAMAGES ARE FORESEEABLE. TO THE FULLEST EXTENT ALLOWED BY LAW, MICROCHIP’S TOTAL LIABILITY ON ALL CLAIMS IN ANY WAY RELATED TO THE INFORMATION OR ITS USE WILL NOT EXCEED THE AMOUNT OF FEES, IF ANY, THAT YOU HAVE PAID DIRECTLY TO MICROCHIP FOR THE INFORMATION.
Use of Microchip devices in life support and/or safety applications is entirely at the buyer’s risk, and the buyer agrees to defend, indemnify and hold harmless Microchip from any and all damages, claims, suits, or expenses resulting from such use. No licenses are conveyed, implicitly or otherwise, under any Microchip intellectual property rights unless otherwise stated.
Trademarks
The Microchip name and logo, the Microchip logo, Adaptec, AVR, AVR logo, AVR Freaks, BesTime, BitCloud, CryptoMemory, CryptoRF, dsPIC, flexPWR, HELDO, IGLOO, JukeBlox, KeeLoq, Kleer, LANCheck, LinkMD, maXStylus, maXTouch, MediaLB, megaAVR, Microsemi, Microsemi logo, MOST, MOST logo, MPLAB, OptoLyzer, PIC, picoPower, PICSTART, PIC32 logo, PolarFire, Prochip Designer, QTouch, SAM-BA, SenGenuity, SpyNIC, SST, SST Logo, SuperFlash, Symmetricom, SyncServer, Tachyon, TimeSource, tinyAVR, UNI/O, Vectron, and XMEGA are registered trademarks of Microchip Technology Incorporated in the U.S.A. and other countries.
AgileSwitch, APT, ClockWorks, The Embedded Control Solutions Company, EtherSynch, Flashtec, Hyper Speed Control, HyperLight Load, Libero, motorBench, mTouch, Powermite 3, Precision Edge, ProASIC, ProASIC Plus, ProASIC Plus logo, Quiet-Wire, SmartFusion, SyncWorld, Temux, TimeCesium, TimeHub, TimePictra, TimeProvider, TrueTime, and ZL are registered trademarks of Microchip Technology Incorporated in the U.S.A.
Adjacent Key Suppression, AKS, Analog-for-the-Digital Age, Any Capacitor, AnyIn, AnyOut, Augmented Switching, BlueSky, BodyCom, Clockstudio, CodeGuard, CryptoAuthentication, CryptoAutomotive, CryptoCompanion, CryptoController, dsPICDEM, dsPICDEM.net, Dynamic Average Matching, DAM, ECAN, Espresso T1S, EtherGREEN, GridTime, IdealBridge, In-Circuit Serial Programming, ICSP, INICnet, Intelligent Paralleling, IntelliMOS, Inter-Chip Connectivity, JitterBlocker, Knob-on-Display, KoD, maxCrypto, maxView, memBrain, Mindi, MiWi, MPASM, MPF, MPLAB Certified logo, MPLIB, MPLINK, MultiTRAK, NetDetach, Omniscient Code Generation, PICDEM, PICDEM.net, PICkit, PICtail, PowerSmart, PureSilicon, QMatrix, REAL ICE, Ripple Blocker, RTAX, RTG4, SAM-ICE, Serial Quad I/O, simpleMAP, SimpliPHY, SmartBuffer, SmartHLS, SMART-I.S., storClad, SQI, SuperSwitcher, SuperSwitcher II, Switchtec, SynchroPHY, Total Endurance, Trusted Time, TSHARC, USBCheck, VariSense, VectorBlox, VeriPHY, ViewSpan, WiperLock, XpressConnect, and ZENA are trademarks of Microchip Technology Incorporated in the U.S.A. and other countries.
SQTP is a service mark of Microchip Technology Incorporated in the U.S.A.
The Adaptec logo, Frequency on Demand, Silicon Storage Technology, and Symmcom are registered trademarks of Microchip Technology Inc. in other countries.
GestIC is a registered trademark of Microchip Technology Germany II GmbH & Co. KG, a subsidiary of Microchip Technology Inc., in other countries.
All other trademarks mentioned herein are property of their respective companies.
© 2022, Microchip Technology Incorporated and its subsidiaries. All Rights Reserved.
ISBN: 978-1-6683-1058-8
Quality Management System
For information regarding Microchip’s Quality Management Systems, please visit www.microchip.com/quality.


Worldwide Sales and Service

AMERICAS
· Corporate Office: 2355 West Chandler Blvd., Chandler, AZ 85224-6199; Tel: 480-792-7200; Fax: 480-792-7277; Technical Support: www.microchip.com/support; Web Address: www.microchip.com
· Atlanta: Duluth, GA; Tel: 678-957-9614; Fax: 678-957-1455
· Austin, TX; Tel: 512-257-3370
· Boston: Westborough, MA; Tel: 774-760-0087; Fax: 774-760-0088
· Chicago: Itasca, IL; Tel: 630-285-0071; Fax: 630-285-0075
· Dallas: Addison, TX; Tel: 972-818-7423; Fax: 972-818-2924
· Detroit: Novi, MI; Tel: 248-848-4000
· Houston, TX; Tel: 281-894-5983
· Indianapolis: Noblesville, IN; Tel: 317-773-8323; Fax: 317-773-5453; Tel: 317-536-2380
· Los Angeles: Mission Viejo, CA; Tel: 949-462-9523; Fax: 949-462-9608; Tel: 951-273-7800
· Raleigh, NC; Tel: 919-844-7510
· New York, NY; Tel: 631-435-6000
· San Jose, CA; Tel: 408-735-9110; Tel: 408-436-4270
· Canada – Toronto; Tel: 905-695-1980; Fax: 905-695-2078

ASIA/PACIFIC
· Australia – Sydney; Tel: 61-2-9868-6733
· China – Beijing; Tel: 86-10-8569-7000
· China – Chengdu; Tel: 86-28-8665-5511
· China – Chongqing; Tel: 86-23-8980-9588
· China – Dongguan; Tel: 86-769-8702-9880
· China – Guangzhou; Tel: 86-20-8755-8029
· China – Hangzhou; Tel: 86-571-8792-8115
· China – Hong Kong SAR; Tel: 852-2943-5100
· China – Nanjing; Tel: 86-25-8473-2460
· China – Qingdao; Tel: 86-532-8502-7355
· China – Shanghai; Tel: 86-21-3326-8000
· China – Shenyang; Tel: 86-24-2334-2829
· China – Shenzhen; Tel: 86-755-8864-2200
· China – Suzhou; Tel: 86-186-6233-1526
· China – Wuhan; Tel: 86-27-5980-5300
· China – Xian; Tel: 86-29-8833-7252
· China – Xiamen; Tel: 86-592-2388138
· China – Zhuhai; Tel: 86-756-3210040
· India – Bangalore; Tel: 91-80-3090-4444
· India – New Delhi; Tel: 91-11-4160-8631
· India – Pune; Tel: 91-20-4121-0141
· Japan – Osaka; Tel: 81-6-6152-7160
· Japan – Tokyo; Tel: 81-3-6880-3770
· Korea – Daegu; Tel: 82-53-744-4301
· Korea – Seoul; Tel: 82-2-554-7200
· Malaysia – Kuala Lumpur; Tel: 60-3-7651-7906
· Malaysia – Penang; Tel: 60-4-227-8870
· Philippines – Manila; Tel: 63-2-634-9065
· Singapore; Tel: 65-6334-8870
· Taiwan – Hsin Chu; Tel: 886-3-577-8366
· Taiwan – Kaohsiung; Tel: 886-7-213-7830
· Taiwan – Taipei; Tel: 886-2-2508-8600
· Thailand – Bangkok; Tel: 66-2-694-1351
· Vietnam – Ho Chi Minh; Tel: 84-28-5448-2100

EUROPE
· Austria – Wels; Tel: 43-7242-2244-39; Fax: 43-7242-2244-393
· Denmark – Copenhagen; Tel: 45-4485-5910; Fax: 45-4485-2829
· Finland – Espoo; Tel: 358-9-4520-820
· France – Paris; Tel: 33-1-69-53-63-20; Fax: 33-1-69-30-90-79
· Germany – Garching; Tel: 49-8931-9700
· Germany – Haan; Tel: 49-2129-3766400
· Germany – Heilbronn; Tel: 49-7131-72400
· Germany – Karlsruhe; Tel: 49-721-625370
· Germany – Munich; Tel: 49-89-627-144-0; Fax: 49-89-627-144-44
· Germany – Rosenheim; Tel: 49-8031-354-560
· Israel – Ra’anana; Tel: 972-9-744-7705
· Italy – Milan; Tel: 39-0331-742611; Fax: 39-0331-466781
· Italy – Padova; Tel: 39-049-7625286
· Netherlands – Drunen; Tel: 31-416-690399; Fax: 31-416-690340
· Norway – Trondheim; Tel: 47-72884388
· Poland – Warsaw; Tel: 48-22-3325737
· Romania – Bucharest; Tel: 40-21-407-87-50
· Spain – Madrid; Tel: 34-91-708-08-90; Fax: 34-91-708-08-91
· Sweden – Gothenberg; Tel: 46-31-704-60-40
· Sweden – Stockholm; Tel: 46-8-5090-4654
· UK – Wokingham; Tel: 44-118-921-5800; Fax: 44-118-921-5820

