Mouser goes global with GOWIN FPGAs User Guide
- June 4, 2024
- GOWIN
Table of Contents
- Mouser goes global with GOWIN FPGAs
- About This Guide
- Finite State Machine Requirements
- Gowin FPGA Hardware Features
- Lower Power Coding
- Coding to Avoid Simulation/Synthesis Mismatches
- Design Planning
- Gowin Software Design Planning Flow
- Pinout
- Clock Assignment
- Logic Resource Constraint
- Timing Closure
- Timing Closure Strategies in Synthesis
- Place & Route Timing Report Analysis
- References
- Read User Manual Online (PDF format)
- Download This Manual (PDF format)
Mouser goes global with GOWIN FPGAs
About This Guide
Purpose
This manual provides the descriptions of coding guidelines, design planning,
and timing closure, and these factors directly determine the success of FPGA
design. Coding style directly affects the implementation of the FPGA design
and ultimately the performance of the design. Although the synthesis tool
integrates a series of optimization algorithms, it is still necessary for you
to follow a certain coding style to guide the synthesis tool to achieve
optimal results on a specific FPGA architecture.
Design planning helps you target the design to the chosen FPGA device and
balance the associated area and speed requirements in a manner that takes full
advantage of the functions and features supported by Gowin devices. Timing
closure can ensure that a user design meets a specific timing requirement.
This chapter describes timing requirements, timing constraints, and timing
optimization.
Related Documents
The latest user guides are available on the GOWINSEMI Website. You can find the related documents at www.gowinsemi.com:
- SUG100, Gowin Software User Guide
- SUG940, Gowin Design Timing Constraints Guide
- SUG935, Gowin Design Physical Constraints Guide
Terminology and Abbreviations
The abbreviations and terminology used in this manual are outlined in the table below.
Table 1-1 Terminology and Abbreviations
Terminology and Abbreviations | Meaning |
---|---|
FPGA | Field Programmable Gate Array |
HDL | Hardware Description Language |
Terminology and Abbreviations | Meaning |
--- | --- |
FSM | Finite State Machine |
DSP | Digital Signal Processing |
BSRAM | Block Static Random Access Memory |
SSRAM | Shadow Static Random Access Memory |
RTL | Register Transfer Level |
CST | Physical constraint |
SDC | Synopsys Design Constraint |
CFU | Configurable Function Unit |
CLS | Configurable Logic Section |
Support and Feedback
Gowin Semiconductor provides customers with comprehensive technical support. If you have any questions, comments, or suggestions, please feel free to contact us directly by the following ways.
- Website: www.gowinsemi.com
- E-mail: support@gowinsemi.com
Guideline on HDL Coding
**General Requirements of HDL Coding
**
Coding for Hierarchical Synthesis
Complex system designs require a hierarchical approach as opposed to the
use of single modules. A hierarchical coded design can be synthesized as a
whole or have each module synthesized separately. When the design is
synthesized as a whole, it can be synthesized as either a flat module or
multiple hierarchical modules. Each methodology has its associated advantages
and disadvantages. Hierarchical coding is preferable for complex system
designs because designs in smaller blocks are easier to keep track of and
reduce the design period by allowing the reuse of design modules. Here are
some tips for building hierarchical structures:
- The top level should only be used to call the submodule. It is better to implement control logic in submodules.
- Any I/O instantiations should be at the top level of the hierarchy.
- All input, output, and bidirectional pins instantiations should be at the top level of the hierarchy;
- The tristate statement for bidirectional pins should be written at the top module.
- Ensure that the output signals of all modules use registers. The advantages are as follows:
- Combinational logic and registers are in one module, which helps to overcome the lack of no cross-module synthesis.
- The related logics placed in one module can realize resource sharing and key paths optimization.
- Divide irrelevant logics into different modules, then you can use different optimization strategies, such as speed first or area first.
Pipeline Design Requirements
Pipeline design can improve design performance by restructuring a long data
path into several levels of logic and breaking it up over multiple clock
cycles. Pipeline structure is an effective way to improve data path speed;
however, special care must be taken due to the additional clock cycle latency
of data path.
Comparing If-Then-Else and Case Statements
The if-then-else statement generates priority-encoded logic, whereas the
case statement implements balanced logic, and it is recommended that if
statement be nested less than 5 levels. If-then-else statements can contain a
set of different expressions, but a case statement can contain only one
expression. If-then-else statements and case statements are equivalent if the
conditions are mutually exclusive.
Avoiding Unintentional Latches
FPGA users should avoid using latches. The synthesis tools can build them
out with feedback loops, which will increase the design area and result in
performance degradation and problems with static timing analysis by
introducing combinatorial feedback loops. The Synthesis tool infers latches
from incomplete conditional expressions; for example, an if-then-else
statement without an else clause or a case statement without a default clause.
To avoid unintentional latches, specify all conditions explicitly. If the
client uses the case statement and does not care about the output value of the
default condition, / synthesis full_case/ constraint can be added to avoid
the generation of the latch.
Global Reset and Local Reset
A global set/reset (GSR) network is built into Gowin devices. There is a
direct connection to the core logic. It can be used as an
asynchronous/synchronous set or asynchronous/synchronous reset. The registers
in CFU and I/O can be individually configured to use the GSR. The global reset
resource provides a convenient mechanism by which design components can be
reset without using any general routing resources. Local reset usually has
smaller fan-out. It is recommended to use common routing as a reset signal.
Clock Enable
Gating clocks is not encouraged in FPGA designs because it can cause timing issues, such as unexpected clock skews. The CLS structure in Gowin devices contains dedicated clock enable signals. You can use clock enable as a better alternative to achieve the same functions without worrying about timing issues. You should take the following into considerations when using the clock enable function of Gowin devices.
- Clock enable is only supported by flip-flops, not latches.
- ach CLS in the same CFU shares one clock enable signal.
- All flip-flops have a positive clock enable input.
Multiplexer
The flexible configurations of LUTs within CLS can realize 2-to-1,
3-to-1, 4-to-1, or 5-to-1 multiplexers, etc. You can create larger
multiplexers by programming multiple four-input LUTs.
Bidirectional Buffers
Using bidirectional buffers allows for fewer device pins, controlling output enable, and reducing power consumption. You can disable automatic I/O insertion in your synthesis tool and then manually instantiate the I/O pads for specific pins as needed.
Cross Clock Domains
When passing data from one clock domain to another, special care must be taken to ensure that metastability issues do not arise as a result of set-up and hold timing violations. For single-bit signals, it is suggested to use a two /three-stage register structure to eliminate metastability. For multi-bit signals, it is suggested to use asynchronous FIFOs.
Memory Coding
Although RAM behavior description is portable and the coding is straightforward, different coding may generate different synthesis results. For Gowin devices, it’s recommended to use the IP Core Generator in Gowin Software to generate block memory, shadow memory, and FIFO. Gowin devices support multiple memory structures, including dual-port RAM, single-port RAM, semi-dual RAM, read-only ROM, and synchronous/asynchronous FIFO, as shown in Figure 2-1.
Figure 2-1 IP Core Generator Memory
DSP Coding
Although DSP behavior description is portable and the coding is straightforward, different coding may generate different synthesis results. For Gowin devices, it’s recommended to use the IP Core Generator in Gowin Software to generate DSP. Gowin devices support multiple DSP structures, including ALU54, MULT, MULTALU, MULTADDALU, and PADD, as shown in Figure 2-2.
Figure 2-2 IP Core Generator DSP
Finite State Machine Requirements
A finite state machine advances from the current state to the next state at the clock edge. This section discusses the methods and strategies for state machine coding.
General Description
There are three ways to implement a finite state machine; the first is to
process state-jump and state output simultaneously in one process; the second
is to process state-jump in one process, state-jump law and state output in
another process; the third is to process state-jump, state-jump law, and state
output in three independent processes respectively. It is recommended to use
the third. It is easier to be read and modified, and it will not cause extra
delay for outputting the state directly without registering.
State Coding Methods for State Machines
There are several ways to code a state machine, including binary, one-hot
encoding, and gray code. State machines with binary code and gray code have
minimal numbers of flip-flops and wide combinatorial logics, whereas one-hot
encoding is exactly the opposite. The greatest advantage of one-hot encoding
is that only one bit is required for state comparing. As a result, it
decreases the decoding logic Although one-hot encoding uses more bits, i.e.
more flip-flops for same states, its coding circuit saves an equivalent area
to offset the area consumed by flip-flops. For small state machines, less than
five states, binary code and gray code are typically the defaults. For larger
state machines more than five states, one-hot is the default.
Initialization and Default State Values for Safe State Machines
A safe state machine must be initialized to a valid state after power-up.
You can initialize it during power-up or by including a reset operation to
bring it to a known state. In the same manner, a state machine should have a
default state to ensure that the state machine does not go into an invalid
state. This could happen if all the possible combinations have been clearly
defined in the design source code.
Gowin FPGA Hardware Features
I/O Logic
I/O logic supports Deserializer, Serializer, delay control, and byte
alignment, mainly used for high-speed data transmission. I/O logic supports
basic mode, SRD mode, and DDR mode, etc. You can use Gowin devices IO logic
according to your design requirements.
DSP
LittleBee® and Arora families DSP support 9-bit/18-bit signed/unsigned
multiplication, MAC, accumulation; and support 54-bit ALU, barrel shifter,
pipeline and bypass registers.
BSRAM
LittleBee® and Arora families BSRAM can be configured up to 18Kbits, and
both data bit width and depth can be configured. Each BSRAM has two
independent ports; and they have independent clocks, addresses, data, and
control signals, but they share the same storage memory. Each BSRAM has four
operation modes: Single Port, Dual Port, Semi Dual Port, and Read-only, and it
also supports pipeline and bypass registers.
SSRAM
LittleBee® and Arora families SSRAM can be configured as a single port random memory with depth of 16 and width of 1/2/4 bits, semi dual port random memory with a depth of 16 and width of 1/2/4 bits, and 16 bits x 1 read-only random access memory.
Lower Power Coding
Optimize an area to reduce logic usage and routing lengths, and then to reduce
power. It is recommended to use the IP Core Generator in Gowin Software to
call the basic cells in Gowin devices for the most
power-efficient (least area and resources) implementation. Eliminate known
glitches for power reasons, reduce I/O toggle rate, and enter into the sleep
state to reduce system power.
Coding to Avoid Simulation/Synthesis Mismatches
Certain coding styles can lead to synthesis result that differs from simulations. This is caused by error information that is ignored by a simulator but can be detected by a synthesis tool; as such, Gowin coding style is recommended.
Sensitivity Lists
Sensitivity lists must contain all input and output signals to avoid mismatches between simulation and synthesized logic.
Blocking/Nonblocking Assignments
Use blocking assignments to generate combinational logic; use nonblocking assignments to generate sequential logic.
Signal Fan-out
Signal fan-out control is designed to maintain reasonable post-synthesis fan- outs. The synthesis tool reduces signal fan-out by duplicating circuits. Use syn_maxfan for specific signals to acquire better timing closure. Gowin device architectures are designed to handle high fan-out clock signal with dedicated clock and handle high fan-out reset signal with dedicated global reset network. However, synthesis tools tend to replicate logic. This type of logic replication occupies more resources in the devices and makes performance checking more difficult. Use syn_maxfan flexibly based on actual conditions to avoid the generation of many logic replication.
Design Planning
FPGA design mainly includes the following two phases:
- Before designing, it is necessary to define the function and architecture; choose a suitable FPGA device and then write RTL code.
- Target your design to the chosen FPGA device and fully utilize the chosen device. Each of the two phases described above affects the other. This chapter focuses on the second phase and explains how you can fully utilize the functions and features provided by Gowin devices. chapter focuses on the second phase and explains how you can fully utilize the functions and features provided by Gowin devices.
Gowin Software Design Planning Flow
Design planning is not mandatory for all designs, but it will be beneficial to most designs, especially in the case of large designs that involve high resource utilization and/or a tight timing requirements. For these designs, design planning can help reduce potential placement and routing problems or timing issues, and it can increase the possibilities for design reuse and migration. In Gowin Software, design planning starts when the synthesis has been completed successfully and before placement and routing. CST files contain all the design planning constraints required to guide backend placement and routing. If design planning is modified, i.e. CST files are modified, the design returns to the stage after synthesis before placement and routing. For CST details, you can see SUG935, Gowin Design Physical Constraints User Guide.
Design Constraint Flow
Gowin Software allows you to set constraints for post-synthesis ports, netlist, registers, and instances. The CST file is the main input for defining design planning. Backend placement and routing software acquire users constraints by reading the CST files. If there is a syntax error or an illegal constraints error, the backend placement and routing software will exit directly. CST files can be text-edited or generated using FloorPlanner, as shown in Figure 3-1. For the details, you can see SUG935, Gowin Design Physical Constraints User Guide.
FloorPlanner Interface
Design Planning Tool
Gowin Software provides FloorPlanner for design planning, and its functions are as follows:
- View all the design elements that you can manage.
- View the hardware resources that are available for the chosen device.
- Assign specific design elements to specific FPGA resources.
- Support drag and drop to define constraints.
- Supports constraints legality check.
- Supports manual adjustment to optimize timing path.
Pinout
Pinout planning is the process of defining your FPGA I/O protocols and locations on the chosen device. It is based on your actual design and the chosen device.The pinout planning process involves the following steps:
- Assign your design ports to specific I/O locations.
- Define the I/O protocol and other I/O characteristics.
- Check your assignments for legal usage.
- Exchange your pin assignments with other parties according to PCB and circuit diagrams, if necessary.
Pinout Rules
- Distribute dedicated pins first, such as clock input, phase-locked loop input, DDR, etc.
- Distribute the common pins instead of specific pins to the specific BANK; this helps the backend routing tool to achieve an optimum result.
- Check if duplicated pins conflict with the device programming modes.
Pin Migration
For the device with the same package, you might want to migrate the design to a device with a larger capacity for further function extension or to a device with a smaller capacity to reduce cost. The pinout should stay the same or changes should be minimal to avoid PCB redesign. Gowin Software provides a pin migration feature. You can view incompatible pins in FloorPlanner. Use LOC_RESERVE to disable these pins.
Clock Assignment
Clock Assignment Rules
Clock frequency and clock fan-out are the main concerns when assigning clock resources assignment. The total number of available clock resources in Gowin FPGA devices is also a deciding factor.In general, dedicated resources give better timing results because of the minimized relative time delay. The routing resources saved also ease routing congestions in highly congested designs. In rare cases, you might use the general routing resources as a clock. Because general routing will increase the time delay, it should only be used in the designs that are characterized by low speed and low fanout. Here are the general rules for clock resource assignment:
- Determine the clock numbers and the fanout for each clock in your design.
- Determine the clock resources provided by the target device.
- Determine the speed requirement for each clock.
- Assign the high-speed and high-fanout clock as the global clock.
- If the number of global clocks is less than the clock number in your design, use quadrant clocks.
- Use high-speed clock resources for high-speed interfaces with high fanout.
Clock Resources Assignment Constraint
NET_LOC “xxx” BUFG0 = CLK | CE | SR | LOGIC Assign the clock signal to the dedicated global clock network; BUFG0~BUFG7 is the eight global clock supported by Gowin devices. CLK | CE | SR | LOGIC indicates the constraint object. CLK is clock; CE is clock enable; SR is synchronous reset, and LOGIC is a logic device, as shown in Figure 3-2.
Figure 3-2 Global Clock Constraints Interface
Logic Resource Constraint
Definition
The logic constraint is the logical partitioning of design elements, which results in physical placement or implementation changes of the design. Logic constraint is accomplished by specifying FPGA location preferences. For Gowin devices, logic constraint can improve the design performance, especially for the designs with good structures. Logic constraint provides a combination of automation and user control for design reuse and modular, hierarchical, and incremental design flows.
Constraint Syntax
INS_LOC “cnt[5]” R2C2 Specify the specific object to the specific CFU location. GROUP hh = { “cnt_Z[1]” “cnt_Z[2]” “cnt_Z[3]” “cnt_Z[4]” “cnt_Z[5]” “cnt_Z[6]” “cnt_Z[7]” }GRP_LOC hh R[3:6]C[4:6] Group specific objects and specify them to the precise regional location.
Constraint Strategies
- Define regions based on design hierarchy.
- Define regions based on critical paths.
- Define regions based on input/output signals with high fanout.
- Define logic modules and optimize modules individually using different enhancement strategies.
Special Considerations
- Block RAM can be placed alone. Do not group block RAMs.
- Larger logic groups need starting position and relative size.
- Do not group carry chains and bus.
- Do not group supplemental logic.
Timing Closure
Every design has to run at a certain speed based on the design requirement. There are generally three types of speed requirement in an FPGA design: timing, throughput, and latency. Throughput and latency are mutually exclusive. High throughput usually means more pipelining, which increases latency, while low latency usually requires longer combinatorial paths, which removes pipelines and can reduce the throughput. Therefore, there is a requirement to balance throughput and latency as a priority. Timing closure is usually the key factor that impacts the ability of a design to run at a required speed. It can be very challenging to close timing using various techniques. This chapter focuses on timing closure, and explains how to close timing in your design.
Timing Closure Strategies in Synthesis
When using GowinSynthesis for synthesis, you need to follow some general rules to get better timing closure.
- Obey Gowin RTL coding rules.
- Use proper constraints to guide synthesis.
- Use I/O pin registers to improve pin timing. Input pin register can be used to improve input setup time; output pin register can be used to improve clock to output time.
- Use I/O delay unit to improve input hold time. Input pin register use may result in hold time issues because of short data channel delay. IODELAY can therefore be added to the data path to compensate for the input hold time; for the details of primitives, you can see section 4.4 in UG289, Gowin Programmable IO User Guide.
- Adjust the relationship between IO input register data and clock using HCLK to meet both the setup time and hold time.
- Use dedicated GSR resources, i.e. instantiate Gowin primitive GSR in RTL design. If your design contains high fanout reset and set signals, it is advisable that you use GSR resources to reduce routing congestion and enhance routing efficiency.
- Reduce fanout to improve the main frequency. It is suggested to use syn_maxfan selectively to reduce fanouts at the expense of register duplication.
- Use one-hot state encoding. For high-speed design, it’s advisable to use one-hot state eccoding. Also it will increase resource utilization rate and power consumption.
- Resource reuse will increase logic levels and generate a huge delay path. The Synthesis tool usually reuses the non-key paths, but you also need to check the key paths to ensure that no timing issues are caused.
- Check the synthesis report; read the post-synthesis timing information simply; analyze high fan-out path, key path, and large delay path.
- Read the synthesis timing report carefully and thoroughly. As the synthesis timing report does not contain placement and routing information, the results outlined in the report are typically better than the actual results. The actual result is usually 1/3 to 1/2 less than the report result.
Timing Closure Strategies in Place & Route
When using Gowin Software, you need to follow some general rules to get better timing closure.
- Add appropriate physical and timing constraints to guide place & route.
- Pay attention to the warnings and errors displayed in the tool.
- Select appropriate place & route options.
- Avoid insufficient constraints, such as no clock constraints, asynchronous clock analysis, etc. Reasonable timing constraints can be added.
- Avoid excessive constraints, such as higher clock frequency than the actual required constraints, multi-cycle paths for IO, etc. multicycle or maxdelay can be used to slacken the timing.
- Use GCLK for clock enable whenever possible. Clock enable is usually a group of high fan-out signals for driving a large number of registers. If regular routing resources are used, which may results in a huge delay. It’s advisable that you use GCLK resources.
Timing Constraint
The Timing Constraints Editor in Gowin Software supports common timing constraints. You can see appendix A in SUG940, Gowin Timing Constraints User Guide for timing constraints syntax. If no clock constraint is added, Gowin Software will analyze the design timing according to the default clock frequency.
- The default clock for LittleBee® family is 50MHz.
- The default clock for Arora family is 100MHz.
- Setup timing is analyzed at high temperature and low pressure.
- Hold timing is analyzed at low temperature and high pressure.
Option Settings
- Run Timing Driven: Enable by default for timing driven routing; if timing requirements are not high, you can disable this option in order to save running time.
- Place option: Place algorithm with values of 0 and 1, and the default value is 0.
- If it is 0, the default place algorithm is used.
- If it is 1, some time efficiency is sacrificed to try to find a better place based on algorithm 0.
- Route Option: Route algorithm with values of 0, 1, and 2; and the default value is 0.
- When it is 0, the default route algorithm is used, and the route is adjusted according to congestion.
- When it is 1, the route is adjusted according to the timing.
- When it is 2, the route speed will be relatively fast.
- Place input register to IOB: Place registers driven by input Buffer to IOB to improve IO logic timing, and the default value is True.
- Place output register to IOB: Place registers driven by output/tristate Buffer to IOB to improve IO logic timing, and the default value is True.
- Place inout register to IOB: Place registers driven by in/out Buffer to IOB to improve IO logic timing, and the default value is True.
Solve Timing Issues
During the process of Gowin Software compilation, the timing analysis reports are output at the steps of synthesis and place & route respectively, and the timing requirements can be confirmed initially by the report. The flow of checking and solving timing issues is as shown in Figure 4-1.
Figure 4-1 Flow of Solving Timing Issues
Synthesis Timing Report Analysis
Before synthesis, it is a priority to check that the RTL conforms to Gowin coding guideline; for example, whether the DSP/BSRAM outputs go through registers; how the state machine is coded, etc. After synthesis, read the timing analysis results in the synthesis report.
-
Check whether the maximum operating frequency meets the design frequency requirements by the “Max Frequency Summary”, as shown in Figure 4-2.
Figure 4-2 Max. Frequency Summary -
Confirm the estimated timing of the worst path by “Detail Timing Paths Information”.
- If the number of logic levels on the worst path is large, registers need to be inserted to reduce the number of logic levels. For C6 speed chips of LittleBee® family, if run up to 100MHz, the number of logic levels needs to be controlled to less than 4; for C8 speed chips of Arora family, if run up to 100MHz, the number of logic levels needs to be controlled to less than 8.
- If the main delay on the worst path is caused by BSRAM, the BSRAM output can be delayed by one clock cycle.
- If the main delay on the worst path is caused by DSP, the DSP output can be delayed by one clock cycle.
- If the delay on the worst path is primarily caused by SSRAM, SSRAM can be converted to a shift register, using the synthesis attribute syn_srlstyle.
Place & Route Report Analysis
By analyzing the resource usage in place and route report, ensure that design resources are properly utilized. If there is excessive utilization, the RTL design needs to be updated, or synthesis attribute constraints need to be added.
Resource Usage Summary
Before analyzing the timing report, the place & route report needs to be
reviewed to confirm that the resource utilization is not excessive. In
general, when the CLS utilization reaches 85% or more, or BSRAM/DSP resource
utilization exceeds 80%, which can be called an excessive design; and it can
lead to timing closure issues, as shown in Figure 4-3.
Figure 4-3 Resource Usage Summary
Clock Usage Summary
In theory, the number of clocks in the design should not exceed that of clocks
of the target device. Otherwise, some general routing resources are used as
clocks, which may cause setup or hold violations. You can view the clock
resource usage in the placement and routing report, as shown in Figure 4-4.
Figure 4-4 Global Clock Usage Summary
Place & Route Timing Report Analysis
Reduce Register Fan-out
Read the worst path of the “Setup Analysis Report” in the timing report, and check the number of fan-outs of the starting register of the timing path. If the fan-out is too large, you can duplicate registers to reduce the fan-out of registers, as shown in Figure 4-5.
Figure 4-5 Worst Timing Path
In the above figure, the number of fan-outs of the starting register of the worst path is 21, and the net delay of 1.609ns occupies 42% of the path delay, which seriously affects the timing closure; in the RTL, add attributes to the net definition of this register, such as: reg mov_sig_src_o_s0 / synthesis syn_maxfan = 10 / Therefore, the RTL should be designed according to the actual fan-out of the register, but it is not that the smaller the fan-out, the better, which may also lead to high fan-out of register source; and it can affect the timing closure. Optimize BUFS Resource Usage Generally speaking, the clock enable CE is a high fan-out net, and the BUFS routing resources are preferentially used in the placement and routing. If the delay on the worst timing path is mainly caused by CE using BUFS routing resources, you can add physical constraints to avoid using BUFS, such as: CLOCK_LOC “ce_0” LOCAL_CLOC
References
Read User Manual Online (PDF format)
Read User Manual Online (PDF format) >>