FPGA Internal Architecture

1.3 PLD/FPGA Structure

1.3.1 Basic FPGA Functions

FPGA Structure

The three basic components of a digital circuit are gates, registers, and the wires that connect these gates and registers together.

Example of an FPGA logic-cell structure

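1.3.2 Enhanced FPGA Features
  1. Embedded processors: soft cores, hard cores and DSP
  2. Clocking and clock management: PLL/DLL, clock drivers/distribution
  3. I/O: multiple high-speed transceivers, DDR memory access, programmable digitally controlled impedance
  4. Embedded MAC units: efficient floating-point arithmetic
  5. Various embedded memories: dual-port RAM, FIFO
  6. Common interfaces: I2C, SPI
  7. System monitoring: built-in ADCs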

Logic cells

FPGAs are built from one basic “logic-cell”, duplicated hundreds or thousands of times. A logic-cell is basically a small lookup table (“LUT”), a D flip-flop and a 2-to-1 mux (to bypass the flip-flop if desired).

The LUT can implement any logic function of its inputs. It typically has only a few inputs (4 is common), so, for example, a 3-input AND gate whose result is then OR-ed with another input fits in one 4-input LUT.
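A minimal Python sketch of the logic cell just described, assuming a 4-input LUT modeled as a 16-entry truth table (its configuration bits) feeding a D flip-flop, with a 2-to-1 mux that selects either the registered or the combinational output. The names `lut_init` and `LogicCell` are hypothetical; the example programs the LUT with (a AND b AND c) OR d, the function mentioned above.

```python
from typing import Callable, List


def lut_init(func: Callable[[int, int, int, int], int]) -> List[int]:
    """Build the 16 configuration bits of a 4-input LUT from a Boolean function.

    Entry i of the table holds func(a, b, c, d) where i = a | b<<1 | c<<2 | d<<3.
    """
    table = []
    for i in range(16):
        a, b, c, d = (i >> 0) & 1, (i >> 1) & 1, (i >> 2) & 1, (i >> 3) & 1
        table.append(func(a, b, c, d) & 1)
    return table


class LogicCell:
    """Hypothetical model: 4-input LUT -> D flip-flop, plus a bypass mux."""

    def __init__(self, table: List[int], registered: bool):
        self.table = table            # LUT contents, loaded at configuration time
        self.registered = registered  # mux select: True = use the flip-flop output
        self.q = 0                    # flip-flop state

    def lut(self, a: int, b: int, c: int, d: int) -> int:
        # A LUT is just a small memory read: the inputs form the address.
        return self.table[a | (b << 1) | (c << 2) | (d << 3)]

    def output(self, a: int, b: int, c: int, d: int) -> int:
        # 2-to-1 mux: registered output, or the LUT output directly.
        return self.q if self.registered else self.lut(a, b, c, d)

    def clock(self, a: int, b: int, c: int, d: int) -> None:
        # Rising clock edge: the flip-flop captures the LUT output.
        self.q = self.lut(a, b, c, d)


and3_or = lut_init(lambda a, b, c, d: (a & b & c) | d)
cell = LogicCell(and3_or, registered=False)
```

Changing the function only changes the 16 table bits, never the hardware, which is why one LUT can implement any 4-input function.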

Interconnect

Each logic-cell can be connected to other logic-cells through interconnect resources (wires/muxes placed around the logic-cells). Each cell can do little, but with lots of them connected together, complex logic functions can be created.

I/O cells

The interconnect wires also go to the boundary of the device where I/O cells are implemented and connected to the pins of the FPGAs.

Dedicated routing (carry chains)

In addition to general-purpose interconnect resources, FPGAs have fast dedicated lines between neighboring logic cells. The most common type of fast dedicated line is the “carry chain”. Carry chains allow arithmetic functions (like counters and adders) to be built efficiently (low logic usage and high operating speed).

Older programmable technologies (PAL/CPLD) don't have carry chains and so are quickly limited when arithmetic operations are required.
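The carry-chain idea can be sketched in Python as a ripple-carry adder: each bit position stands for one logic cell that produces a sum bit, and the carry passes to the neighboring cell (in hardware, over the dedicated carry line rather than general interconnect). This is a behavioral model only, and the function names are hypothetical; it shows why an N-bit adder or counter needs only about N cells when a carry chain is available.

```python
from typing import Tuple


def full_adder(a: int, b: int, cin: int) -> Tuple[int, int]:
    """One bit position: sum bit (LUT logic) and carry-out (carry chain)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def ripple_add(x: int, y: int, width: int) -> Tuple[int, int]:
    """Add two width-bit numbers; the carry ripples from bit 0 upward."""
    carry = 0
    result = 0
    for i in range(width):
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result, carry  # the final carry is the adder's carry-out


def counter_step(count: int, width: int) -> int:
    """A counter is an adder with one input tied to 1; it wraps at 2**width."""
    value, _ = ripple_add(count, 1, width)
    return value
```

For example, `ripple_add(15, 1, 4)` wraps to 0 with a carry-out of 1, and `counter_step(41, 8)` gives 42. In an FPGA the speed of this structure depends on how fast the carry travels from cell to cell, which is exactly what the dedicated carry lines optimize.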

In addition to logic, all new FPGAs have dedicated blocks of static RAM distributed among and controlled by the logic elements.

Internal RAM operating modes

There are many parameters affecting RAM operation. The main parameter is the number of agents that can access the RAM simultaneously.

“Single-port” RAMs: only one agent can read/write the RAM. “Dual-port” or “quad-port” RAMs: 2 or 4 agents can read/write, which is great for moving data across clock domains (each agent can use a different clock).

To figure out how many agents are available, count the number of separate address buses going to the RAM. Each agent has a dedicated address bus. Each agent also has a read and/or a write data bus.

Writing to the RAM is usually done synchronously. Reading is usually done synchronously but can sometimes be done asynchronously.
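A behavioral Python sketch of a dual-port RAM, assuming synchronous writes and synchronous reads on both ports. Each port has its own address, write data and registered read data, matching the "count the address buses" rule above. Each `clock()` call stands for a rising edge of that port's own clock, so the two ports could belong to different clock domains. The class name and the read-before-write behavior on a simultaneous read and write are assumptions of this sketch; real devices document their own read-during-write behavior.

```python
from typing import List, Optional


class DualPortRAM:
    """Hypothetical model of a true dual-port RAM (two independent ports)."""

    def __init__(self, depth: int, width: int):
        self.mem: List[int] = [0] * depth
        self.mask = (1 << width) - 1
        self.dout = [0, 0]  # registered read data: index 0 = port A, 1 = port B

    def clock(self, port: int, addr: int, we: bool = False,
              din: Optional[int] = None) -> None:
        """Rising edge of the given port's clock.

        Synchronous read: dout[port] is updated on the edge with the old
        contents (read-before-write, an assumption of this sketch).
        Synchronous write: memory is updated on the same edge if we is set.
        """
        self.dout[port] = self.mem[addr]
        if we:
            if din is None:
                raise ValueError("write enable without write data")
            self.mem[addr] = din & self.mask


ram = DualPortRAM(depth=16, width=8)
ram.clock(port=0, addr=3, we=True, din=0xA5)  # port A writes (producer clock)
ram.clock(port=1, addr=3)                     # port B reads (consumer clock)
```

After these two edges `ram.dout[1]` holds 0xA5: data written by one agent is read back by the other agent on its own clock edge.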

Blockram vs. distributed RAM

There are two types of internal RAM in an FPGA: blockrams and distributed RAMs. The size of the RAM needed usually determines which type is used.

The big RAM blocks are blockrams, which are located in dedicated areas of the FPGA. Each FPGA has a limited number of them, and if you don't use them, you “lose” them (they cannot be used for anything but RAM). The small RAM blocks are either smaller blockrams (Altera does that) or “distributed RAM” (Xilinx does that). Distributed RAM allows the FPGA logic-cells to be used as tiny RAMs, which provides a very flexible RAM distribution in an FPGA but isn't efficient in terms of area (a logic-cell can actually hold very few bits of RAM). Altera prefers building blockrams of different sizes around the device (more area-efficient, but less flexible). Which one is better for you depends on your FPGA application.
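To put "very few bits" into numbers: if each 4-input LUT can be used as a 16×1 RAM (the 16×1 figure appears later in this document), a memory stored as distributed RAM needs roughly one LUT per 16 bits. The helper below is a back-of-the-envelope sketch under that assumption; its name is hypothetical, and it ignores the extra logic for address decoding and read muxing, so real usage is higher.

```python
import math


def luts_for_distributed_ram(depth: int, width: int, bits_per_lut: int = 16) -> int:
    """Estimate the LUTs needed, assuming each LUT stores bits_per_lut x 1 bits.

    A depth x width memory is split into `width` bit-columns; each column of
    `depth` bits needs ceil(depth / bits_per_lut) LUTs. Address decode and
    output muxing are not counted.
    """
    return width * math.ceil(depth / bits_per_lut)
```

A 16×8 buffer costs about 8 LUTs, a good fit for distributed RAM, while a 1024×8 memory would already need about 512 LUTs, which is why larger memories are usually placed in blockrams.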

FPGA pins

FPGAs tend to have lots of pins… So to make it a little simpler, let's put them into two bins: “user pins” and “dedicated pins”.

User pins

The user pins are called “IOs”, or “I/Os”, or “user I/Os”, or “user IOs”, or “IO pins”, or … you get the idea. IO stands for “input-output”.

  • You usually have total control over user IOs. They can be programmed to be inputs, outputs, or bi-directional (i.e. with tri-statable buffers).
  • Each IO pin is connected to an “IO cell” inside the FPGA. The “IO cells” are powered by the VCCIO pins (IO power pins) - more details below.

Dedicated pins

The “dedicated pins” are hard-coded to a specific function. They fall into the following three sub-categories.

  • Power pins.
  • Configuration pins: used to “download” the FPGA.
  • Dedicated input or clock pins: these are able to drive large nets inside the FPGA, suitable for clocks or signals with large fan-outs.

The power pins fall into two categories: “core voltage” and “IO voltage”.

  • The core voltage is named “VCC” for Xilinx and “VCCINT” for Altera. It is fixed (set by the model of FPGA that you are using). It is used to power the logic gates and flip-flops inside the FPGA. The voltage was 5V for older FPGA generations and has been decreasing with each new generation (3.3V, 2.5V, 1.8V, 1.5V, 1.2V and even lower for the latest devices).
  • The IO voltage is named “VCCO” for Xilinx and “VCCIO” for Altera. It is used to power the I/O blocks (= pins) of the FPGA. That voltage should match what the other devices connected to the FPGA expect.

An FPGA has many VCCIO pins that may all be powered by the same voltage. But newer generations of FPGAs have the concept of “user IO banks”: the IOs are split into groups, each having its own VCCIO pins. That allows the FPGA to be used as a voltage translator, useful for example if one part of your board works with 3.3V logic and another with 2.5V.
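A small Python sketch of how the bank concept constrains pin assignment: every signal placed in a bank must use the voltage that bank's VCCIO provides. The function, bank names, voltages and signal names below are hypothetical, chosen to match the 3.3V / 2.5V example in the text; real tools check full I/O standards, not just a voltage.

```python
from typing import Dict, List, Tuple


def check_bank_voltages(bank_vccio: Dict[str, float],
                        signals: List[Tuple[str, str, float]]) -> List[str]:
    """Return an error message for each signal whose required voltage differs
    from the VCCIO of the bank it is assigned to.

    signals: (signal name, bank name, voltage the connected device expects)
    """
    errors = []
    for name, bank, volts in signals:
        vccio = bank_vccio[bank]
        if abs(vccio - volts) > 1e-9:
            errors.append(f"{name}: needs {volts}V but {bank} VCCIO is {vccio}V")
    return errors


# Hypothetical board: bank1 talks to 3.3V logic, bank2 to 2.5V logic.
banks = {"bank1": 3.3, "bank2": 2.5}
pins = [
    ("uart_tx", "bank1", 3.3),
    ("adc_data0", "bank2", 2.5),
    ("led0", "bank2", 3.3),  # mistake: a 3.3V device placed in the 2.5V bank
]
```

Running `check_bank_voltages(banks, pins)` flags only `led0`, which would have to move to a 3.3V bank.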

Using and handling FPGA clocks

An FPGA design is usually “synchronous”. Simply put, that means that the design is clock based and each clock rising edge allows all the D flip-flops to simultaneously take a new state.

In a synchronous design, a single clock may drive a lot of flip-flops. That can cause timing and electrical problems inside the FPGA. To get that working properly, FPGA manufacturers provide special internal wires called “global routing” or “global lines”. They allow distributing the clock signal all over the FPGA with a low skew (i.e. the clock signal appears almost simultaneously to all the flip-flops).

Most FPGA designs use at least one clock that is generated outside the FPGA and fed to the FPGA through one pin. Just make sure you use a clock pin (only clock pins can drive the global lines).

Clock domains

An FPGA can use multiple clocks (using multiple global lines and clock pins). Each clock forms a “clock domain” inside the FPGA.

Flip-flops and combinatorial logic in each clock domain

For each flip-flop inside the FPGA, its clock domain is easy to determine: just look at the flip-flop's clock input. But what about the combinatorial logic that sits between flip-flops?

  • If there is some combinatorial logic between flip-flops of the same clock domain, the logic is said to be part of that clock domain too.
  • If there is some combinatorial logic between flip-flops of different clock domains, the logic is not owned by any clock domain. But in a typical FPGA design there is no such logic; the only paths between different clock domains go through synchronizers.

Clock domain speeds

For each clock domain, the FPGA software analyzes all flop-to-flop paths and gives you a report with the maximum allowed frequencies. In the general case, only the paths within each clock domain are analyzed. The synchronizer paths (between different clock domains) usually don't matter and are not analyzed.

One clock domain may work at 10MHz while another works at 100MHz. As long as each clock uses a global line and each clock runs below the maximum frequency reported by the software, you don't have to worry about internal timing issues: the design is guaranteed to meet its internal timing.
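As a rough illustration of what the timing report computes, the Python sketch below takes a list of flop-to-flop path delays per clock domain and derives each domain's maximum frequency as 1 / (worst path delay). The domain names and delays are made up, and a real timing analyzer also accounts for clock-to-out, setup time, clock skew and jitter, which are simply assumed to be lumped into each path delay here. Only intra-domain paths are listed, matching the note above that synchronizer paths are normally not analyzed.

```python
from typing import Dict, List


def fmax_mhz(path_delays_ns: List[float]) -> float:
    """Maximum frequency (MHz) of a clock domain from its worst path delay (ns)."""
    return 1000.0 / max(path_delays_ns)


def check_domains(domains: Dict[str, List[float]],
                  clocks_mhz: Dict[str, float]) -> Dict[str, bool]:
    """True for each domain whose clock is at or below its computed Fmax."""
    return {d: clocks_mhz[d] <= fmax_mhz(paths) for d, paths in domains.items()}


# Hypothetical worst-case flop-to-flop delays (ns) per clock domain
domains = {"clk_10": [12.0, 35.5, 20.1], "clk_100": [4.2, 8.0, 10.5]}
clocks = {"clk_10": 10.0, "clk_100": 100.0}
```

Here `clk_10` passes easily (Fmax about 28MHz), while `clk_100` fails because its 10.5ns path limits it to about 95MHz.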

There may still be some timing issues at the FPGA input and output pins, though. The software will give you a report about those as well.

Signals between clock domains

If you need to send some information across different clock domains, special considerations apply.

In the general case, if your clocks have no relationship with one another, you cannot use a signal generated in one clock domain as-is in another. Doing so would violate the setup and hold times of the flip-flops in the destination clock domain and cause metastability.

Crossing clock domains requires special techniques, like synchronizers (simple) or FIFOs (more complicated). See the Crossing clock domains project for practical ideas, as well as “Interfacing Two Clock Domains” and “What Is Metastability?”.
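A behavioral Python sketch of the simplest technique mentioned above, a two-flip-flop synchronizer. Metastability itself cannot be simulated at this level; the model only shows the structure: the asynchronous input is sampled by a first flip-flop clocked in the destination domain, and the rest of the logic uses only the second flip-flop's output, which gives the first one a full clock period to settle. The class name is hypothetical. This structure suits single-bit signals; multi-bit data usually needs a FIFO or a handshake.

```python
class TwoFlopSynchronizer:
    """Two D flip-flops in series, both clocked by the destination clock."""

    def __init__(self) -> None:
        self.ff1 = 0  # may go metastable in real hardware; never used directly
        self.ff2 = 0  # synchronized output, safe to use in the destination domain

    def clock(self, async_in: int) -> int:
        """Rising edge of the destination clock: both flip-flops update at once."""
        self.ff1, self.ff2 = async_in & 1, self.ff1
        return self.ff2


sync = TwoFlopSynchronizer()
outputs = [sync.clock(x) for x in [0, 1, 1, 1, 0, 0]]
```

The tuple assignment updates both flip-flops simultaneously, as a synchronous design does on each clock edge; the resulting output is the input delayed by two destination-clock edges, `[0, 0, 1, 1, 1, 0]`.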

FPGA configuration and debugging

FPGA vendors provide many ways to “configure” (i.e. download) their devices. One way uses a cable that connects your PC to the FPGA board. These cables are usually called “JTAG cables” (because they can connect to the JTAG pins of the FPGA).

FPGA cables are vendor-specific

The FPGA configuration interfaces of all the FPGA vendors are very much alike. That doesn't prevent each vendor from having its own proprietary connectors and cables.

Xilinx cables

  • The most popular one is the Platform Cable USB II
  • The Xilinx parallel cable is called Parallel Cable III

Altera cables

  • The most popular one is the USB-Blaster
  • The Altera parallel cables are the ByteBlasterMV and ByteBlaster II

Parallel cables

Parallel cables connect to a PC's parallel (printer) port. They are less popular than USB cables but can still be interesting due to their simplicity. They buffer a few pins of the PC parallel interface and connect to the target board using a flat cable or flying leads. Parallel cables are active devices and need power, but they are usually powered from the target FPGA board.

FPGA vendors sometimes provide the schematics of their cables, which is valuable if you want to understand how they work or build a cable yourself.

An FPGA can be in one of two states: “configuration mode” or “user mode”. When an FPGA wakes up after power-up, it is in configuration mode, sitting idle with all its outputs inactive. You need to configure it.

Configuring an FPGA means downloading a stream of 0's and 1's through some special pins. Once the FPGA is configured, it goes into “user-mode” and becomes active.

There are three classical ways to configure your FPGA:

  • You use a “JTAG cable” from your PC to the FPGA, and run software on your PC to send data through the cable.
  • You use a microcontroller on your board, with an adequate firmware to send data to the FPGA.
  • You use a “boot-PROM” on your board, connected to the FPGA, that configures the FPGA automatically at power-up (FPGA vendors have such special boot-PROMs in their catalogs).

During development, the first method is the easiest and quickest. Once your FPGA design works, you probably don't need the PC anymore, so the other two methods come into use.

Configuration works in surprisingly similar ways on Xilinx and Altera devices. The differences are mostly in the naming (pins and modes of operation are named differently), but the functionality is similar.

Most FPGAs have two sets of pins dedicated to configuration:

  • The JTAG interface.
  • The “synchronous serial” interface.

FPGA configuration can quickly become a complex subject, so you might want to skip this section, especially if you intend to use a ready-made FPGA development board. Development boards usually come with a JTAG cable, or a special cable that you can use with no knowledge of the underlying interface. But if you want to learn a little more, read on.

The JTAG interface (or JTAG “port”)

FPGAs can be configured through JTAG (using proprietary JTAG commands). Note that JTAG was originally designed for test and manufacturing purposes (to allow a computer to take control of the device pins), and FPGAs support JTAG testing too.


The “synchronous serial” interface

It is a simple data/clock interface. It is synchronous, and you usually provide one bit at a time to the FPGA.

Here's a description of the five most important pins of this interface:

Xilinx pin name / Altera pin name (direction): pin function

  • data / data0 (FPGA input): configuration data bit.
  • clk / dclk (FPGA input): configuration clock; the configuration data bit is shifted into the FPGA on the rising edge of the clock.
  • progb / nConfig (FPGA input): when asserted (i.e. driven low, as this is an active-low pin), the FPGA is reset and loses its configuration. If the FPGA was in user mode, it stops operating immediately and all IOs become inactive.
  • initb / nStatus (FPGA output): indicates when the FPGA is ready to start the configuration process (it takes a few milliseconds for the FPGA to get ready).
  • done / ConfDone (FPGA output): when high, indicates that the FPGA is configured (i.e. in user mode).

Note: the initb and done pins are actually open-collector pins, so pull-up resistors are required on them. Also, if multiple FPGAs are to be configured, these pins are usually connected together so that all the FPGAs switch into user mode together. There are many more details, so for a complete description, check your FPGA datasheet.
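The configuration sequence implied by the pin list can be sketched in Python. The hypothetical `FakeFPGA` class stands in for real hardware; on a real board, as in the microcontroller configuration method described earlier, each call would be a GPIO write or read. The sketch assumes Xilinx-style pin names, one bit per rising clock edge and MSB-first bit order within each byte. Bit order, timing requirements and bitstream length all vary by device, so the datasheet remains the reference.

```python
from typing import List


class FakeFPGA:
    """Hypothetical stand-in for an FPGA in a serial configuration mode."""

    def __init__(self, bitstream_len_bits: int):
        self.expected = bitstream_len_bits
        self.received: List[int] = []
        self.initb = 0  # low while the FPGA clears its configuration memory
        self.done = 0

    def set_progb(self, level: int) -> None:
        if level == 0:            # active low: reset, configuration is lost
            self.received.clear()
            self.initb, self.done = 0, 0
        else:                     # released: ready once memory is cleared
            self.initb = 1

    def clk_rising_edge(self, data: int) -> None:
        # The data bit is shifted in on the rising edge of clk.
        self.received.append(data & 1)
        if len(self.received) == self.expected:
            self.done = 1         # configured: switch to user mode


def configure(fpga: FakeFPGA, bitstream: bytes) -> bool:
    fpga.set_progb(0)                  # 1. assert progb: reset the FPGA
    fpga.set_progb(1)                  # 2. release progb
    if not fpga.initb:                 # 3. wait for initb high (polling omitted)
        return False
    for byte in bitstream:             # 4. shift the bits in, MSB first (assumed)
        for i in range(7, -1, -1):
            fpga.clk_rising_edge((byte >> i) & 1)
    return bool(fpga.done)             # 5. done high means user mode
```

With `FakeFPGA(16)`, `configure(fpga, b"\xA5\x0F")` returns `True`; if the bitstream is shorter than the device expects, done never goes high and the function returns `False`.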

2. How does an FPGA work?

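The FPGA is built on the concept of a Logic Cell Array (LCA), which contains three parts: Configurable Logic Blocks (CLBs), Input Output Blocks (IOBs) and interconnect. A field-programmable gate array (FPGA) is a programmable device whose structure differs from that of traditional logic circuits and gate arrays (such as PAL, GAL and CPLD devices). An FPGA implements combinational logic with small lookup tables (16×1 RAM). Each lookup table is connected to the input of a D flip-flop, and the flip-flop in turn drives other logic or the I/Os. This forms a basic logic-cell module that can implement both combinational and sequential logic; these modules are connected to each other or to the I/O blocks by metal wiring. The FPGA's logic is defined by loading programming data into internal static memory cells. The values stored in these memory cells determine the function of each logic cell and how the blocks are connected to each other and to the I/Os, and therefore ultimately determine the function the FPGA implements. FPGAs can be reprogrammed an unlimited number of times.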

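Mainstream FPGAs are still based on lookup-table technology, but they far exceed the basic capabilities of earlier generations and integrate hard (ASIC-type) blocks for common functions such as RAM, clock management and DSP. An FPGA chip consists mainly of seven parts: programmable I/O cells, basic programmable logic cells, complete clock management, embedded block RAM, abundant routing resources, embedded low-level functional units, and embedded dedicated hard blocks.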

3. FPGA internal memory

4. FPGA pins

5. FPGA clocks and global signal lines

6. FPGA power supply

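FPGA power supplies must provide voltages from 1.2V to 5V and currents from tens of milliamps to several amps. Three kinds of supply can be used: low-dropout (LDO) linear regulators, switching DC-DC regulators, and switching power modules. The final choice depends on the system, the system budget and time-to-market requirements: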

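  • If board space is the primary concern, if low output noise is important, or if the system must respond quickly to input-voltage changes and load transients, use an LDO regulator. LDOs are less efficient (because they are linear regulators) and can supply only low to moderate output currents. An input capacitor typically reduces inductance and noise at the LDO input. The LDO output also needs a capacitor to handle system transients and maintain stability. A dual-output LDO can also be used to supply VCCINT and VCCO at the same time.
  • If efficiency is critical and the system requires high output current, switching regulators have the advantage. They are more efficient than LDOs, but their switching circuitry adds output noise. Unlike LDOs, switching regulators use an inductor to perform the DC-DC conversion.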
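The efficiency gap between the two regulator types can be made concrete. In an LDO the input current is essentially equal to the output current, so efficiency is roughly Vout/Vin and the difference (Vin - Vout) × Iout is dissipated as heat. The Python sketch below applies that approximation, ignoring the LDO's quiescent (ground) current. The 3.3V-to-1.2V, 2A rail is a hypothetical example, and the 90% switcher efficiency is an assumed round figure, not a datasheet value.

```python
def ldo_efficiency(vin: float, vout: float) -> float:
    """Approximate LDO efficiency, ignoring quiescent (ground) current."""
    return vout / vin


def ldo_dissipation_w(vin: float, vout: float, iout: float) -> float:
    """Power burned in the LDO pass element: (Vin - Vout) * Iout."""
    return (vin - vout) * iout


def switcher_dissipation_w(vout: float, iout: float, efficiency: float) -> float:
    """Loss of a switching regulator with a given (assumed) efficiency."""
    pout = vout * iout
    return pout / efficiency - pout


# Hypothetical VCCINT rail: 1.2V at 2A from a 3.3V input
eta = ldo_efficiency(3.3, 1.2)                 # about 0.36
p_ldo = ldo_dissipation_w(3.3, 1.2, 2.0)       # about 4.2 W
p_sw = switcher_dissipation_w(1.2, 2.0, 0.90)  # about 0.27 W
```

At 2A the LDO would dissipate about 4.2W against roughly 0.27W for the assumed switcher, which is why LDOs are limited to low and moderate currents.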

Power-up requirements

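To ensure correct power-up, the ramp-up time of the core voltage VCCINT must be within the range specified by the manufacturer. For some FPGAs, an excessively long ramp time can make the start-up current last longer, because VCCINT spends more time below the point where the transistors reach their turn-on threshold. If the supply delivers a large current to the FPGA, a long power-up ramp can cause thermal stress. ADI (Analog Devices) DC-DC regulators provide adjustable soft start, where the ramp time can be set with an external capacitor. Typical ramp times are in the range of 20ms to 100ms.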

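Many FPGAs have no power-sequencing requirements, so VCCINT, VCCO and VCCAUX can be powered up simultaneously. If this cannot be achieved, the power-up current may be slightly higher. Sequencing requirements vary with the specific FPGA. Some FPGAs require VCCINT and VCCO to be powered at the same time; others allow these supplies to come up in any order. In most cases, powering VCCINT before VCCO is good practice.

Some FPGA families draw a power-up inrush current while VCCINT is between 0.6V and 0.8V. During this period, the power converter keeps supplying current. Foldback current limiting is therefore not recommended in this application, because it limits current by lowering the output voltage. In a current-limited supply, by contrast, once the current drawn by the circuit exceeds the set rated current, the supply limits the current to below the rated value.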

Power distribution architecture

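For high-speed, high-density FPGAs, maintaining good signal integrity is critical for a reliable, repeatable design. Proper power-supply bypassing and decoupling improve overall signal integrity. With insufficient decoupling, logic transitions disturb the supply and ground voltages and cause the device to malfunction. A distributed power architecture is another key solution: it minimizes supply-voltage deviation when powering the FPGA.

In a traditional power architecture, the AC/DC or DC/DC converter sits in one place and provides multiple output voltages that are distributed throughout the system. This design is called a centralized power architecture (CPA). CPA runs into trouble when low voltages are distributed at high currents, because the copper wires or PCB traces incur significant resistive losses.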

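The alternative to CPA is the distributed power architecture (DPA). In a DPA, only one semi-regulated DC voltage is distributed throughout the system, and individual DC/DC converters (linear or switching) are placed next to each load. Because each converter is much closer to its load (for example, the FPGA), the voltage drop caused by trace resistance and wiring inductance is reduced. This approach of providing local power to a load is called point of load (POL).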
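The resistive-loss argument for DPA can be checked with Ohm's law: for a fixed load power, the distribution current is P / V, the drop is I × R and the loss is I² × R. The Python sketch below compares distributing a 1.2V core rail directly (CPA) with distributing a 12V bus to a POL converter next to the FPGA (DPA). The 12W load, the 12V bus and the 10 milliohm path resistance are hypothetical numbers, and converter losses are ignored.

```python
from typing import Tuple


def distribution_loss(load_power_w: float, bus_voltage: float,
                      path_resistance_ohm: float) -> Tuple[float, float]:
    """Voltage drop (V) and resistive loss (W) on a distribution path.

    The current is load power / bus voltage; converter losses are ignored.
    """
    current = load_power_w / bus_voltage
    drop = current * path_resistance_ohm
    loss = current ** 2 * path_resistance_ohm
    return drop, loss


# Hypothetical 12W FPGA core load (1.2V at 10A) over a 10 milliohm path
cpa = distribution_loss(12.0, 1.2, 0.010)   # 1.2V distributed directly
dpa = distribution_loss(12.0, 12.0, 0.010)  # 12V bus, POL converter at the FPGA
```

The CPA case drops about 0.1V (over 8% of a 1.2V rail) and wastes about 1W, while the 12V bus drops only 10mV and loses about 0.01W: ten times less current means a hundred times less loss.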

7. FPGA configuration modes

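FPGAs support several configuration modes: parallel master mode pairs one FPGA with one EPROM; master-slave mode allows one PROM to program several FPGAs; serial mode uses a serial PROM to program the FPGA; and peripheral mode treats the FPGA as a peripheral of a microprocessor, which programs it.

Achieving fast timing closure, reducing power and cost, optimizing clock management, and reducing the complexity of concurrent FPGA and PCB design have always been key concerns for system designers using FPGAs. Today, as FPGAs move toward higher density, larger capacity, lower power and more integrated IP, system designers benefit from these capabilities but must also face new design challenges brought by the FPGA's unprecedented levels of performance and capability.

For example, the Virtex-5 family from leading FPGA vendor Xilinx uses a 65nm process and offers up to 330,000 logic cells, 1,200 I/Os and many hard IP blocks. Such high capacity and density make complex routing less predictable, which makes timing closure harder. In addition, the larger number of logic functions, DSP, embedded processing and interface blocks integrated for different applications also makes clock management and voltage distribution more difficult.

Fortunately, FPGA vendors and EDA tool vendors are working together to address the unique design challenges of 65nm FPGAs. Not long ago, Synplicity and Xilinx announced a joint working group on timing closure for ultra-high-capacity designs, aimed at helping system designers adopt 65nm FPGA devices faster and more efficiently. The Blast FPGA synthesis tool from design-software vendor Magma helps build optimized placements and speeds up timing closure. FPGA configuration options have become highly diverse!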