# FPGA Smart Network Interface Card

See separate document for [installation instructions](../DEPLOYMENT.md).

## Hardware
The only FPGA card currently supported is the **Xilinx Alveo U55C**.

See the AMD/Xilinx website for the [card user guide (UG1469)](https://docs.xilinx.com/r/en-US/ug1469-alveo-u55c).
According to the user guide:
```
Alveo data center accelerator cards are designed to be installed into a data center server, where controlled air flow provides direct cooling.
```

The card must be placed in a PCI Express (PCIe) Gen4 x8 slot; mechanically, the slot has to accommodate an x16 card.
No additional power cable is needed, as the card draws less than the 75 W available from the PCIe edge connector.
The current power estimate is about 30 W when idle and 45 W in operation. The card has built-in protection that cuts power to the card if the HBM temperature exceeds 120°C.

Two variants of the card are available:
* `100g` - this variant operates one port in 100 Gbit/s mode and should be used when connecting the detector via a switch.
* `8x10g` - this variant operates both QSFP ports at 4x10 Gbit/s. QSFP+ (40 Gbit/s) transceivers and MPO/MTP harness cables are necessary. It is designed for a detector directly connected to the Jungfraujoch server, without a switch.

See [network documentation](NETWORK.md) for details of the network.

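As a rough sanity check on the slot requirement, the raw PCIe Gen4 x8 bandwidth can be compared with the 100 Gbit/s network link. The sketch below assumes the standard PCIe Gen4 figures (16 GT/s per lane, 128b/130b line encoding) and ignores packet and protocol overhead, so the usable rate is lower. The function name is hypothetical and not part of this repository.

```python
def pcie_raw_gbit_per_s(lanes: int, gt_per_s: float = 16.0) -> float:
    """Raw PCIe rate in Gbit/s after 128b/130b line encoding (no protocol overhead)."""
    return lanes * gt_per_s * 128 / 130


x8 = pcie_raw_gbit_per_s(8)
print(f"PCIe Gen4 x8: {x8:.1f} Gbit/s raw")
print("Exceeds the 100 Gbit/s link:", x8 > 100)
```
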
## Content of directories

CPU part:

* `pcie_driver` Linux kernel driver for the PCIe version of the FPGA board - see [instructions](pcie_driver/README.md)
* `host_library` Library that should be used to access the driver, plus some simple diagnostic tools - see [workflow documentation](pcie_driver/README.md)

FPGA part:

* `scripts` Scripts for FPGA synthesis
* `xdc` Constraints for FPGA
* `hdl` FPGA design parts developed in Verilog
* `hls` FPGA design parts developed in C++ with high-level synthesis

Dependencies:

* `include` External (Xilinx) headers for high-level synthesis code

## Building firmware
The Xilinx Vivado version has to precisely match the version described in [the system requirements](../README.md).
Firmware build targets are generated only when `vivado` and `vitis_hls` are detected in the path.

### Xilinx Vivado
The following procedures require the AMD (Xilinx) Vivado and Vitis HLS toolsets, version **2022.1**, to be installed on the machine.
Due to the nature of the Tcl scripts used to generate board designs, the Vivado version has to exactly match the one given above;
specifically, newer versions of Vivado will not work.

In addition to the Intellectual Property (IP) cores included in Vivado, two additional licenses are necessary:
* A no-cost license for the UltraScale+ 100G core, needed to build the `100g` design, has to be requested from AMD/Xilinx, see [Xilinx website](https://www.xilinx.com/products/intellectual-property/cmac_usplus.html).
* A paid license for the 10G/25G Subsystem for UltraScale+, needed to build the `8x10g` design.

PSI received no-cost licenses from the Xilinx University Program for the latter cores. Therefore, bitstreams
generated by the PSI continuous integration pipeline for `8x10g` may be used only for non-commercial purposes.

### HLS compilation
Build the HLS routines:
```
mkdir build
cd build
cmake ..
make hls
```

### Synthesis
Create the PCIe `100g` bitstream with the following commands:
```
mkdir build
cd build
cmake ..
make pcie_100g
```
and the `8x10g` bitstream:
```
mkdir build
cd build
cmake ..
make pcie_8x10g
```
### When Vivado is not present

During CMake execution, the `vivado` and `vitis_hls` executables must be present in the path.
If they are not, build targets will not be generated, and an error message similar to the following will appear:
```
$ make pcie_100g
make: *** No rule to make target 'pcie_100g'. Stop.
```

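Before running CMake, it can help to confirm that both tools are actually visible on the path. The following is a minimal sketch using only the Python standard library; the helper name `missing_tools` is hypothetical, and the check only mirrors the condition described here, not the actual CMake detection logic.

```python
import shutil

# Executables that must be on PATH for firmware targets to be generated.
REQUIRED_TOOLS = ("vivado", "vitis_hls")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the names in `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


missing = missing_tools()
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("vivado and vitis_hls found on PATH")
```

If either tool is reported missing, source the Vivado and Vitis HLS environment first and re-run `cmake ..` in a fresh build directory.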
### GitLab CI
If GitLab CI is properly set up, firmware is built automatically for every commit that modifies FPGA source files.
Built firmware is available for download as MCS files.

## FPGA reference

### Frame generator

The Jungfraujoch card is equipped with a frame generator, which makes it possible to simulate a JUNGFRAU detector without access to such a system.
It is placed in parallel to the Ethernet MAC, so it sits before the network stack and before any processing happening on the card.
In the future, it will be possible to redirect the simulated stream through the 100G TX network link.
The frame generator is written in HLS and controlled via AXI-Lite.

### Register map
FPGA setup can be done via registers:

| Address             | Bits | Meaning                                                                                        | Mode | Notes                                        |
|---------------------|------|------------------------------------------------------------------------------------------------|:-----|----------------------------------------------|
| 0x000000 - 0x00FFFF |      | Reserved (for internal memory, should MicroBlaze be used in the future)                        |      |                                              |
| 0x010000            | 32   | Action Control Register                                                                        |      |                                              |
|                     |      | Bit 0 - Action start                                                                           | R/W  |                                              |
|                     |      | Bit 1 - Action idle                                                                            | R    |                                              |
|                     |      | Bit 2 - Action cancel                                                                          | R/W  | cleared on reset or action start             |
|                     |      | Bit 3 - Clear network counters                                                                 | R/W  | cleared on reset                             |
|                     |      | Bit 12:4 - Debug signals (see [hdl/action_config.v](hdl/action_config.v))                      | R    |                                              |
|                     |      | Bit 16 - AXI Mailbox interrupt 0                                                               | R    |                                              |
| 0x010004            | 32   | Reserved                                                                                       | -    |                                              |
| 0x010008            | 32   | Reserved                                                                                       | -    |                                              |
| 0x01000C            | 32   | GIT SHA1                                                                                       | R    |                                              |
| 0x010010            | 32   | Reserved                                                                                       | R    |                                              |
| 0x010014            | 32   | Reserved                                                                                       | R    |                                              |
| 0x010018            | 32   | Jungfraujoch FPGA variant                                                                      | R    |                                              |
| 0x01001C            | 32   | Reserved                                                                                       | R    |                                              |
| 0x010020            | 32   | Max. number of supported detector modules                                                      | R    | constant                                     |
| 0x010024            | 32   | Reserved                                                                                       | R    | constant                                     |
| 0x010028            | 64   | Pipeline stalls before writing to host memory                                                  | R    | reset on action start                        |
| 0x010030            | 64   | Pipeline stalls before accessing HBM                                                           | R    | reset on action start                        |
| 0x010038            | 32   | FIFO status (see action_config.v for details)                                                  | R    |                                              |
| 0x01003C            | 32   | Size of single HBM channel in bytes (default value for the particular card)                    | R/W  | should not be altered for standard operation |
| 0x010040            | 64   | Packets processed by the action                                                                | R    | cleared on reset or action start             |
| 0x010048            | 64   | Valid Ethernet packets                                                                         | R    | cleared on reset                             |
| 0x010050            | 64   | Valid ICMP packets                                                                             | R    | cleared on reset                             |
| 0x010058            | 64   | Valid UDP packets                                                                              | R    | cleared on reset                             |
| 0x010060            | 64   | Valid detector packets processed by the card                                                   | R    | cleared on reset                             |
| 0x010068            | 64   | Packets flagged as errors by CMAC                                                              | R    | cleared on reset                             |
| 0x010070            | 64   | Pipeline stalls before data processing                                                         | R    | reset on action start                        |
| 0x010078            | 64   | AXI-beats before accessing HBM                                                                 | R    | reset on action start                        |
| 0x010080            | 64   | AXI-beats before data processing                                                               | R    | reset on action start                        |
| 0x010088            | 64   | AXI-beats before host writer                                                                   | R    | reset on action start                        |
| 0x010090            | 64   | Last encountered SwissFEL pulse ID                                                             | R    | cleared on reset                             |
| 0x010100            | 32   | Spot finder photon count threshold                                                             | R/W  |                                              |
| 0x010104            | 32   | Spot finder signal-to-noise ratio threshold (single-precision float)                           | R/W  |                                              |
| 0x010200            | 64   | MAC address source for internal frame generator                                                | R/W  | network byte order                           |
| 0x010208            | 32   | IPv4 address source for internal frame generator                                               | R/W  | network byte order                           |
| 0x01020C            | 32   | Number of detector modules (value minus one: 0 => 1 module, 1 => 2 modules, etc.)              | R/W  |                                              |
| 0x010210            | 32   | Data collection mode                                                                           | R/W  |                                              |
|                     |      | Bit 0 - Conversion to photons                                                                  |      |                                              |
|                     |      | Bit 1 - Output extend to 32-bit                                                                |      |                                              |
|                     |      | Bit 2 - Output is unsigned integer                                                             |      |                                              |
|                     |      | Bit 3 - Use sq. root lossy compression                                                         |      |                                              |
|                     |      | Bit 7 - JUNGFRAU fixed G1 mode                                                                 |      |                                              |
|                     |      | Bit 8 - Set to zero values below threshold                                                     |      |                                              |
|                     |      | Bit 31:16 - Data collection ID (carried with completions)                                      |      |                                              |
| 0x010214            | 32   | Photon energy in keV (single-precision float)                                                  | R/W  |                                              |
| 0x010218            | 32   | Number of frames expected in the data collection (defines termination condition)               | R/W  |                                              |
| 0x01021C            | 32   | Number of storage cells                                                                        | R/W  |                                              |
| 0x010220            | 32   | Summation on card (value minus one: 0 => summation of 1, 1 => summation of 2, etc.)            | R/W  |                                              |
| 0x010224            | 32   | Coefficient for sq. root compression (need to set bit in data collection mode to apply)        | R/W  |                                              |
| 0x010228            | 32   | Threshold; values below it are set to zero (need to set bit in data collection mode to apply)  | R/W  |                                              |
| 0x030000 - 0x03FFFF |      | AXI Mailbox for Work Request / Work Completion                                                 |      | See Xilinx PG114 for register map            |
| 0x040000 - 0x04FFFF |      | QuadSPI flash                                                                                  |      | See Xilinx PG153 for register map            |
| 0x050000 - 0x05FFFF |      | Interrupt controller                                                                           |      | See Xilinx PG099 for register map            |
| 0x060000 - 0x06FFFF |      | Load calibration (HLS)                                                                         |      |                                              |
| 0x070000 - 0x07FFFF |      | AXI Firewall                                                                                   |      | See Xilinx PG293 for register map            |
| 0x080000 - 0x08FFFF |      | Frame generator (HLS)                                                                          |      |                                              |
| 0x090000 - 0x09FFFF |      | PCIe DMA control                                                                               |      | See Xilinx PG195 for register map            |
| 0x0A0000 - 0x0AFFFF |      | I2C clock generator                                                                            |      | See Xilinx PG195 for register map            |
| 0x0C0000 - 0x0FFFFF |      | Xilinx Card Management Solution (CMS) Subsystem                                                |      | See Xilinx PG348 for register map            |
| 0x100000 - 0x10FFFF |      | MAC 10G / CMAC 100G                                                                            |      | See Xilinx PG210/PG203 for register map      |
| 0x110000 - 0x11FFFF |      | MAC 10G                                                                                        |      | See Xilinx PG210 for register map            |
| 0x120000 - 0x12FFFF |      | MAC 10G                                                                                        |      | See Xilinx PG210 for register map            |
| 0x130000 - 0x13FFFF |      | MAC 10G                                                                                        |      | See Xilinx PG210 for register map            |
| 0x140000 - 0x14FFFF |      | MAC 10G                                                                                        |      | See Xilinx PG210 for register map            |
| 0x150000 - 0x15FFFF |      | MAC 10G                                                                                        |      | See Xilinx PG210 for register map            |
| 0x160000 - 0x16FFFF |      | MAC 10G                                                                                        |      | See Xilinx PG210 for register map            |
| 0x170000 - 0x17FFFF |      | MAC 10G                                                                                        |      | See Xilinx PG210 for register map            |
| 0x200000 - 0x20FFFF |      | Eth/IPv4 network stack for interface #0                                                        |      |                                              |
| 0x210000 - 0x21FFFF |      | Eth/IPv4 network stack for interface #1                                                        |      |                                              |
| 0x220000 - 0x22FFFF |      | Eth/IPv4 network stack for interface #2                                                        |      |                                              |
| 0x230000 - 0x23FFFF |      | Eth/IPv4 network stack for interface #3                                                        |      |                                              |
| 0x240000 - 0x24FFFF |      | Eth/IPv4 network stack for interface #4                                                        |      |                                              |
| 0x250000 - 0x25FFFF |      | Eth/IPv4 network stack for interface #5                                                        |      |                                              |
| 0x260000 - 0x26FFFF |      | Eth/IPv4 network stack for interface #6                                                        |      |                                              |
| 0x270000 - 0x27FFFF |      | Eth/IPv4 network stack for interface #7                                                        |      |                                              |
| 0x400000 - 0x47FFFF | 64   | Address table: decodes handles used by load_calibration and host_writer to DMA addresses       |      |                                              |

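Several registers in the map pack flags into bit fields, store a count as "value minus one", or hold a single-precision float. The sketch below shows how such 32-bit register values could be assembled on the host side. It is only an illustration of the encodings listed in the table: all constant and function names are hypothetical, and how the value is actually written to the card (through the driver or `host_library`) is not shown.

```python
import struct

# Bit positions of the data collection mode register (0x010210), per the register map.
MODE_CONVERSION_TO_PHOTONS = 1 << 0
MODE_OUTPUT_32BIT = 1 << 1
MODE_OUTPUT_UNSIGNED = 1 << 2
MODE_SQRT_COMPRESSION = 1 << 3
MODE_FIXED_G1 = 1 << 7
MODE_THRESHOLD = 1 << 8


def mode_word(flags: int, data_collection_id: int) -> int:
    """Combine mode flags (low half) with the data collection ID (bits 16 to 31)."""
    if not 0 <= data_collection_id <= 0xFFFF:
        raise ValueError("data collection ID must fit in 16 bits")
    return (data_collection_id << 16) | (flags & 0xFFFF)


def float_to_reg(value: float) -> int:
    """IEEE 754 single-precision bit pattern of `value` as a 32-bit integer."""
    return struct.unpack("<I", struct.pack("<f", value))[0]


def minus_one(count: int) -> int:
    """Encode a count for 'value minus one' registers (0 => 1, 1 => 2, etc.)."""
    if count < 1:
        raise ValueError("count must be at least 1")
    return count - 1


word = mode_word(MODE_CONVERSION_TO_PHOTONS | MODE_OUTPUT_32BIT, data_collection_id=5)
print(f"data collection mode: 0x{word:08X}")
print(f"photon energy 12.0 keV: 0x{float_to_reg(12.0):08X}")
print(f"4 modules: {minus_one(4)}")
```
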
### AXI Mailbox

The AXI Mailbox is used to send work requests from the host to the action, and to receive work completions.
Messages are exchanged through the AXI Mailbox IP from Xilinx (see Xilinx PG114).

A work request has the following structure:

| Bit start | Bit end | Meaning                                            |
|-----------|---------|----------------------------------------------------|
| 0         | 15      | Work request ID (handle)                           |

A work completion has the following structure:

| Bit start | Bit end | Meaning                          |
|-----------|---------|----------------------------------|
| 0         | 15      | Work request ID (handle)         |
|           |         | Special values:                  |
|           |         | 65534 - start of data collection |
|           |         | 65535 - end of data collection   |
| 16        | 31      | Data collection ID               |

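A work completion word can be split into its two fields with simple masks and shifts. The following is a minimal sketch that assumes the handle occupies bits 0 to 15 and the data collection ID bits 16 to 31, matching the placement of the ID in the data collection mode register. The type and function names are hypothetical; reading the word from the mailbox itself is outside the scope of the example.

```python
from typing import NamedTuple

# Special handle values from the work completion table.
HANDLE_START = 65534
HANDLE_END = 65535


class WorkCompletion(NamedTuple):
    handle: int
    data_collection_id: int


def decode_work_completion(word: int) -> WorkCompletion:
    """Split a 32-bit work completion word into handle and data collection ID."""
    return WorkCompletion(handle=word & 0xFFFF,
                          data_collection_id=(word >> 16) & 0xFFFF)


def describe(wc: WorkCompletion) -> str:
    """Human-readable meaning of the handle field."""
    if wc.handle == HANDLE_START:
        return "start of data collection"
    if wc.handle == HANDLE_END:
        return "end of data collection"
    return f"completion for handle {wc.handle}"


wc = decode_work_completion(0x0005FFFE)
print(wc, "->", describe(wc))
```
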
### HBM memory

| Interface number | Core             | Meaning                |
|------------------|------------------|------------------------|
| 0-1              | jf_conversion    | Gain factor G0         |
| 2-3              | jf_conversion    | Gain factor G1         |
| 4-5              | jf_conversion    | Gain factor G2         |
| 6-7              | jf_conversion    | Pedestal G0            |
| 8-9              | jf_conversion    | Pedestal G1            |
| 10-11            | jf_conversion    | Pedestal G2            |
| 12-13            | integration      | Integration map        |
| 14-15            | integration      | Integration weights    |
| 16-17            | spot_finder_mask | Spot finder resolution |
| 18-19            | roi_calc         | ROI calculation        |
| 20-21            | frame_generator  | Frame generator        |
| 22-23            | load_from_hbm    | Frame summation        |
| 24-25            | load_from_hbm    | Frame summation        |

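
When debugging HBM traffic, it can be handy to look up which core owns a given interface. The sketch below simply transcribes the table above into a lookup; the names `HBM_INTERFACES` and `hbm_user` are hypothetical and do not exist in the host library.

```python
# (first interface, last interface, core, meaning), transcribed from the HBM table.
HBM_INTERFACES = [
    (0, 1, "jf_conversion", "Gain factor G0"),
    (2, 3, "jf_conversion", "Gain factor G1"),
    (4, 5, "jf_conversion", "Gain factor G2"),
    (6, 7, "jf_conversion", "Pedestal G0"),
    (8, 9, "jf_conversion", "Pedestal G1"),
    (10, 11, "jf_conversion", "Pedestal G2"),
    (12, 13, "integration", "Integration map"),
    (14, 15, "integration", "Integration weights"),
    (16, 17, "spot_finder_mask", "Spot finder resolution"),
    (18, 19, "roi_calc", "ROI calculation"),
    (20, 21, "frame_generator", "Frame generator"),
    (22, 23, "load_from_hbm", "Frame summation"),
    (24, 25, "load_from_hbm", "Frame summation"),
]


def hbm_user(interface: int) -> tuple:
    """Return (core, meaning) for an HBM interface number listed in the table."""
    for first, last, core, meaning in HBM_INTERFACES:
        if first <= interface <= last:
            return core, meaning
    raise ValueError(f"HBM interface {interface} is not listed in the table")


print(hbm_user(13))
```
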