FPGA Smart Network Interface Card
See separate document for installation instructions.
Hardware
The only FPGA card currently supported is the Xilinx Alveo U55C.
See AMD/Xilinx webpage for card user guide (UG1469). According to the user guide:
Alveo data center accelerator cards are designed to be installed into a data center server, where controlled air flow provides direct cooling.
The card needs to be placed in a PCI Express (PCIe) Gen4 x8 slot, though mechanically the slot has to accommodate an x16 card. There is no need to connect an additional power cable, as the card's power draw does not exceed the 75 W available from the PCIe edge connector. The current power estimate is about 30 W when idle and 45 W in operation. The card has built-in protection that cuts power to the card if the HBM temperature rises above 120°C.
Two variants of the card are available:
- 100g - this variant operates one port in 100 Gbit/s mode and should be used when the detector is connected via a switch.
- 8x10g - this variant operates both QSFP ports at 4x10 Gbit/s. QSFP+ (40 Gbit/s) transceivers and MPO/MTP harness cables are necessary. It is designed for a detector connected directly to the Jungfraujoch server, without a switch.

See the network documentation for details of the network.
Content of directories
CPU Part:
- pcie_driver - Linux kernel driver for the PCIe version of the FPGA board - see instructions
- host_library - Library that should be used to access the driver, plus some simple diagnostic tools - see workflow documentation
FPGA part:
- scripts - Scripts for FPGA synthesis
- xdc - Constraints for FPGA
- hdl - FPGA design parts developed in Verilog
- hls - FPGA design parts developed in C++ with high-level synthesis
Dependencies:
- include - External (Xilinx) headers for high-level synthesis code
Building firmware
The Xilinx Vivado version has to precisely match the version described in [the system requirements](../README.md).
Firmware build targets are generated only when vivado and vitis_hls are detected in the path.
Xilinx Vivado
The following procedures require the AMD (Xilinx) Vivado and Vitis HLS toolsets, version 2022.1, to be installed on the machine. Due to the nature of the Tcl scripts used to generate board designs, the Vivado version has to exactly match the one given above; in particular, newer versions of Vivado will not work.
In addition to the Intellectual Property (IP) cores included in Vivado, two additional licenses are necessary:
- A no-cost license for the UltraScale+ 100G core, to build the 100g design. It has to be requested from the AMD/Xilinx website.
- A paid license for the 10G/25G Subsystem for UltraScale+, to build the 8x10g design.

PSI received no-cost licenses from the Xilinx University Program for the latter cores. Therefore, bitstreams generated by the PSI continuous integration pipeline for 8x10g may only be used for non-commercial purposes.
HLS compilation
Build the HLS routines:
mkdir build
cd build
cmake ..
make hls
Synthesis
Create PCIe 100g bitstream with the following command:
mkdir build
cd build
cmake ..
make pcie_100g
and the 8x10g bitstream with:
mkdir build
cd build
cmake ..
make pcie_8x10g
When Vivado is not present
During CMake execution, the vivado and vitis_hls executables must be present in the path.
If they are not, the build targets will not be generated, and an error message like the following will appear:
$ make pcie_100g
make: *** No rule to make target 'pcie_100g'. Stop.
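The same condition can be checked before running CMake. The following is a minimal sketch, not part of the repository: it uses the standard shutil.which function to look for the two executables named above. The helper name missing_tools and the printed messages are hypothetical.

```python
import shutil

# Executables that must be on the search path for firmware targets to exist.
REQUIRED_TOOLS = ("vivado", "vitis_hls")


def missing_tools(tools=REQUIRED_TOOLS, path=None):
    """Return the entries of `tools` that cannot be found on the search path.

    `path` is forwarded to shutil.which; None means the current PATH.
    """
    return [tool for tool in tools if shutil.which(tool, path=path) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found in PATH:", ", ".join(missing))
    else:
        print("vivado and vitis_hls found in PATH")
```

If the list is not empty, extend PATH (for example by sourcing the Vivado settings script) and run cmake again in a clean build directory, so that the firmware targets are generated.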
GitLab CI
If GitLab CI is properly set up, firmware is built automatically for every commit that modifies FPGA source files. The built firmware can be downloaded as MCS files.
FPGA reference
Frame generator
The Jungfraujoch card is equipped with a frame generator, which makes it possible to simulate a JUNGFRAU detector without access to such a system. The frame generator sits in parallel to the Ethernet MAC, so it is placed before the network stack and before any processing happening on the card. In the future, it will be possible to redirect the simulated stream through the 100G TX network link. The frame generator is written in HLS and controlled via AXI-Lite.
Register map
FPGA setup can be done via registers:
| Address | Bits | Meaning | Mode | Notes |
|---|---|---|---|---|
| 0x000000 - 0x00FFFF | | Reserved (in case using MicroBlaze in the future, this has to be reserved for internal memory) | | |
| 0x010000 | 32 | Action Control Register | | |
| | | Bit 0 - Action start | R/W | |
| | | Bit 1 - Action idle | R | |
| | | Bit 2 - Action cancel | R/W | cleared on reset or action start |
| | | Bit 3 - Clear network counters | R/W | cleared on reset |
| | | Bit 12:4 - Debug signals (see hdl/action_config.v) | R | |
| | | Bit 16 - AXI Mailbox interrupt 0 | R | |
| 0x010004 | 32 | Reserved | - | |
| 0x010008 | 32 | Reserved | - | |
| 0x01000C | 32 | GIT SHA1 | R | |
| 0x010010 | 32 | Reserved | R | |
| 0x010014 | 32 | Reserved | R | |
| 0x010018 | 32 | Jungfraujoch FPGA variant | R | |
| 0x01001C | 32 | Reserved | R | |
| 0x010020 | 32 | Max. number supported detector modules | R | constant |
| 0x010024 | 32 | Reserved | R | constant |
| 0x010028 | 64 | Pipeline stalls before writing to host memory | R | reset on action start |
| 0x010030 | 64 | Pipeline stalls before accessing HBM | R | reset on action start |
| 0x010038 | 32 | FIFO status (see action_config.v for details) | R | |
| 0x01003C | 32 | Size of single HBM channel in bytes (default value for the particular card) | R/W | should not be altered for standard operation |
| 0x010040 | 64 | Packets processed by the action | R | cleared on reset or action start |
| 0x010048 | 64 | Valid ethernet packets | R | cleared on reset |
| 0x010050 | 64 | Valid ICMP packets | R | cleared on reset |
| 0x010058 | 64 | Valid UDP packets | R | cleared on reset |
| 0x010060 | 64 | Valid detector packets processed by the card | R | cleared on reset |
| 0x010068 | 64 | Packets flagged as errors by CMAC | R | cleared on reset |
| 0x010070 | 64 | Pipeline stalls before data processing | R | reset on action start |
| 0x010078 | 64 | AXI-beats before accessing HBM | R | reset on action start |
| 0x010080 | 64 | AXI-beats before data processing | R | reset on action start |
| 0x010088 | 64 | AXI-beats before host writer | R | reset on action start |
| 0x010090 | 64 | Last encountered SwissFEL pulse ID | R | cleared on reset |
| 0x010100 | 32 | Spot finder photon count threshold | R/W | |
| 0x010104 | 32 | Spot finder signal-to-noise ratio threshold (single-precision float) | R/W | |
| 0x010200 | 64 | MAC address source for internal frame generator | R/W | network byte order |
| 0x010208 | 32 | IPv4 address source for internal frame generator | R/W | network byte order |
| 0x01020C | 32 | Number of detector modules (value minus one: 0 => 1 module, 1 => 2 modules, etc.) | R/W | |
| 0x010210 | 32 | Data collection mode | R/W | |
| | | Bit 0 - Conversion to photons | | |
| | | Bit 1 - Output extend to 32-bit | | |
| | | Bit 2 - Output is unsigned integer | | |
| | | Bit 3 - Use sq. root lossy compression | | |
| | | Bit 7 - JUNGFRAU fixed G1 mode | | |
| | | Bit 8 - Set to zero values below threshold | | |
| | | Bit 16:31 - Data collection ID (carried with completions) | | |
| 0x010214 | 32 | Photon energy in keV (single-precision float) | R/W | |
| 0x010218 | 32 | Number of frames expected in the data collection (defines termination condition) | R/W | |
| 0x01021C | 32 | Number of storage cells | R/W | |
| 0x010220 | 32 | Summation on card (value minus one: 0 => summation of 1, 1 => summation of 2, etc.) | R/W | |
| 0x010224 | 32 | Coefficient for sq. root compression (need to set bit in data collection mode to apply) | R/W | |
| 0x010228 | 32 | Threshold; values below it are set to zero (need to set bit in data collection mode to apply) | R/W | |
| 0x030000 - 0x03FFFF | | AXI Mailbox for Work Request / Work Completion | | See Xilinx PG114 for register map |
| 0x040000 - 0x04FFFF | | QuadSPI flash | | See Xilinx PG153 for register map |
| 0x050000 - 0x05FFFF | | Interrupt controller | | See Xilinx PG099 for register map |
| 0x060000 - 0x06FFFF | | Load calibration (HLS) | | |
| 0x070000 - 0x07FFFF | | AXI Firewall | | See Xilinx PG293 for register map |
| 0x080000 - 0x08FFFF | | Frame generator (HLS) | | |
| 0x090000 - 0x09FFFF | | PCIe DMA control | | See Xilinx PG195 for register map |
| 0x0A0000 - 0x0AFFFF | | I2C clock generator | | See Xilinx PG195 for register map |
| 0x0C0000 - 0x0FFFFF | | Xilinx Card Management Solution (CMS) Subsystem | | See Xilinx PG348 for register map |
| 0x100000 - 0x10FFFF | | MAC 10G / CMAC 100G | | See Xilinx PG210/PG203 for register map |
| 0x110000 - 0x11FFFF | | MAC 10G | | See Xilinx PG210 for register map |
| 0x120000 - 0x12FFFF | | MAC 10G | | See Xilinx PG210 for register map |
| 0x130000 - 0x13FFFF | | MAC 10G | | See Xilinx PG210 for register map |
| 0x140000 - 0x14FFFF | | MAC 10G | | See Xilinx PG210 for register map |
| 0x150000 - 0x15FFFF | | MAC 10G | | See Xilinx PG210 for register map |
| 0x160000 - 0x16FFFF | | MAC 10G | | See Xilinx PG210 for register map |
| 0x170000 - 0x17FFFF | | MAC 10G | | See Xilinx PG210 for register map |
| 0x200000 - 0x20FFFF | | Eth/IPv4 network stack for interface #0 | | |
| 0x210000 - 0x21FFFF | | Eth/IPv4 network stack for interface #1 | | |
| 0x220000 - 0x22FFFF | | Eth/IPv4 network stack for interface #2 | | |
| 0x230000 - 0x23FFFF | | Eth/IPv4 network stack for interface #3 | | |
| 0x240000 - 0x24FFFF | | Eth/IPv4 network stack for interface #4 | | |
| 0x250000 - 0x25FFFF | | Eth/IPv4 network stack for interface #5 | | |
| 0x260000 - 0x26FFFF | | Eth/IPv4 network stack for interface #6 | | |
| 0x270000 - 0x27FFFF | | Eth/IPv4 network stack for interface #7 | | |
| 0x400000 - 0x47FFFF | 64 | Address table: decodes handles used by load_calibration and host_writer to DMA addresses | | |
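As an illustration of how values for these registers are composed, here is a minimal sketch in Python. It only computes register values and addresses from the table above; it does not talk to the driver. All constant and function names are hypothetical and are not taken from host_library.

```python
import struct

# Bits of the data collection mode register (0x010210), from the table above.
MODE_CONVERSION = 1 << 0      # conversion to photons
MODE_OUTPUT_32BIT = 1 << 1    # output extended to 32-bit
MODE_UNSIGNED = 1 << 2        # output is unsigned integer
MODE_SQRT = 1 << 3            # sq. root lossy compression
MODE_FIXED_G1 = 1 << 7        # JUNGFRAU fixed G1 mode
MODE_THRESHOLD = 1 << 8       # set to zero values below threshold


def data_collection_mode(flags, data_collection_id):
    """Combine mode flags (low half) with the data collection ID (bits 16:31)."""
    if not 0 <= data_collection_id <= 0xFFFF:
        raise ValueError("data collection ID must fit in 16 bits")
    return (data_collection_id << 16) | (flags & 0xFFFF)


def float_register(value):
    """Return the 32-bit IEEE 754 single-precision bit pattern of `value`."""
    return struct.unpack("<I", struct.pack("<f", value))[0]


def minus_one(count):
    """Encode a count for registers that store 'value minus one'."""
    if count < 1:
        raise ValueError("count must be at least 1")
    return count - 1


def decode_action_control(word):
    """Split the Action Control Register (0x010000) into its documented fields."""
    return {
        "start": bool(word & (1 << 0)),
        "idle": bool(word & (1 << 1)),
        "cancel": bool(word & (1 << 2)),
        "clear_network_counters": bool(word & (1 << 3)),
        "debug": (word >> 4) & 0x1FF,          # bits 12:4, nine bits
        "mailbox_interrupt": bool((word >> 16) & 1),
    }


def network_stack_base(interface):
    """Base address of the Eth/IPv4 network stack for interface 0..7."""
    if not 0 <= interface <= 7:
        raise ValueError("interface must be in range 0..7")
    return 0x200000 + interface * 0x10000
```

For example, data_collection_mode(MODE_CONVERSION | MODE_OUTPUT_32BIT, 5) gives the value to write to 0x010210 for photon conversion with 32-bit output and data collection ID 5, and minus_one(4) gives the value to write to 0x01020C for four detector modules.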
AXI Mailbox
The AXI mailbox is used to send work requests from the host to the action and to receive work completions. Messages are exchanged through the AXI Mailbox IP from Xilinx (see Xilinx PG114).
A work request has the following structure:
| Bit start | Bit end | Meaning |
|---|---|---|
| 0 | 15 | Work request ID (handle) |
A work completion has the following structure:
| Bit start | Bit end | Meaning |
|---|---|---|
| 0 | 15 | Work request ID (handle) |
| | | Special values: |
| | | 65534 - start of data collection |
| | | 65535 - end of data collection |
| 16 | 31 | Data collection ID |
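Decoding a work completion can be sketched as follows. This is an illustration only, assuming that a completion arrives as a single 32-bit mailbox word, with the handle in the lower 16 bits and the data collection ID in the upper 16 bits (matching bits 16:31 of the data collection mode register). The class, function and label names are hypothetical.

```python
from typing import NamedTuple

# Special handle values listed in the work completion table.
HANDLE_START = 65534  # start of data collection
HANDLE_END = 65535    # end of data collection


class WorkCompletion(NamedTuple):
    handle: int
    data_collection_id: int

    @property
    def kind(self):
        """Classify the completion by its handle (labels are illustrative)."""
        if self.handle == HANDLE_START:
            return "start"
        if self.handle == HANDLE_END:
            return "end"
        return "work_request"


def decode_work_completion(word):
    """Split a 32-bit completion word into handle and data collection ID."""
    if not 0 <= word <= 0xFFFFFFFF:
        raise ValueError("completion word must fit in 32 bits")
    return WorkCompletion(handle=word & 0xFFFF,
                          data_collection_id=(word >> 16) & 0xFFFF)
```

A host-side loop would typically compare data_collection_id with the ID it wrote to the data collection mode register, so that completions left over from an earlier data collection can be recognized.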
HBM memory
| Interface number | Core | Meaning |
|---|---|---|
| 0-1 | jf_conversion | Gain factor G0 |
| 2-3 | jf_conversion | Gain factor G1 |
| 4-5 | jf_conversion | Gain factor G2 |
| 6-7 | jf_conversion | Pedestal G0 |
| 8-9 | jf_conversion | Pedestal G1 |
| 10-11 | jf_conversion | Pedestal G2 |
| 12-13 | integration | Integration map |
| 14-15 | integration | Integration weights |
| 16-17 | spot_finder_mask | Spot finder resolution |
| 18-19 | roi_calc | ROI calculation |
| 20-21 | frame_generator | Frame generator |
| 22-23 | load_from_hbm | Frame summation |
| 24-25 | load_from_hbm | Frame summation |
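Since every row of the table covers a pair of consecutive interfaces, the assignment can be looked up by integer division of the interface number by two. The following sketch simply transcribes the table; the names HBM_USAGE and hbm_usage are hypothetical.

```python
# (core, meaning) for each pair of HBM interfaces, in table order:
# entry 0 covers interfaces 0-1, entry 1 covers interfaces 2-3, and so on.
HBM_USAGE = (
    ("jf_conversion", "Gain factor G0"),
    ("jf_conversion", "Gain factor G1"),
    ("jf_conversion", "Gain factor G2"),
    ("jf_conversion", "Pedestal G0"),
    ("jf_conversion", "Pedestal G1"),
    ("jf_conversion", "Pedestal G2"),
    ("integration", "Integration map"),
    ("integration", "Integration weights"),
    ("spot_finder_mask", "Spot finder resolution"),
    ("roi_calc", "ROI calculation"),
    ("frame_generator", "Frame generator"),
    ("load_from_hbm", "Frame summation"),
    ("load_from_hbm", "Frame summation"),
)


def hbm_usage(interface):
    """Return (core, meaning) for HBM interface number 0..25."""
    if not 0 <= interface < 2 * len(HBM_USAGE):
        raise ValueError("interface must be in range 0..25")
    return HBM_USAGE[interface // 2]
```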