# How much score could a CoreScore score if a CoreScore could score cores?

**Spring 2022 RISC-V week - Paris 2022.05.04** 

#### Who are we?

Klas Nordmark

FPGA & Embedded SW Consultant at 2550 Engineering

Olof Kindgren

FPGA & Embedded SW Consultant at Qamcom Research & Technology

### Features



- RV32I
- Wishbone data bus
- Wishbone instruction bus
- CFU extension interface
- Formally verified with riscv-formal
- BSD licensed
- Optional:
  - CSR extension
  - A bit of the privilege spec (enough to run Zephyr)
  - M extension
  - C extension



Abdul Wadood: C extension, LFX 2022

Zeeshan Rafique: M extension, GSoC 2021



# What's a bit-serial CPU?

$$0xA0 \mid 0x33 = 0xB3$$

$$0xA2 + 0x33 = 0xD5$$
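The two operations above can be sketched in software to show what "one bit per cycle" means. This is a minimal illustrative model, not SERV's RTL: the function names, the 8-bit width, and the LSB-first ordering are assumptions made for the example. The OR needs no state between cycles, while the addition keeps a single carry bit from one cycle to the next.

```python
def bits_lsb_first(value, width):
    """Yield the bits of `value` one per 'cycle', least significant bit first."""
    for i in range(width):
        yield (value >> i) & 1


def serial_or(a, b, width=8):
    """Bitwise OR computed one bit per cycle; nothing is carried between cycles."""
    result = 0
    pairs = zip(bits_lsb_first(a, width), bits_lsb_first(b, width))
    for i, (bit_a, bit_b) in enumerate(pairs):
        result |= (bit_a | bit_b) << i
    return result


def serial_add(a, b, width=8):
    """Addition computed one bit per cycle with a 1-bit full adder."""
    result = 0
    carry = 0  # the only state kept between cycles
    pairs = zip(bits_lsb_first(a, width), bits_lsb_first(b, width))
    for i, (bit_a, bit_b) in enumerate(pairs):
        total = bit_a + bit_b + carry
        result |= (total & 1) << i   # sum bit for this cycle
        carry = total >> 1           # carry into the next cycle
    return result


print(hex(serial_or(0xA0, 0x33)))   # 0xb3
print(hex(serial_add(0xA2, 0x33)))  # 0xd5
```

The trade-off is that a `width`-bit operation takes `width` cycles, but the datapath only needs logic for a single bit.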





#### Size

|                  | Minimal LUT | Minimal FF | Standard LUT | Standard FF |
|------------------|-------------|------------|--------------|-------------|
| Lattice iCE40    | 198         | 164        | 261          | 182         |
| AMD Artix-7      | 125         | 164        | 170          | 182         |
| Intel Cyclone 10 | 239         | 164        | 297          | 182         |



### Documentation


serv\_alu handles ALU operations. The first input operand (A) comes from i\_rs1 and the second operand (B) comes from i\_rs2 or i\_imm depending on the type of operation. The data passes through the add/sub or bool logic unit and finally ends up in o\_rd to be written to the destination register. The output o\_cmp is used for conditional branches to decide whether or not to take the branch.

The add/sub unit can do additions A+B or subtractions A-B by converting the latter to A+~B+1, where ~B is the bitwise inverse of B. Subtraction mode (i\_sub = 1) is also used for the comparisons in the slt\* and conditional branch instructions. The +1 used in subtraction mode is done by preloading the carry input with 1. Less-than comparisons are handled by converting the expression A<B to A-B<0 and checking the MSB, which will be set when the result is less than 0. This however requires extending the operands to 33-bit inputs. For signed operands (when i\_cmp\_sig is set), the extra bit is the same as the MSB. For unsigned, the extra bit is always 0. Because the ALU is only active for 32 cycles, the 33rd bit must be calculated in parallel to the ordinary addition. The result from this operation is available in result\_lt. For equality checks, result\_eq checks that all bits from the subtraction are 0.
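The add/sub behaviour described above can be modelled in a few lines. This is a behavioural sketch under stated assumptions, not the serv\_alu RTL: the function `serial_alu` and its argument names are hypothetical, chosen to mirror i\_sub and i\_cmp\_sig, and operands are passed as unsigned 32-bit integers. The less-than output is only meaningful in subtraction mode.

```python
def serial_alu(a, b, sub, cmp_sig):
    """Bit-serial add/sub over 32 cycles, returning (result, result_lt, result_eq).

    a, b    : 32-bit operands given as unsigned integers
    sub     : 1 selects A-B (computed as A + ~B + 1), 0 selects A+B
    cmp_sig : 1 treats the operands as signed for the less-than result
    """
    carry = sub          # preloading the carry with 1 supplies the "+1"
    result = 0
    for i in range(32):  # one bit per cycle, LSB first
        bit_a = (a >> i) & 1
        bit_b = ((b >> i) & 1) ^ sub   # invert B in subtraction mode
        total = bit_a + bit_b + carry
        result |= (total & 1) << i
        carry = total >> 1

    # 33rd bit: a copy of the MSB for signed operands, 0 for unsigned ones.
    ext_a = (a >> 31) & cmp_sig
    ext_b = ((b >> 31) & cmp_sig) ^ sub
    result_lt = (ext_a + ext_b + carry) & 1   # MSB of the 33-bit A-B
    result_eq = int(result == 0)              # all 32 result bits are 0
    return result, result_lt, result_eq


print(serial_alu(7, 5, sub=1, cmp_sig=0))           # (2, 0, 0)
print(serial_alu(0xFFFFFFFF, 1, sub=1, cmp_sig=1))  # -1 < 1 signed: result_lt is 1
print(serial_alu(0xFFFFFFFF, 1, sub=1, cmp_sig=0))  # unsigned: result_lt is 0
```

Computing the 33rd bit after the loop stands in for the parallel calculation mentioned in the text; the model only needs the carry out of bit 31 to produce it.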




### Servant

- Minimal reference platform
- Available for 30+ FPGA boards
- Runs Zephyr OS



### Serving

- SoClet
- RF in RAM

#### Use cases

- Replace complex FSM
  (e.g. DDR init, logging, status...)
- Sensor data aggregation
- Replace 8-bit MCUs
- Replace threaded apps
- Benchmarking



CoreScore: an award-giving benchmark for FPGAs and their synthesis and P&R tools



https://github.com/olofk/corescore



| Board              | CoreScore |
|--------------------|-----------|
| vcu128             | 6000      |
| intel_s10gx_devkit | 5600      |
| vcu118             | 5087      |
| haps_dx7           | 3040      |
| intel_a10gx_devkit | 2600      |
| de5_net            | 1568      |
| storeypeak         | 1152      |
| hpc_k7             | 1024      |
| hpc_ku             | 1024      |
| genesys2           | 967       |
| kc705              | 960       |
| zcu106             | 940       |
| polarfireeval      | 882       |

OpenMPW is a Google-financed initiative by Efabless offering free prototype runs of open source ASICs in the SkyWater 130 nm process.

Submissions are built using the FOSS OpenLane flow and must be wrapped in the Caravel harness, which includes a PicoRV32 with a Wishbone interface that the user design can connect to.



Subservient is a variant of Serving exposing an SRAM interface and a Wishbone debug interface that can read or write the SRAM when Subservient is kept in debug mode with a separate pin.

We have submitted two slightly different variants, in order to meet timing with later versions of OpenLane after the presumed failure of MPW-2.



The user project area is 2920 by 3520 micrometers.

Subservient synthesized together with 512 bytes of DFFRAM takes about 1000 by 1000 micrometers, at a 25 MHz clock.

This implies a CoreScore of six or so. Not very good...



Removing the memory both significantly decreases the gate count and eases the place-and-route problem, allowing higher density.

Subservient without memory is about 160 by 160 micrometers here.

If memory were out of the picture completely, we'd have a CoreScore of nearly 400!
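The area figures quoted in the talk can be turned into rough core counts with a simple tiling estimate. The talk does not say how its numbers were derived, so the grid-tiling assumption and the helper `cores_in_grid` are hypothetical; the sketch also ignores routing, power distribution, and spacing between cores.

```python
def cores_in_grid(area_w_um, area_h_um, core_w_um, core_h_um):
    """Count how many core-sized tiles fit in the area on a simple grid."""
    return int(area_w_um // core_w_um) * int(area_h_um // core_h_um)


USER_AREA_UM = (2920, 3520)  # user project area quoted in the talk

# Subservient with 512 bytes of DFFRAM: about 1000 x 1000 micrometers
with_memory = cores_in_grid(*USER_AREA_UM, 1000, 1000)

# Subservient without memory: about 160 x 160 micrometers
without_memory = cores_in_grid(*USER_AREA_UM, 160, 160)

print(with_memory)     # 6
print(without_memory)  # 396
```

Under these assumptions the tiling gives 2 by 3 cores with memory and 18 by 22 cores without, which lines up with "six or so" and "nearly 400".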



Using OpenRAM has been problematic for us.

The ready-made 1 kbyte, 1-byte-wide macro for sky130A is about 460 micrometers wide.

With one such macro for each smaller Subservient, we'd be looking at about 20 cores in the user project area...

Smaller memories? Shared memories?

We have not yet received the first batch of chips and can't yet confirm whether SERV has been successfully ASIC-proven.

#### Future work

DSRV, QERV

More extensions

FPGA with hard SERV for each SRAM

Recreate SERV in Logisim, Minecraft, 7400 chips, tubes...

#### Thank you for your time

Subservient https://github.com/klasnordmark/subservient\_wrapped







Klas Nordmark linkedin.com/in/klas-nordmark-33a97244 @knordmark89 https://2550.engineering

