# Design of the SUS Language Lennart Van Hirtum PC2 - Paderborn University Prof. Christian Plessl PC2 - Paderborn University ## Outline - Motivation & where others fall short - SUS Design Process - Example SUS Code - Typing - Latency Counting - Implementation & Future - Live Demo # Motivation ## What I want out of an HDL - Full control over generated hardware - All HDL code must be Synthesizeable - Keep your simulations in software like CocoTB! - Language must have notion of Cycle-wise timing - First-class Pipelining support - First-class IDE support 1 - (System)Verilog, VHDL? - (System)Verilog, VHDL? - Very verbose - No defence against incorrect hardware ``` 1 'timescale lns / lps module CounterWithLoad ( output reg [3:0] Count, 3 input wire clock, reset, enable, load, 4 input wire [3:0] start number); 5 6 always @ (posedge clock or negedge reset) begin: CountWithLoad 8 if (!reset) 9 Count <= 0; 10 else if (enable) 11 if (load) 12 // set the counter to a specified value 13 14 Count <= start number; else 15 Count <= Count + 1: 16 end //CountWithLoad 17 18 19 endmodule 20 ``` - (System)Verilog, VHDL? - Very verbose - No defence against incorrect hardware - (System)Verilog, VHDL? - Very verbose - No defence against incorrect hardware - Embedded Generation Languages (Chisel in Scala, SystemC in C++)? - (System)Verilog, VHDL? - Very verbose - No defence against incorrect hardware - Embedded Generation Languages (Chisel in Scala, SystemC in C++)? - Suffer under parent language Syntax - Little defence against incorrect hardware - No hardware-specific IDE support ``` SC_MODULE(Adder){ Template parameter public: sc_in<int> a; sc in<int> b; Ports sc_out<int> c:__ SC HAS PROCESS(Adder) Adder(sc_module_name name, int t): sc module(name), offset(t) SC_METHOD(compute) senstitive << a << b; Internal Variables private: Interface function calls int offset; void compute(){ int temp; Process temp = a - read() + b - read() + offset; c->write(temp); Interface function calls ``` ``` class MemFifo[T <: Data](gen: T, depth: Int) extends Fifo(gen: T, depth: Int) {</pre> def counter(depth: Int, incr: Bool): (UInt, UInt) = { val cntReg = RegInit(0.U(log2Ceil(depth).W)) val nextVal = Mux(cntReg === (depth-1).U, 0.U, cntReg + 1.U) when (incr) { cntReg := nextVal (cntReg, nextVal) val mem = SyncReadMem(depth, gen) val incrRead = WireInit(false.B) val incrWrite = WireInit(false.B) val (readPtr, nextRead) = counter(depth, incrRead) val (writePtr, nextWrite) = counter(depth, incrWrite) ``` - (System)Verilog, VHDL? - Very verbose - No defence against incorrect hardware - Embedded Generation Languages (Chisel in Scala, SystemC in C++)? - Suffer under parent language Syntax - Little defence against incorrect hardware - No hardware-specific IDE support - (System)Verilog, VHDL? - Very verbose - No defence against incorrect hardware - Embedded Generation Languages (Chisel in Scala, SystemC in C++)? - Suffer under parent language Syntax - Little defence against incorrect hardware - No hardware-specific IDE support - HLS? HLS: We can do it all! " It improves portability " "Single-source host and kernel" "The compiler will figure it out" - (System)Verilog, VHDL? - Very verbose - No defence against incorrect hardware - Embedded Generation Languages (Chisel in Scala, SystemC in C++)? - Suffer under parent language Syntax - Little defence against incorrect hardware - No hardware-specific IDE support - HLS? #### (System)Verilog, VHDL? - Very verbose - No defence against incorrect hardware ### - Embedded Generation Languages (Chisel in Scala, SystemC in C++)? - Suffer under parent language Syntax - Little defence against incorrect hardware - No hardware-specific IDE support #### HLS? - Lose control over generated hardware (Are you sure it's optimal?) - (Corporate HLS) is not portable at all, and stifles lower-level access for competitors - (System)Verilog, VHDL? - Very verbose - No defence against incorrect hardware - Embedded Generation Languages (Chisel in Scala, SystemC in C++)? - Suffer under parent language Syntax - Little defence against incorrect hardware - No hardware-specific IDE support - HLS? - Lose control over generated hardware (Are you sure it's optimal?) - (Corporate HLS) is not portable at all, and stifles lower-level access for competitors - Other NeoHDLs (Filament & Spade)? #### (System)Verilog, VHDL? - Very verbose - No defence against incorrect hardware ### - Embedded Generation Languages (Chisel in Scala, SystemC in C++)? - Suffer under parent language Syntax - Little defence against incorrect hardware - No hardware-specific IDE support #### - HLS? - Lose control over generated hardware (Are you sure it's optimal?) - (Corporate HLS) is not portable at all, and stifles lower-level access for competitors #### Other NeoHDLs (Filament & Spade)? - Focus on Correctness - Focus on Timing ``` /* A parameteric shift register */ pipeline(3) cpu(...) -> Option<int<32>> { let stall = stage(+1).is_jump; comp Shift[W, N]<'G:1>( let pc = inst program_counter( clk, rst, in: ['G, 'G+1] W stall, stage(execute).jump_target ) -> ( reg; let insn = inst read_progmem(clk, pc); // delay the signal by N cycles let is_jump = is_jump(insn); let insn = if stage(+1).is_jump {NOP} else {insn}; out: ['G+N, 'G+N+1] W reg; let (opa, opb) = inst regfile( clk, insn, stage(writeback).result // Tracks the wires b/w registers bundle f[N+1]: for<k> ['G+k, 'G+k+1] W; let jump_target = match insn { Instruction::Jump(target) => Some(target), f{0} = in: => None; for i in O..N { let alu_result = alu(insn, opa, opb); d := new Delay[W]<'G+i>(f{i}); reg; 'writeback f{i+1} = d.out; let result = match insn { Instruction::Add(_, _, _) => Some(alu_result), Instruction::Sub(_, _, _) => Some(alu_result), Instruction::Set(_, _) => Some(alu_result), out = f{N}; Instructions::Jump(_) => None ``` # Hot Takes ## **Hot Takes** There are no reusable hardware components Software prototyping for hardware doesn't exist (unless it's named Verilator) HLS is not the solution ## So... Abstraction Bad? Of course not. **Abstraction Good!** ## **Good Abstractions** **Bad Abstractions** Represent the programmer's intent Replace error-prone constructs Don't get in the way Trade in design freedom for "abstraction" ⇒ Reduce Designer mental load ⇒ Contort the Designer into the vendor's paradigm # SUS Design Process Control Feedback Pragmatism ## Control If I can draw it, I want to be able to write it ⇒ All synchronous hardware representable ### Feedback #### **LSP** - Errors & Info - Navigation - Code Suggestions #### Instant In-Editor Feedback No separate Compile Step ``` module bad_cycle : int a -> int r { state int state_reg initial state_reg = 0 r = state_reg ``` ### reg state\_reg = state\_reg + a ``` This register is part of a net-positive latency cycle of +1 state_reg'1 -> state_reg'2 (+1) Which conflicts with the starting latency View Problem (Alt+F8) No quick fixes available ``` ## Pragmatism Let hardware be hardware Simple things should be simple Infer where possible ## Pragmatism Let hardware be hardware Simple things should be simple Infer where possible beneficial # Examples ## XOR gate ``` module xor : bool x1, bool x2 -> bool y { bool w1 = !x1 bool w2 = !x2 bool w3 = x1 \& w2 bool w4 = x2 \& w1 y = w3 \mid w4 ``` ## **Generative Code** ``` module matrix vector mul: int[30][20] mat, int[20] vec -> int[30] result { for int row in 0..30 { int[20] row products for int col in 0..20 { row products[col] = mat[row][col] * vec[col] result[row] = +row products ``` #### FizzBuzz ``` module fizz buzz : int v -> int fb { gen int FIZZ = 15 gen int BUZZ = 11 gen int FIZZ BUZZ = 1511 bool fizz = v % 3 == 0 bool buzz = v \% 5 == 0 if fizz & buzz { fb = FIZZ BUZZ } else if fizz { fb = FIZZ } else if buzz { fb = BUZZ } else { fb = v ``` #### Generative FizzBuzz ``` module fizz buzz gen : int v -> int fb { gen int FIZZ = 15 gen int BUZZ = 11 gen int FIZZ BUZZ = 1511 gen int TABLE SIZE = 256 gen int[TABLE SIZE] lut for int i in 0..TABLE SIZE { gen bool fizz = i % 3 == 0 gen bool buzz = i % 5 == 0 gen int tbl fb if fizz & buzz { tbl fb = FIZZ BUZZ } else if fizz { tbl fb = FIZZ } else if buzz { tbl fb = BUZZ } else { tbl fb = i lut[i] = tbl fb } fb = lut[v] ``` ``` module fizz buzz gen : int v -> int fb { gen int FIZZ = 15 gen int TABLE SIZE = 256 gen int BUZZ = 11 gen int FIZZ BUZZ = 1511 gen int TABLE SIZE = 256 gen int[TABLE SIZE] lut gen int[TABLE SIZE] lut for int i in 0..TABLE SIZE { gen bool fizz = i % 3 == 0 gen bool buzz = i \% 5 == 0 Dependent Types! gen int tbl fb if fizz & buzz { tbl fb = FIZZ BUZZ } else if fizz { tbl fb = FIZZ } else if buzz { tbl fb = BUZZ } else { tbl fb = i ChatGPT lut[i] = tbl fb "Dependent types: powerful but often impractical." fb = lut[v] ``` # **Typing** ## Dependent Types are a nightmare to work with #### The Undecidability of Typability in the Lambda-Pi-Calculus Gilles Dowek School of Computer Science, Carnegie Mellon University Pittsburgh, PA 15213-3890, U.S.A. Abstract. The set of pure terms which are typable in the $\lambda H$ -calculus in a given context is not recursive. So there is no general type inference algorithm for the programming language Elf and, in some cases, some information has to be mentioned by the programmer. #### Undecidability of $D_{\leq}$ and Its Decidable Fragments lex he n-il-1. t JASON HU, University of Waterloo, Canada ONDŘEJ LHOTÁK, University of Waterloo, Canada Dependent Object Types (DOT) is a calculus with path dependent types, intersection types, and object self-Dependent Object Types (DOT) is a calculus with path dependent types, intersection types, and object selfmanagement of the calculus has been proven sound, it references, which serves as the core calculus of Scala 3. Although the calculus has been proven sound, it to and enhyming of D.— a syntactic enheat of DOT. It turns out that even for D.— indecidability proofs of remains open whether type checking in DOT is decidable. In this paper, we establish undecidability proofs of type checking and subtyping of $D_{\le i}$ , a syntactic subset of DOT. It turns out that even for $D_{\le i}$ undecidability proofs of the counterevamples for past attempts. To prove undecidability type checking and subtyping of $D_{\le i}$ a syntactic subset of DOT. It turns out that even for $D_{\le i}$ undecidability is surprisingly difficult to show, as evidenced by counterexamples for past attempts. To prove undecidability, and the substitution of the $D_{\le i}$ undecidability is a specific to the substitution of is surprisingly difficult to show, as evidenced by counterexamples for past attempts. To prove undecidability, we discover an equivalent definition of the $D_{<}$ : subtyping rules in normal form. Besides being easier to reason at the standard of stan we discover an equivalent definition of the D<: subtyping rules in normal form. Besides being easier to reason about, this definition makes the phenomenon of bad bounds explicit as a single inference rule. After removing and identify algorithms to decide them. We about, this definition makes the phenomenon of bad bounds explicit as a single interence rule. After removing this rule, we discover two decidable fragments of $D_{<}$ : subtyping and identify algorithms to decide them. We also the fragments and we prove that the this rule, we discover two decidable fragments of Describing and identify algorithms to decide them. We also the algorithms with respect to the fragments, and we prove that the prove soundness and completeness of the algorithms with respect to the fragments, as algorithms terminate. Our proofs are mechanized in a combination of Coq and Agda. CCS Concepts: • Software and its engineering → General programming. fessional topics → History of programming languages. Additional Key Words and Phrocos. D ACM Reference E- #### What is the SUS-lution? - Want simple type checking → use dynamic typing? - But do we really need to statically prove *all* parametrizations of a module are "correct"? - Or can we simply check each instance? - Want to use type info to hint user at template level → traits & types? - Does this info need to care about array sizes? ⇒ Two typing levels! #### gen int[TABLE\_SIZE] lut #### **Abstract Type** int[] Typecheck at Flattening time Only type names and structure LSP Info & Template checking #### Concrete Type int[256] Typecheck at Instantiation time Type names and concrete values Actually define wires # Latency Counting #### Registers Latency #### State registers ``` module fibonnaci : -> int num { state int cur = 1 state int prev = 0 cur'0 prev'0 num = cur + prev prev = cur num'0 cur = num ``` #### **Latency Registers** i'0 \* ``` module pow17 : int i -> int o { int i2 = i reg int i4 = i2 * i2 int i8 = i4 * i4 reg int i16 = i8 * i8 = i16 * i * * ``` #### **Port Latencies** ``` module example md : factors[4]'0 →product'2 int[4] factors, int add to -> int product, int total { add_to'2 reg int mul0 = factors[0] * factors[1] req int mul1 = factors[2] * factors[3] reg product = mul0 * mul1 reg total = product + add to ``` #### **Latency Annotations** ``` module module_taking_time : int i'0 -> int o'5 { o = i } ``` #### Multiple Interfaces #### Latency offsets Architecture & Future # Short term milestones "Build anything in SUS" - Arbitrary Single-Clock hardware description - Migrate from custom parser to Tree-Sitter - Multi-interface modules - Integer Bounds - Templates & Generative Parameters - Standard Library - Extern (Verilog) Modules - Implement FIFO, Memory Block, Skid Buffer, Distributor, Merger, etc. ### Vague long term milestones "SUS stands for Safety" - CIRCT Compile Target - Enables use of btor2 LTL for verification - Advanced Bounds - Use Bounds system as an alternative to Sum Types - Protect FIFOs, skid buffers, etc in the STL using LTL verification - Arbitrary Multi-Clock hardware description - STL Clock Domain Crossing modules - Multi-Clock STL modules - Inference of generative parameters - Custom Types ## LSP Demo github.com/pc2/sus-compiler - Are "traits" reasonable for hardware? - C-style types vs Rust-style types? - Type inference? - Additional Constructs: first, only, chain? - Backend? - Abstraction for valid/reset protocols? - Built-in Verification? - Bounded integers vs bitwidths? - Integer representations? (2s cpl, vs 1s cpl vs one-hot vs registered) - Built-in Floats? - Sum Types? - Latency offsets within structs? - What becomes of SUS after I graduate?