noname
This is the book on noname, a high-level language for writing circuits for the kimchi zero-knowledge proof system.
Basics
Noname is a language that closely resembles Rust.
For example, in the following program you can see a main function:
fn main(pub public_input: Field, private_input: Field) {
    let x = private_input + public_input;
    assert_eq(x, 2);
}
The only differences with Rust are:
- The pub keyword is used to mark public inputs. By default all arguments are private.
- assert_eq is not a macro; there are no macros in noname.
- A Field type is used as the main type everywhere. It is defined in field.rs to be the pasta Fp field (the base field of the Pallas curve). If these words mean nothing to you, just see Field as a large number. Ideally programs should be written without this type, but for now custom types do not exist.
To run such a file, and assuming you have Rust installed, you can type in the terminal:
$ cargo run -- --path path/to/file.no --private-inputs '{"private_input": ["1"]}' --public-inputs '{"public_input": ["1"]}'
As you can see, inputs are passed in JSON format, and values are expected to be encoded as decimal strings.
Builtins and use statements
Some builtin functions are available by default:
- assert_eq to check that two field elements are equal
- assert to check that a condition is true
Like in Rust, you can also import other libraries via the use keyword.
If you do this, you must know that you can only import a library, not its functions (or types, or constants) directly.
For example, to use the poseidon function from the crypto library (or module), you must import std::crypto and then qualify your use of crypto::poseidon:
use std::crypto;

fn main(pub public_input: Field, private_input: [Field; 2]) {
    let digest = crypto::poseidon(private_input);
    assert_eq(digest[0], public_input);
}
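To run this example, array inputs are passed as JSON arrays of decimal strings. The invocation below is a sketch: the actual value of public_input must be the first field element of the poseidon digest of the private inputs (a large decimal number, left as a placeholder here):

$ cargo run -- --path path/to/file.no --private-inputs '{"private_input": ["1", "2"]}' --public-inputs '{"public_input": ["<digest[0] as a decimal string>"]}'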
Note that currently, only built-in libraries (written in Rust) are working. In the future we’d like for other libraries to be written in the noname language.
Field
The Field type is the primitive type upon which all other types are built.
It is good to know about it, as it is used in many places and is error prone: it does not match the size of commonly-found types like u32 and u64, and it can overflow or underflow without emitting an error.
Ideally, you should never use the Field type, but currently the library is quite limited and the ideal world is far away.
Note that you can define Field elements directly in the code by writing a decimal number. For example:
let x = 2;
let y = x + x;
assert_eq(y, 4);
Arrays
While there are no dynamic arrays (or vectors), you can use fixed-size arrays like in Rust.
For the moment, I believe that arrays can only be declared as function arguments, as the following kind of declaration hasn't been implemented yet:
let x = [1, 2, y];
Boolean
Booleans are similar to Rust's booleans. They are currently the only built-in type besides Field and arrays.
let x = true;
let y = false;
assert(!(x & y));
Mutability
Variables are not mutable by default. To make a variable mutable, you must use the mut keyword:
let mut x = 1;
x = 2; // GOOD

let y = 1;
y = x + y; // BAD
For loops
fn main(pub public_input: Field, private_input: [Field; 3]) {
    let mut sum = 0;

    for i in 0..3 {
        sum = sum + private_input[i];
    }

    assert_eq(sum, public_input);
}
Constants
Like variables and function names, constants must be lowercase.
At the moment they can only represent field elements. Perhaps it would be nice to be able to represent different types in the future.
const player_one = 1;
const player_two = 2;

fn main(pub player: Field) -> Field {
    assert_eq(player_one, player);

    let next_player = player + 1;
    assert_eq(player_two, next_player);

    return next_player;
}
If Else statements
Currently, if/else statements are not supported. Only the ternary operator is:
fn main(pub xx: Field) {
    let plus = xx + 1;
    let cond = xx == 1;
    let yy = cond ? plus : xx;
    assert_eq(yy, 2);
}
The two branches of the ternary operator must be variables (as in the example), array accesses (e.g. thing[0]), or field accesses (e.g. thing.field).
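For example, here is a sketch of a ternary whose branches are array accesses (the array is passed as an argument, since array declarations containing variables are not implemented yet):

fn main(pub xx: Field, arr: [Field; 2]) {
    let cond = xx == 1;
    let yy = cond ? arr[0] : arr[1];
    assert_eq(yy, xx);
}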
Functions
fn add(x: Field, y: Field) -> Field {
    return x + y;
}

fn double(x: Field) -> Field {
    return x + x;
}

fn main(pub one: Field) {
    let four = add(one, 3);
    assert_eq(four, 4);

    let eight = double(4);
    assert_eq(eight, double(four));
}
Custom types
struct Thing {
    x: Field,
    y: Field,
}

fn main(pub x: Field, pub y: Field) {
    let thing = Thing { x: 1, y: 2 };
    assert_eq(thing.x, x);
    assert_eq(thing.y, y);
}
Methods on custom types
struct Thing {
    x: Field,
    y: Field,
}

fn Thing.new(x: Field, y: Field) -> Thing {
    return Thing { x: x, y: y };
}

fn Thing.verify(self, v: Field) {
    assert_eq(self.x, v);
    assert_eq(self.y, v + 1);
}

fn Thing.update_and_verify(self) {
    let new_thing = Thing { x: self.x + 1, y: self.y + 1 };
    new_thing.verify(2);
}

fn main(pub x: Field) {
    let thing = Thing.new(x, x + x);
    thing.update_and_verify();
}
Note that, technically, you can even call static methods from a variable of that type:
let y = thing.new(3, 4);
It's not necessarily pleasant to read, and we could prevent it by storing some meta information (static_method: bool) in the type checker, but it's not a big deal.
Early returns
TODO
Hints
TODO
Shadowing
We forbid variable shadowing as much as we can.
For example, this should not work:
let x = 2;
let x = 3; // this won't compile

let y = 4;
for i in 0..4 {
    let y = i; // this won't compile either
}
Compilation
The compilation of noname programs goes through the following flow:
- Lexer. A lexer (lexer.rs) is used to parse the source code into a list of tokens. This is pretty primitive, but will detect some minor syntax issues.
- Parser. A parser (parser.rs) is used to parse meaning from the code. It will convert the tokens output by the lexer into an abstract syntax tree (AST), using strong types like Statement and Expression (TODO: link to rust doc). It will also error if some code does not make sense according to the grammar (see the Grammar chapter).
- Type checking. A type checker (type_checker.rs) takes the AST produced by the parser and does import resolution and type checking:
  - Built-in functions. Functions like assert_eq are injected into the environment.
  - Custom imports. Modules imported via the use keyword are resolved and added to the environment. For now, these can only be built-in functions; noname functions and libraries are not supported (of course it is essential to support them in the future).
  - Type checking. The type checker verifies that the types of each variable and expression in the AST make sense. It is a very simple type checker that can do some simple type inference. Temporary type information (the type of an expression) is not stored, and is thrown away as soon as the type checker can afford it. A TAST (typed AST) is returned, but it mostly contains resolved imports; most type information has been thrown away.
- Gate construction. The TAST produced by the type checker is passed to the circuit writer (circuit_writer.rs), also called the constraint writer, which goes through it one more time and converts it into:
  - a compiled circuit: a series of gates and wires
  - prover instructions: instructions on how to run the function for the witness generation (used by the prover)
A simple ASM language is also used, and the circuit can be encoded in this language. See the ASM chapter.
Terminology
A note on topology:
- functions: noname functions each contain their own scope and can be interacted with through their interface (arguments and return value)
- module/program: a noname module is a single file (this is a nice current limitation of noname) containing functions, constants, and structures
- library: a noname library is a module/program without a main() function, as well as its dependencies (other libraries)
- executable: a noname executable is like a library, except that its module/program has a main() function
Grammar
The syntax of the noname language is described through its grammar.
We use a notation similar to the Backus-Naur Form (BNF) to describe the grammar:
land := city "|"
 ^       ^    ^
 |       |    terminal: a token
 |       |
 |       another non-terminal
 |
 non-terminal: definition of a piece of code

city := [ sign ] "," { house }
        ^            ^
        optional     0 or more houses

sign := /a-zA-Z_/
        ^
        regex-style definition
There are some comments in the parser code (parser.rs) that attempt to define this grammar.
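As an illustration, here is how a function declaration could be described in this notation (a made-up rule for exposition, not one copied from parser.rs):

fn_decl := "fn" ident "(" [ fn_arg { "," fn_arg } ] ")" [ "->" type ] "{" { statement } "}"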
Essentially, the grammar is made to look like Rust, but with some differences in philosophy:
- expressions cannot be statements, unless they return no value (i.e. they act only through side effects)
Spans
To be able to efficiently track errors, we have a span type:
pub struct Span(pub usize, pub usize);
which represents a location in the original source code:
- the first number is the offset in the source code file
- the second number is the length of the span (e.g. 1 character)
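For example (an illustrative case, not taken from the compiler's tests): in the source let x = 2;, the token x starts at byte offset 4 and is one character long, so its span would be:

let span = Span(4, 1); // offset 4, length 1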
We start tracking spans in the lexer, and then pass them around to the parser, and then to the compiler. Even gates and wirings have spans associated with them so that we can easily debug those.
Filename
The filename is currently missing from the Span. It is annoying to add it as a String, because then we can't easily copy the span around (String is not Copy, only Clone).
One way to solve this is to store the filenames in a Hashmap<usize, String>, and have the usize be in the Span.
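A minimal sketch of that idea (the file id field and Filenames map are hypothetical names, not current compiler types):

use std::collections::HashMap;

/// The span now carries a cheap, copyable file id instead of a String.
#[derive(Debug, Clone, Copy)]
pub struct Span(pub usize, pub usize, pub usize); // (file_id, offset, len)

/// Maps a file id back to its filename.
pub type Filenames = HashMap<usize, String>;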
Paths
Paths are structures identifying snippets of code that look like this:
some_module::some_ident.stuff.z
The identifier some_module, appearing before the ::, is an optional module, pointing to code that exists in another library. It is always lowercase.
The identifier some_ident is mandatory. It can represent a type (if it starts with a capital letter), a function name, a variable name, a constant name, etc.
More identifiers can be concatenated together to form a longer path (using .).
A path is represented like this internally:
/// A path represents a path to a type, a function, a method, or a constant.
/// It follows the syntax: `module::X` where `X` can be a type, a function, a method, or a constant.
/// In the case it is a method `a` on some type `A` then it would read:
/// `module::A.a`.
#[derive(Debug, Clone)]
pub struct Path {
    /// A module, if this is a foreign import.
    pub module: Option<Ident>,

    /// The name of the type, function, method, or constant.
    /// It's a vector because it can also be a struct access.
    pub name: Vec<Ident>,

    /// Its span.
    pub span: Span,
}
Expressions using Path
A path does not represent an expression by itself. The following expressions make use of Path:
pub enum ExprKind {
    /// `module::some_fn()`
    FnCall {
        name: Path,
        args: Vec<Expr>,
    },

    /// `module::SomeType.some_method()`
    MethodCall {
        self_name: Path,
        method_name: Ident,
        args: Vec<Expr>,
    },

    /// `module::some_var` or
    /// `module::SomeType.some_field.some_other_field`
    Variable(Path),

    /// `module::SomeType.some_field[some_expr]` or
    /// `module::some_const[some_expr]`
    ArrayAccess {
        name: Path,
        idx: Box<Expr>,
    },

    /// `module::SomeType { field1: expr1; field2: expr2 }`
    CustomTypeDeclaration(Path, Vec<(Ident, Expr)>),
}
Type Checker
Noname uses a simple type checker to ensure that all types are consistent in the program.
For example, in code like:
let x = y + z;
the type checker will ensure that y and z are both field elements (because the operation + is used).
And in code like:
assert_eq(a, b);
the type checker will ensure that a and b are of the same type, since they are being compared.
Type inference
The type checker can do some simple type inference. For example, in the following code:
let x = y + z;
the type of x is inferred to be the same as the types of y and z.
Inference is willingly kept naive, as more type inference would lead to less readable code.
Scope
The type checker must be aware of scopes, as it keeps track of the type of variables and functions that are local to each scope.
For example:
let x = 2;

for i in 0..2 {
    let y = x + 1; // x exists inside of this for loop
}

let z = 1 + y; // BAD: y doesn't exist outside of the for loop
To do this, each function is passed an Environment which contains a list of all variables along with their type information.
pub struct Environment {
    /// created by the type checker, gives a type to every external variable
    pub var_types: HashMap<String, TypeInfo>,

    // ...

    /// the functions present in the scope
    /// contains at least the set of builtin functions (like assert_eq)
    pub functions: HashMap<String, FuncInScope>,

    /// stores the imported modules
    pub modules: HashMap<String, ImportedModule>,
}
To disallow shadowing, even across scopes, only one entry per variable name is stored; the scope in which the variable was created is stored in its TypeInfo and matched against the current scope, to check whether the current scope is the same one or a direct child.
An environment is unique to a function, as it is important that different functions can use the same variable names.
Some notes:
- Currently the notion of module is quite shaky. It is used mostly for crypto::poseidon at the moment.
- functions is mostly used for builtins like assert_eq.
- modules is mostly used for the functions in std::crypto, which only contains crypto::poseidon atm.
More notes:
- I think Environment mixes things:
  - we should be able to create a new Environment whenever we parse a new function, so the functions/modules should be part of another structure (AvailableToAllScopes?)
  - variables have nothing to do with the "Type" Environment and should be moved elsewhere, no? GateCreationEnv?
  - there needs to be a namespace in the TypeInfo
noname ASM
The circuit that noname compiles to can be serialized as a simple ASM language. For example the following noname program:
fn main(pub public_input: Field, private_input: Field) {
    let x = private_input + public_input;
    assert_eq(x, 2);
}
will be compiled to the following noname asm:
@ noname.0.5.0
DoubleGeneric<1>
DoubleGeneric<1,1,-1>
DoubleGeneric<1,0,0,0,-2>
DoubleGeneric<1,-1>
(0,0) -> (1,1)
(1,2) -> (3,1)
(2,0) -> (3,0)
which includes:
- the version of noname used to compile this circuit. This is important as the prover needs to know what version of noname to use to prove executions of this circuit.
- a list of gates and how they are tweaked (the values in the brackets).
- a list of wires which is canonically ordered so that every compilation gives the same resulting noname asm.
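To give some intuition (a hedged reading based on kimchi's generic gate, not a formal spec of the ASM): each DoubleGeneric<c0,c1,c2,c3,c4> row roughly enforces c0*w0 + c1*w1 + c2*w2 + c3*w0*w1 + c4 = 0 on the cells w0, w1, w2 of that row, with missing coefficients treated as zero. Under that reading, the gates above decode as:

DoubleGeneric<1>           // public input row
DoubleGeneric<1,1,-1>      // w0 + w1 - w2 = 0, i.e. x = private_input + public_input
DoubleGeneric<1,0,0,0,-2>  // w0 - 2 = 0, i.e. assert_eq(x, 2)
DoubleGeneric<1,-1>        // w0 - w1 = 0, an equality constraint

The wires then identify cells that must carry the same value; for example, (0,0) -> (1,1) wires the public input cell to the second input of the addition row.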
Structs
Users can define custom structs like so:
struct Thing {
    x: Field,
    y: Field,
}
and can declare and access such structs like so:
let thing = Thing { x: 1, y: 2 };
let z = thing.x + thing.y;
Internally, a struct is represented within the Var type.
Methods
A method on what?
There’s one problem when handling methods in the circuit-writer: how do you know where the code of that method is? For example:
let thing = Thing { x: 1, y: 2 };
let z = thing.valid(3);
at this point the circuit-writer knows that Thing has a method called valid, but will still wonder what the type of thing is.
Due to this, the circuit-writer needs to store the type of local variables in scope. And this is why FnEnv also keeps track of the type of local variables:
/// Is used to store functions' scoped variables.
/// This includes inputs and outputs of the function, as well as local variables.
/// You can think of it as a function call stack.
pub struct FnEnv {
    /// The current nesting level.
    /// Starting at 0 (top level), and increasing as we go into a block.
    current_scope: usize,

    /// Used by the private and public inputs,
    /// and any other external variables created in the circuit.
    /// This needs to be garbage collected when we exit a scope.
    /// Note: The `usize` is the scope in which the variable was created.
    vars: HashMap<String, (usize, VarInfo)>,
}

/// Information about a variable.
#[derive(Debug, Clone)]
pub struct VarInfo {
    /// The variable.
    pub var: Var,

    /// We keep track of the type of variables, even though we're not in the type checker anymore,
    /// because we need to know the type for method calls.
    pub typ: TyKind,
}
This still doesn’t fix our problem. In the line:
let thing = Thing { x: 1, y: 2 };
the local variable thing is stored, but the right-hand side is computed via the compute_expr() function, which goes through the AST and potentially creates different anonymous variables until it can compute a value.
There are three ways to solve this:
- Either the type checker stores type information about each expression it parses. This is what the Rust compiler does, I believe: each Expr AST node has a unique node identifier that can be used to look up type information in a map.
- Or, more simply, the circuit-writer's compute_expr() function, which returns an Option<Var>, could be modified to return an Option<VarInfo>. This is a bit annoying, as we're recomputing things we've already done in the type checker.
- A variant of the previous option is to change Var so that it also contains a type (might as well).
So we implement option 1: the type checker now stores the type information of each Expr node in the AST in a hashmap that is later passed to the circuit-writer:
/// The environment we use to type check a noname program.
pub struct TypedGlobalEnv {
    // ...

    /// Mapping from node id to TyKind.
    /// This can be used by the circuit-writer when it needs type information.
    node_types: HashMap<usize, TyKind>,
}
Literals and the const keyword
We want to be able to support a few things:
- writing numbers that will act as field elements directly in the code
- writing numbers that will act as relatively small numbers (for indexing into arrays for example) in the code
- writing functions that will accept constant arguments. For example to index into an array.
The first two points allow us to write code like this:
let x = 3;
assert_eq(y[5], 4);
The third point allows us to write code like this:
fn House.get(self, room: Field) -> Field {
    return self.rooms[room];
}
Since the room argument is a Field here, the method can in theory be called with any value during type checking.
This is bad. There are two solutions:
- During type checking, when we realize that room is used to index into an array, we enforce that it must be a constant value during compilation.
- We create a distinct type for constants and literals.

Approach 1 is not elegant, because it means that it is not clear from the signature of the function alone that the room argument must be a constant. The user of the function will only get warned when trying to compile the program.
Approach 2 is interesting, because we already have such a type internally to track literals: a BigInt. The name is a bit misleading in the case of array accesses, because there we additionally enforce that it is NOT a big integer, but rather a 32-bit integer (u32 in Rust).
Implementing a literal type
Approach 2 can be implemented in two ways:
a. Use a brand new type, like BigInt, for literals.
b. Use a const attribute to indicate that it is a constant.
Approach a. is a bit clumsy in my opinion, because the developer needs to remember a new type name and understand its distinction from Field. On the other hand, approach b. uses the const keyword, which is already well-known in many compiled programming languages.
What about this:
fn House.get(self, const room: Field) -> Field {
// versus
fn House.get(self, room: const Field) -> Field {
To contrast, the two other existing attributes (pub and mut) are placed in front of the variable names, not the type names. One could argue that the type is the same, but the variable being passed is a constant, and so it makes more sense to implement the first version. This is what we do in noname.
At the time of this writing, the const keyword only seems to make sense in a function argument, and so it is implemented in the same way as the pub attribute:
pub enum AttributeKind {
    Pub,
    Const,
}

pub struct Attribute {
    pub kind: AttributeKind,
    pub span: Span,
}

pub struct FnArg {
    pub name: Ident,
    pub typ: Ty,
    pub attribute: Option<Attribute>,
    pub span: Span,
}
When a function is parsed by the type checker, a const Field is transformed into a BigInt. And as such, the type checker will be happy with that variable being used to index into an array, or being passed to other functions expecting a constant.
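For illustration, here is a hypothetical sketch (assuming a House struct with a rooms array field; not an example from the compiler's test suite):

struct House {
    rooms: [Field; 3],
}

fn House.get(self, const room: Field) -> Field {
    return self.rooms[room];
}

fn main(pub first_room: Field, house_rooms: [Field; 3]) {
    let house = House { rooms: house_rooms };
    let r = house.get(0); // OK: 0 is a literal, so it is a constant
    assert_eq(r, first_room);
}

Calling house.get(first_room) instead would be rejected, as first_room is a runtime Field and not a constant.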
What about other types being const? I don't think it makes sense for now, as I can only think of array accesses requiring this. So we don't implement it.
If we do want to support that one day, we will have to track more than TyKind in the type checker. This can be achieved by adding a const field to the TypeInfo structure, which tracks type-related data on the noname variables present in the scope.
Expressions
Field accesses
A field access is an access to a field of a structure, by writing struct.field. It is represented as an expression in noname:
pub enum ExprKind {
    // ...
    FieldAccess {
        lhs: Box<Expr>,
        rhs: Ident,
    },
}
The reason why the left-hand side is also an expression, as opposed to just a variable pointing to a struct, is that we need to support code like this:
let a = b.c.d;
let e = f[3].g;
Note that there are other use cases that are not allowed at the moment, for readability reasons. For example, we could have allowed the following to be valid noname:
let res = some_function().some_field;
but instead we require developers to write their logic in the following way:
let temp = some_function();
let res = temp.some_field;
In the example
let a = b.c.d;
the expression node representing the right-hand side could be seen as:
ExprKind::FieldAccess {
    lhs: Expr {
        kind: ExprKind::FieldAccess { // b.c
            lhs: Expr { kind: ExprKind::Variable { name: "b" } },
            rhs: Ident { value: "c" },
        },
    },
    rhs: Ident { value: "d" }, // [b.c].d
}
Assignments
Imagine that we want to mutate a variable. For example:
x.y.z = 42;
x[4] = 25;
At some point the circuit-writer would have to go through an expression node looking like this:
ExprKind::Assignment {
    lhs: /* the left-hand side as an Expr */,
    rhs: Expr { kind: ExprKind::BigInt { value: 42 } },
}
At this point, the problem is that to go through each expression node we use the following API, which only gets us a Var:
fn compute_expr(
    &mut self,
    global_env: &GlobalEnv,
    fn_env: &mut FnEnv,
    expr: &Expr,
) -> Result<Option<Var>> {
So parsing the x.y node would return a variable that represents either x or y. The parent call would then use the result to produce x.y.z with a similar outcome. Then, we would have either x or z (depending on the strategy we chose) when we reach the assignment expression node, which does not leave us enough information to modify the variables of x in our local function environment.
What we really need when we reach the assignment node is the following:
- the name of the variable being modified (in both cases, x)
- whether the variable is mutable or not (i.e. whether it was defined with the mut keyword)
- the range of circuit variables in the Var.cvars of x that the x.y.z field access, or the x[4] array access, represents
VarOrRef
Overview
The VarOrRef enum is used to represent either a variable or a reference to a variable within expressions. Here is a concise overview:
pub enum VarOrRef<B: Backend> {
/// A variable.
Var(Var<B::Field, B::Var>),
/// A reference to a variable, potentially narrowing down to a range within it.
Ref {
var_name: String,
start: usize,
len: usize,
},
}
- Var: represents a complete variable in the environment.
- Ref: represents a reference to a variable, including:
  - var_name: the name of the variable
  - start: the starting index of the slice or field
  - len: the length of the slice or field
Every expression node in the AST is resolved as a VarOrRef, an enum that represents either a variable or a reference to a variable. The sole reason to use a reference is when the variable is mutable: in that case you must be able to go to the list of variables present in the scope and mutate the correct one (so that if some logic tries to mutate it, it can). That's why a var_name is stored in a reference. We also store a (start, len) tuple to handle mutable slices: since a slice is a narrowing of an array, we must not lose track of where we are in the original array (as the original array is what needs to be mutated). This ensures accurate modification of the variable's state, maintaining the integrity of mutable references.
Circuit writer
To implement this in the circuit writer, we follow a common practice of tracking references:
/// Represents a variable in the circuit, or a reference to one.
/// Note that mutable variables are always passed as references,
/// as one needs to have access to the variable name to be able to reassign it in the environment.
pub enum VarOrRef {
    /// A [Var].
    Var(Var),

    /// A reference to a noname variable in the environment.
    /// Potentially narrowing it down to a range of cells in that variable.
    /// For example, `x[2]` would be represented with "x" and the range `(2, 1)`,
    /// if `x` is an array of `Field` elements.
    Ref {
        var_name: String,
        start: usize,
        len: usize,
    },
}
and we modify the circuit-writer to always return a VarOrRef when processing an expression node in the AST.
While the type checker already checks that the lhs variable is mutable when it encounters an assignment expression, the circuit-writer should do its best to pass references only when a variable is known to be mutable. This way, if there is a bug in the type checker, it will turn unsafe code into a runtime error.
An array access, or a field access in a struct, is processed as a narrowing of the range we’re referencing in the original variable:
impl VarOrRef {
    fn narrow(&self, start: usize, len: usize) -> Self {
        match self {
            VarOrRef::Var(var) => {
                let cvars = var.range(start, len).to_vec();
                VarOrRef::Var(Var::new(cvars, var.span))
            }

            //      old_start
            //      |
            //      v
            // |----[-----------]-----| <-- var.cvars
            //       <--------->
            //         old_len
            //
            //          start
            //          |
            //          v
            //      |---[-----]-|
            //           <--->
            //            len
            //
            VarOrRef::Ref {
                var_name,
                start: old_start,
                len: old_len,
            } => {
                // ensure that the new range is contained in the older range
                assert!(start < *old_len); // lower bound
                assert!(start + len < *old_len); // upper bound
                assert!(len > 0); // empty range not allowed

                Self::Ref {
                    var_name: var_name.clone(),
                    start: old_start + start,
                    len,
                }
            }
        }
    }
}
Type checker
While the type checker does not care about the range within a variable, it also needs to figure out if a variable is mutable or not.
That information is in two places:
1. it is stored under the variable's name in the local environment
2. it is also known when we look up a variable, and we can thus bubble it up to the parent expression nodes

Implementing solution 1 means bubbling up the variable name, in addition to the type, associated with an expression node. Implementing solution 2 means bubbling up the mutability instead. As it is possible that we might want to retrieve additional information in the future, we chose to implement solution 1 and carry the variable name, in addition to type information, when parsing the AST in the type checker.
Modules
In noname, the concept of a module is basically a file. A project is either a binary (main.no) or a library (lib.no). That's it.
A binary or a library can use other libraries by importing them. To do that, a binary or library's manifest file (Noname.toml) must contain a dependencies key listing all the other libraries as GitHub handles like user/repo (e.g. mimoo/sudoku).
Libraries are then retrieved from GitHub.
Currently there is no versioning. Not because it’s not important, but because I haven’t had the time to implement it.
Each library can be imported in code with the following command:
use module::lib;
For example, currently you automatically have access to the std module:
use std::crypto;
fn main(pub digest: [Field; 2]) {
let expected_digest = crypto::poseidon([1, 2]);
assert_eq(expected_digest, digest);
}
Each library is seen as a module, and different modules might have the same name:

use a::some_lib;
use b::some_lib;

There is currently no solution to this problem. This problem does not exist in Rust, where there is a single namespace that everyone shares, but it does exist in Golang. The current proposed solution is to introduce an as keyword, like in Rust, to be able to alias imports (e.g. use a::some_lib as a_some_lib;).
Dependency graph and type checking
During building, a dependency graph of all dependencies is formed (and dependencies are retrieved from GitHub at the same time). This must be done to detect dependency cycles.
Once this is done, a list of dependencies from leaves to roots is computed, and each dependency is analyzed in this order. Dependencies are not compiled: the circuit-writer is not run, and things stop at the type checker. For every new dependency analyzed, all the TASTs (typed ASTs) previously computed for other dependencies are passed as arguments. This way, if a dependency A uses a dependency B, it has access to the TAST of B to perform type checking correctly.
As such, it is important that a::some_lib and b::some_lib are seen as two independent modules. For this reason, we store imported modules under their fully qualified paths in the set of TASTs that we pass to the type checker. But in the current module, we store them under their alias, so that we can use them in the code:
TASTs: HashMap<a::some_lib, TAST>
TAST: contains <some_lib -> a::some_lib>
Compilation and circuit generation
Once type checking is done, the circuit writer is given access to all of the dependencies' TASTs (which also contain their ASTs). This way, it can jump from AST to AST to generate an unrolled circuit.
Another solution
This is a bit annoying. We need a context switcher in both the constraint writer and the type checker, and it’s almost the same code.
Type Checker
Constraint Writer
pub struct CircuitWriter {
    /// The type checker state for the main module.
    // Important: this field must not be used directly.
    // This is because, depending on the value of [current_module],
    // the type checker state might be this one, or one of the ones in [dependencies].
    typed: TypeChecker,

    /// The type checker state and source for the dependencies.
    // TODO: perhaps merge {source, typed} in this type?
    dependencies: Dependencies,

    /// The current module. If not set, the main module.
    // Note: this can be an alias that came from a 3rd party library.
    // For example, a 3rd party library might have written `use a::b as c;`.
    // For this reason we must store this as a fully-qualified module.
    pub(crate) current_module: Option<UserRepo>,
}
and then access to the TAST is gated so we can switch context on demand, or figure out what’s the current context:
impl CircuitWriter {
    /// Retrieves the type checker associated to the current module being parsed.
    /// It is possible, when we jump to third-party libraries' code,
    /// that we need access to their type checker state instead of the main module one.
    pub fn current_type_checker(&self) -> &TypeChecker {
        if let Some(current_module) = &self.current_module {
            self.dependencies
                .get_type_checker(current_module)
                .expect(&format!(
                    "bug in the compiler: couldn't find current module: {:?}",
                    current_module
                ))
        } else {
            &self.typed
        }
    }

    pub fn expr_type(&self, expr: &Expr) -> Option<&TyKind> {
        let curr_type_checker = self.current_type_checker();
        curr_type_checker.node_types.get(&expr.node_id)
    }

    pub fn node_type(&self, node_id: usize) -> Option<&TyKind> {
        let curr_type_checker = self.current_type_checker();
        curr_type_checker.node_types.get(&node_id)
    }

    pub fn struct_info(&self, name: &str) -> Option<&StructInfo> {
        let curr_type_checker = self.current_type_checker();
        curr_type_checker.struct_info(name)
    }

    pub fn fn_info(&self, name: &str) -> Option<&FnInfo> {
        let curr_type_checker = self.current_type_checker();
        curr_type_checker.functions.get(name)
    }

    pub fn size_of(&self, typ: &TyKind) -> Result<usize> {
        let curr_type_checker = self.current_type_checker();
        curr_type_checker.size_of(&self.dependencies, typ)
    }

    pub fn resolve_module(&self, module: &Ident) -> Result<&UsePath> {
        let curr_type_checker = self.current_type_checker();
        let res = curr_type_checker.modules.get(&module.value).ok_or_else(|| {
            self.error(
                ErrorKind::UndefinedModule(module.value.clone()),
                module.span,
            )
        });
        res
    }

    pub fn do_in_submodule<T, F>(&mut self, module: &Option<Ident>, mut closure: F) -> Result<T>
    where
        F: FnMut(&mut CircuitWriter) -> Result<T>,
    {
        if let Some(module) = module {
            let prev_current_module = self.current_module.clone();
            let submodule = self.resolve_module(module)?;
            self.current_module = Some(submodule.into());
            let res = closure(self);
            self.current_module = prev_current_module;
            res
        } else {
            closure(self)
        }
    }

    pub fn get_fn(&self, module: &Option<Ident>, fn_name: &Ident) -> Result<FnInfo> {
        if let Some(module) = module {
            // we may be parsing a function from a 3rd-party library
            // which might also come from another 3rd-party library
            let module = self.resolve_module(module)?;
            self.dependencies.get_fn(module, fn_name) // TODO: add source
        } else {
            let curr_type_checker = self.current_type_checker();
            let fn_info = curr_type_checker
                .functions
                .get(&fn_name.value)
                .cloned()
                .ok_or_else(|| {
                    self.error(
                        ErrorKind::UndefinedFunction(fn_name.value.clone()),
                        fn_name.span,
                    )
                })?;
            Ok(fn_info)
        }
    }

    pub fn get_struct(&self, module: &Option<Ident>, struct_name: &Ident) -> Result<StructInfo> {
        if let Some(module) = module {
            // we may be parsing a struct from a 3rd-party library
            // which might also come from another 3rd-party library
            let module = self.resolve_module(module)?;
            self.dependencies.get_struct(module, struct_name) // TODO: add source
        } else {
            let curr_type_checker = self.current_type_checker();
            let struct_info = curr_type_checker
                .struct_info(&struct_name.value)
                .ok_or(self.error(
                    ErrorKind::UndefinedStruct(struct_name.value.clone()),
                    struct_name.span,
                ))?
                .clone();
            Ok(struct_info)
        }
    }

    pub fn get_source(&self, module: &Option<UserRepo>) -> &str {
        if let Some(module) = module {
            &self
                .dependencies
                .get_type_checker(module)
                .expect(&format!(
                    "type checker bug: can't find current module's (`{module:?}`) file"
                ))
                .src
        } else {
            &self.typed.src
        }
    }

    pub fn get_file(&self, module: &Option<UserRepo>) -> &str {
        if let Some(module) = module {
            &self.dependencies.get_file(module).expect(&format!(
                "type checker bug: can't find current module's (`{module:?}`) file"
            ))
        } else {
            &self.typed.filename
        }
    }

    pub fn get_current_source(&self) -> &str {
        self.get_source(&self.current_module)
    }

    pub fn get_current_file(&self) -> &str {
        self.get_file(&self.current_module)
    }

    pub fn add_local_var(&self, fn_env: &mut FnEnv, var_name: String, var_info: VarInfo) {
        // check for consts first
        let type_checker = self.current_type_checker();
        if let Some(_cst_info) = type_checker.constants.get(&var_name) {
            panic!(
                "type checker bug: we already have a constant with the same name (`{var_name}`)!"
            );
        }

        //
        fn_env.add_local_var(var_name, var_info)
    }

    pub fn get_local_var(&self, fn_env: &FnEnv, var_name: &str) -> VarInfo {
        // check for consts first
        let type_checker = self.current_type_checker();
        if let Some(cst_info) = type_checker.constants.get(var_name) {
            let var = Var::new_constant(cst_info.value, cst_info.typ.span);
            return VarInfo::new(var, false, Some(TyKind::Field));
        }

        // then check for local variables
        fn_env.get_local_var(var_name)
    }
}
We basically have to implement the same thing in the type checker, and it always sort of looks the same: either a handy function is called, like get_fn or expr_type or node_type, etc., or we call a block of code with do_in_submodule.
All of these basically start by figuring out the curr_type_checker:
- what's the current module (self.current_module)?
  - if there is none, use the main TAST (self.typed)
  - otherwise, find that TAST (in self.dependencies)
  - btw, all of this logic is implemented in self.current_type_checker()
  - the returned TAST is called curr_type_checker
Then, if we're handling something that has a module:
- do name resolution (implemented in resolve_module()):
  - use curr_type_checker to resolve the fully-qualified module name
Or, if we're executing a block within a module:
- save the current module (self.current_module)
- replace it with the module we're using (we have called resolve_module() at this point)
- execute the closure, to which self is passed
- when we return, reset the current module to its previously saved state
Note that when we return an error, we always try to figure out which file it came from, which can be resolved via self.current_module.
Name resolution approach
If we have a name resolution phase, we could do this:
- fully qualify all things that need to be fully qualified: structs, functions, methods (which are currently defined as functions; should we not do that?), and consts. And that's it?
- create a Hashmap<usize, String> to store all the filenames
- add the usize to all Spans
CellVar
A CellVar is a type that represents an internal variable. Importantly, it is named after the fact that it relates to a specific cell, or even multiple cells if they are wired to have the same value, in the execution trace.
A CellVar looks like this:
pub struct CellVar {
    index: usize,
    span: Span,
}
It is tracked using a usize, which is just a counter that the compiler increments every time a new CellVar is created.
A CellVar is created via the new_internal_var function, which does two things: it increments the variable counter, and it stores some information on how to compute the variable (which will be useful during witness generation):
pub fn new_internal_var(&mut self, val: Value, span: Span) -> CellVar {
    // create a new var
    let var = CellVar::new(self.next_variable, span);
    self.next_variable += 1;

    // store it in the compiler
    self.vars_to_value.insert(var, val);

    var
}
A Value tells us how to compute the CellVar during witness generation:
pub enum Value {
    /// Either it's a hint and can be computed from the outside.
    Hint(Box<dyn Fn(&Compiler, &mut WitnessEnv) -> Result<Field>>),

    /// Or it's a constant (for example, I wrote `2` in the code).
    Constant(Field),

    /// Or it's a linear combination of internal circuit variables (+ a constant).
    LinearCombination(Vec<(Field, CellVar)>, Field),

    /// A public or private input to the function.
    /// There's an index associated to a variable name, as the variable could be composed of several field elements.
    External(String, usize),

    /// A public output.
    /// This is tracked separately from public inputs as it needs to be computed later.
    PublicOutput(Option<CellVar>),
}
Note: a CellVar is potentially not directly added to the rows of the execution trace. For example, a private input is converted directly to a (number of) CellVar(s), but only added to the rows when it appears in a constraint for the first time.
As the final step of the compilation, we double check that all CellVars have appeared in the rows of the execution trace at some point. If one hasn't, it can mean one of two things:
- A private or public input was never used in the circuit. In this case we return an error to the user.
- There is a bug in the compiler. In this case we panic.
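A hedged sketch of that final check (the names all_cell_vars, written_vars, var_is_an_input, and ErrorKind::UnusedInput are hypothetical, not the compiler's actual identifiers):

// for every CellVar ever created...
for var in all_cell_vars {
    // ...that never appeared in a row of the execution trace
    if !written_vars.contains(&var) {
        if var_is_an_input(var) {
            // an unused public or private input: user error
            return Err(self.error(ErrorKind::UnusedInput, var.span));
        } else {
            // anything else: a bug in the compiler
            panic!("bug in the compiler: a CellVar never appeared in the circuit");
        }
    }
}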
TODO: explain the LinearCombination. I think we only need an Add((Field, Var), (Field, Var), Field)
Vars
We already have CellVars, so why have Vars? The distinction is a matter of abstraction:
- CellVars are low-level: they track actual cells of the execution trace. When a single CellVar is assigned to multiple cells, the wiring must make sure the cells are wired (so that they can only have the same value).
- Vars are a higher-level concept: they track variables that are created in the noname language, either directly (e.g. let x = 3;) or indirectly (e.g. in x + (y + z), the term y + z is stored under an anonymous Var).
While a CellVar represents a single field element, a Var can potentially represent several field elements (and as such, several cells in the execution trace). Here are some examples of Vars:
// a constant
let x = 5;

// a field element that will be computed at runtime
let y = private_input + 1;

// a builtin type, like an array, or a bool
let z = [y, x, 6];

// or a custom type
let s = Thing { x, y };
Internally, a Var is represented as such:
/// A constant value created in a noname program.
pub struct Constant {
    /// The actual value.
    pub value: Field,

    /// The span that created the constant.
    pub span: Span,
}

/// Represents a cell in the execution trace.
pub enum ConstOrCell {
    /// A constant value.
    Const(Constant),

    /// A cell in the execution trace.
    Cell(CellVar),
}

/// A variable in a program can have different shapes.
pub enum VarKind {
    /// We pack [Const] and [CellVar] in the same enum because we often branch on these.
    ConstOrCell(ConstOrCell),

    /// A struct is represented as a mapping between field names and other [VarKind]s.
    Struct(HashMap<String, VarKind>),

    /// An array or a tuple is represented as a list of other [VarKind]s.
    ArrayOrTuple(Vec<VarKind>),
}

/// Represents a variable in the noname language, or an anonymous variable during computation of expressions.
pub struct Var {
    /// The type of variable.
    pub kind: VarKind,

    /// The span that created the variable.
    pub span: Span,
}
Note: see the Constant chapter to see why constants are treated differently.
Anonymous variable
Here's a short note on anonymous variables. When the circuit-writer parses the AST, it converts each expression into a Var (unless the expression does not compute to an actual value). In our example:
let x = t + (z + y);
the z + y term is parsed as an expression (a binary operation involving z and y) and stored under a var, var1. Then t + ... is also parsed as another binary operation expression and stored under another var, var2. Finally, the let x = ... is parsed as an assignment statement, and x is stored as a local variable associated with the right-hand side var, var2.
See the Scope chapter for more information on local variables.
Constants
Developers write constants in their code all the time. For example, the following code contains two constants, 2 and 4:
let x = 2 + y;
assert_eq(x, 4);
It is important that constants are tracked differently from CellVars, for several reasons:
- It is sometimes useless to constrain them directly. For example, in let x = 3 + 7; we should not constrain 3 and 7 separately, but rather the result 10.
- It is sometimes useless to constrain them at all. For example, boolean constants are never constrained, because you never need to.
- They can be cached to avoid creating several constraints for the same constant.
Currently, a constant appears in the circuit only when CircuitWriter::add_constant is called. It uses the generic gate to constrain the value, and is not cached (so calling it several times with the same constant will create multiple constraints):
pub fn add_constant(&mut self, value: Field, span: Span) -> CellVar {
    let var = self.new_internal_var(Value::Constant(value), span);

    let zero = Field::zero();
    self.add_gate(
        GateKind::DoubleGeneric,
        vec![Some(var)],
        vec![Field::one(), zero, zero, zero, value.neg()],
        span,
    );

    var
}
Note that the Value keeps track of the constant as well.
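To illustrate the caching idea mentioned earlier, here is a hedged sketch of what a cached version could look like (the cached_constants field is hypothetical; the current compiler does not implement this):

pub fn add_constant_cached(&mut self, value: Field, span: Span) -> CellVar {
    // reuse the existing CellVar if this constant was already constrained
    if let Some(var) = self.cached_constants.get(&value) {
        return *var;
    }

    // otherwise constrain it once, and remember it
    let var = self.add_constant(value, span);
    self.cached_constants.insert(value, var);
    var
}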
Warning: gadgets must all handle constants gracefully. That is, they must constrain constants themselves (by calling CircuitWriter::add_constant).
Global environment
In this chapter we will talk about functions.
Local functions
Third-party libraries can have function names that collide with your own function names. Due to this, they are tracked in a different data structure that we will see later.
Local functions include:
- automatically imported built-ins, like assert and assert_eq. See here for a full list.
- main, the main function that your program runs. Of course, if you're writing a library this function is not present.
- normal functions, which you define in your program. They can be recursive.
- methods, which are defined on a type. They can be recursive as well.

Built-ins are different from all the other functions listed because they are not written in noname, but in Rust, within the compiler. For this reason we track functions according to this enum:
pub enum FnKind {
    /// signature of the function
    BuiltIn(FnHandle),

    /// Any function declared in the noname program (including main)
    LocalFn(AST),

    /// path, and signature of the function
    Library(Vec<String>),
}

/// An actual handle to the internal function to call to resolve a built-in function call.
pub type FnHandle = fn(&mut CircuitWriter, &[Var], Span) -> Result<Option<Var>>;

pub struct FnInfo {
    pub name: Ident,
    pub sig: FnSig,
    pub kind: FnKind,
    pub span: Span,
}
Note that the signature of a FnHandle is designed as follows:
- &mut CircuitWriter: take a mutable reference to the circuit writer; this is because built-ins need to be able to register new variables and add gates to the circuit
- &[Var]: take an unbounded list of variables; this is because built-ins can take any number of arguments, and different built-ins might take different types of arguments
- Span: take a span, to return user-friendly errors
- -> Result<Option<Var>>: return a Result with an Option of a Var. This is because built-ins can return a variable, or they can return nothing. If they return nothing, the Option will be None; if they return a variable, the Option will be Some(Var).
We track all of these functions in the following structure:
pub struct GlobalEnv {
    /// the functions present in the scope
    /// contains at least the set of builtin functions (like assert_eq)
    functions: HashMap<String, FnInfo>,

    // ...
}
Handling builtins
Builtins are handled in a special way. They are not written in noname, but in Rust.
Handling local functions
The parser:
- saves the AST of each function it encounters. Specifically, the function's AST is stored under the GlobalEnv (TODO: where exactly?). This is necessary as the circuit writer will have to switch to a function's AST when the function is called (and then return to its own AST).
The first step of the type checker resolves imports by doing the following:
- store all built-ins in the functions map of the GlobalEnv
- resolve all imports (e.g. use std::crypto)
- type check each function individually, and save their signatures in the GlobalEnv using the FnSig type
- type check function calls with the signatures just saved

(TODO: this means that function declarations must be ordered. I think this is a GOOD thing.)
When a function is called, we do the following:
- if the function is qualified (e.g. crypto::poseidon), then look it up in the imported modules (see next section)
- otherwise, check if the function exists in the GlobalEnv; if it doesn't, then return an error
- if the function exists, then create a new FnEnv and register the arguments as local variables there
- switch to the function's AST and pass the new FnEnv as an argument
- TODO: how to handle the return value? It should be saved in the FnEnv.
Third-party libraries
TODO: write this part
/// This seems to be used by both the type checker and the AST
// TODO: right now there's only one scope, but if we want to deal with multiple scopes
// then we'll need to make sure child scopes have access to parent scope, shadowing, etc.
#[derive(Default, Debug)]
pub struct GlobalEnv {
    /// the functions present in the scope
    /// contains at least the set of builtin functions (like assert_eq)
    pub functions: HashMap<String, FuncInScope>,

    /// stores the imported modules
    pub modules: HashMap<String, ImportedModule>,

    /// the arguments expected by main
    pub main_args: (HashMap<String, FuncArg>, Span),
}

pub type FnHandle = fn(&mut CircuitWriter, &[Var], Span) -> Result<Option<Var>>;

pub enum FuncInScope {
    /// signature of the function
    BuiltIn(FnSig, FnHandle),

    /// path, and signature of the function
    Library(Vec<String>, FnSig),
}
As part of resolving imports, the type checker looks at third-party libraries differently…
TODO: implement this
TODO: how do we handle a diamond dependency graph, or cycles? We must form a dependency graph first, and resolve dependencies according to this graph.
Scope
Like most languages, noname has a notion of scope within a function. Unlike a lot of languages, noname forbids shadowing at all scope levels. This means that even though different functions can use local variables with colliding names, the local variables within a single function must all have different names.
For example, the following code does not compile:
let x = 2;
let x = 3; // this won't compile

let y = 4;
for i in 0..4 {
    let y = i; // this won't compile either
}
Scopes are only used for:
- for loops
- in the future: if/else statements
Scope mechanisms
Both the type checker and the circuit writer need to keep track of local variables.
For the type checker (type_checker.rs), a TypeEnv structure keeps track of the association between all local variable names and their type information.
For the circuit writer (circuit_writer.rs), a FnEnv structure keeps track of the association between all local variable names and their circuit variables.
Both structures also keep track of how nested the current block is (the top level starting at level 0). For this reason, it is important to remember to increase the current scope when entering a new block (for loop, if statement, etc.) and to decrease it when exiting the block. In addition, all variables from a scope must be disabled (but not deleted, in order to detect shadowing) when exiting that scope.
For example, the type checker’s TypeEnv
structure implements the following logic:
impl TypeEnv {
    // ...

    /// Enters a scoped block.
    pub fn nest(&mut self) {
        self.current_scope += 1;
    }

    /// Exits a scoped block.
    pub fn pop(&mut self) {
        self.current_scope = self.current_scope.checked_sub(1).expect("scope bug");

        // disable variables as we exit the scope
        for (name, (scope, type_info)) in self.vars.iter_mut() {
            if *scope > self.current_scope {
                type_info.disabled = true;
            }
        }
    }

    /// Returns true if a scope is a prefix of our scope.
    pub fn is_in_scope(&self, prefix_scope: usize) -> bool {
        self.current_scope >= prefix_scope
    }

    /// Stores type information about a local variable.
    /// Note that we forbid shadowing at all scopes.
    pub fn store_type(&mut self, ident: String, type_info: TypeInfo) -> Result<()> {
        match self
            .vars
            .insert(ident.clone(), (self.current_scope, type_info.clone()))
        {
            Some(_) => Err(Error::new(
                ErrorKind::DuplicateDefinition(ident),
                type_info.span,
            )),
            None => Ok(()),
        }
    }

    /// Retrieves type information on a variable, given its name.
    /// If the variable is not in scope, returns None.
    pub fn get_type_info(&self, ident: &str) -> Option<TypeInfo> {
        if let Some((scope, type_info)) = self.vars.get(ident) {
            if self.is_in_scope(*scope) && !type_info.disabled {
                Some(type_info.clone())
            } else {
                None
            }
        } else {
            None
        }
    }
}
Public Outputs
Public outputs are usually part of the public inputs in Plonk.
In noname, public outputs are treated differently from public inputs for one reason: unlike (real) public inputs, they cannot be computed directly during witness generation (proving).
This is because public inputs are listed first in the circuit. During witness generation, we go through each row and evaluate the values of the cells to construct the execution trace. When we reach the public output part of the public input, we do not yet have enough information to construct its values. Thus, we ignore those cells, and fill them in later.
During compilation, we create CellVars to keep track of the public output:
pub struct Compiler {
    // ...

    /// If a public output is set, this will be used to store its [CircuitVar] (cvar).
    /// The public output generation works as follows:
    /// 1. This cvar is created and inserted in the circuit (gates) during compilation of the public input
    ///    (as the public output is the end of the public input)
    /// 2. When the `return` statement of the circuit is parsed,
    ///    it will set this `public_output` variable again to the correct vars.
    /// 3. During witness generation, the public output computation
    ///    is delayed until the very end.
    pub public_output: Option<CellVars>,
}
During witness generation (see the Witness Generation chapter), we indeed defer computation the first time we go through the public output rows:
let val = if let Some(var) = var {
    // if it's a public output, defer its computation
    if matches!(self.vars_to_value[&var], Value::PublicOutput(_)) {
        public_outputs_vars.push((row, *var));
        Field::zero()
    } else {
        self.compute_var(&mut env, *var)?
    }
} else {
    Field::zero()
};

witness_row[col] = val;
and at the end we go back to them:
// compute public output at last
let mut public_output = vec![];

for (row, var) in public_outputs_vars {
    let val = self.compute_var(&mut env, var)?;
    witness[row][0] = val;
    public_output.push(val);
}
and finally we return the public output to the prover, so that they can send it to the verifier, as well as the "full public input", which is the concatenation of the public input and the public output (needed to finalize the proof):
// return the public output separately as well
Ok((Witness(witness), full_public_inputs, public_output))
Witness Generation
Witness generation is the process of creating the execution trace table during proving. The execution trace table is then passed to the kimchi proof system which will create the final proof.
The code creates a series of instructions during compilation for the witness generation to follow. These instructions are stored as two different fields:
pub struct Compiler {
    // ...

    /// This is how you compute the value of each variable, for witness generation.
    pub vars_to_value: HashMap<CellVar, Value>,

    // ...

    /// This is used to compute the witness row by row.
    pub witness_table: Vec<Vec<Option<CellVar>>>,

    // ...
}
witness_table can essentially be seen as the execution trace table, except that it contains variables instead of values.
The witness generation goes as follows:
- Each row in witness_table is looked at one by one.
- For each CellVar in the row:
  - If it is set, it is evaluated using the Value stored in vars_to_value.
  - If it is set to None, it is simply evaluated as 0.
- Once the row is created, it is checked for correctness by looking at which gate is used in the row. Note that this is only done for the generic gate, as we trust built-in gadgets to produce correct values. For example, assert_eq(x, 2) will be checked because it uses the generic gate, but let y = poseidon(x) won't be, because we trust the poseidon gate to be correct (and if there is a bug there, kimchi will still catch it).
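As a hedged sketch of what checking a generic-gate row means (based on kimchi's generic gate equation; the variable names here are illustrative, and kimchi's double generic gate actually packs two such checks per row):

// the generic gate equation, with coefficients c[0..5] and witness cells w[0..3]:
//   c0*w0 + c1*w1 + c2*w2 + c3*w0*w1 + c4 = 0
let sum = c[0] * w[0] + c[1] * w[1] + c[2] * w[2] + c[3] * w[0] * w[1] + c[4];
assert!(sum.is_zero(), "row does not satisfy the generic gate");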
RFC-0: Generic-sized arrays in function signatures
Summary
This RFC proposes supporting const generics in noname. The generic parameters can be resolved from the observed arguments, such as constants, arrays, or structs. This improves the reusability and modularity of the code. It is a prerequisite for supporting generic arrays with symbolic sizes.
Code Examples
Here are a few previews of how generic parameters can be used, and what features they would unlock.
Allow functions to create arrays with a symbolic size:
// the return type is determined by the generic arguments
// so `init_arr` can be used to create arrays with different sizes
fn init_arr(const LEN: Field) -> [Field; LEN] {
    let arr = [0; LEN];
    return arr;
}

fn main() -> [Field; 3] {
    let arr = init_arr(3);
    return arr;
}
Resolving the generic values from the observed array argument:
fn last(arr: [Field; LEN]) -> Field {
    // use generic parameter LEN to form dynamic expressions in the function scope
    return arr[LEN - 1];
}

fn main() -> Field {
    let arr = [1, 2, 3, 4, 5];
    // generic parameter LEN can be resolved from the array size of argument `arr`
    return last(arr);
}
Builtin Examples
Given the following function signatures for builtin functions:
fn to_bits(const LEN: Field, val: Field) -> [Field; LEN]
fn from_bits(bits: [Bool; LEN]) -> Field
Calling the builtin functions from noname code:
use std::bits;

const num_bits = 8;

fn main() {
    let val1 = 101;
    // `num_bits` will be assigned to the generic parameter `LEN` in the return type,
    // then the type of `bits` will be monomorphized to [Bool; 8]
    let bits = bits::to_bits(num_bits, val1);
    // the value of `LEN` can be determined from the size of `bits` during monomorphization,
    // so the builtin function knows how many bits to convert
    let val2 = bits::from_bits(bits);
    assert_eq(val1, val2);
}
The values for the generic parameters will be passed to the function via the generics argument:
fn to_bits<B: Backend>(
    compiler: &mut CircuitWriter<B>,
    generics: &GenericParameters,
    vars: &[VarInfo<B::Field, B::Var>],
    span: Span,
) -> Result<Option<Var<B::Field, B::Var>>> {
    ...
    // retrieve the generic values from the `generics` argument
    let bitlen = generics.get("LEN") as usize;
    ...
}
Both the return type and the returned vars can be checked outside of the builtin functions. The return type can be checked automatically in the same way as for native functions: types are propagated and must converge at certain points, where an error is thrown if they don't match.

The returned vars can be checked by relying on the types. Each concrete type has a fixed number of vars, so with the resolved return type of the builtin function we can check whether the size matches. Additionally, we could check the values recursively following the type structure, but this might be limited to the boolean type, which has the obvious bounds 0 and 1. Automatically checking the actual return values of a builtin is thus an area to improve in the future.
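As a rough sketch, the size check on the returned vars could look like the following (size_of, resolved_return_type, and returned_vars are illustrative names, not the actual API):

// illustrative sketch: check that a builtin returned the right number of vars
// for its now-concrete return type
let expected = size_of(&resolved_return_type); // number of vars for the type
let actual = returned_vars.len();
assert_eq!(
    expected, actual,
    "builtin returned {actual} vars, expected {expected}"
);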
Monomorphization
The generic values can be resolved by observing the arguments passed to a function. The resolved values are then stored in the relevant contexts so the rest of the compiler pipeline can do type checking and circuit synthesis. We call this process of resolving the generic values and type checking monomorphization.
The current pipeline of compiling noname code is:
- Parse the code into an AST
- Convert the AST into a NAST with name resolution
- Convert the NAST into a TAST with type metadata collection and type checking
- Synthesize the TAST into a constraint system
With generic parameters, the current TAST phase can't handle the type checking anymore, because the generic parameters are unknown at that point. For example, it can't type check an array with a symbolic size without resolving the values of the generic parameters.

To extend the type checking to support generic parameters, we can add a MAST (Monomorphized AST) phase right after the TAST phase to resolve the values of the generic parameters. The circuit synthesizer will then rely on the MAST instead of the TAST to compile a circuit.
With the MAST phase added, the compiler pipeline becomes: AST → NAST → TAST → MAST → circuit synthesis.
Implementation
To support the generic syntax shown in the examples above, we need to change the AST parser to support generic syntax. Furthermore, because the generic parameters can't be resolved at the TAST phase, some type checks will be less strict and deferred to the MAST phase.

Here is a list of cases where the type checks can't be done at the TAST phase, as they require resolving the generic values:

- Inconsistent sizes of arrays with the same generic parameter
fn gen(const LEN: Field) -> [Field; LEN] {
    return [0; LEN];
}

// expects the same generic size
fn comp(arr1: [Field; LEN], arr2: [Field; LEN]) {
    for ii in 0..LEN {
        assert_eq(arr1[ii], arr2[ii]);
    }
}

fn main(pub xx: Field) {
    let arr1 = gen(2);
    let arr2 = gen(3);
    // arrays with different sizes
    comp(arr1, arr2);
}
- Assignment type mismatch
fn gen(const LEN: Field) -> [Field; LEN] {
    return [0; LEN];
}

fn main(pub xx: Field) {
    let mut arr = [0; 3];
    // arrays with different sizes
    arr = gen(2);
}
- Custom type field mismatch
struct Thing {
    xx: [Field; 2],
}

fn gen(const LEN: Field) -> [Field; LEN] {
    return [0; LEN];
}

fn main(pub xx: Field) {
    let arr = gen(3);
    // array size mismatched
    let thing = Thing { xx: arr };
}
- Array index out of bounds
fn gen(const LEN: Field) -> [Field; LEN] {
    return [0; LEN];
}

fn main(pub xx: Field) {
    let arr = gen(3);
    // 3 is out of bounds
    arr[3] = 1;
}
The newly added MAST phase will be responsible for resolving the generic values from the observed arguments. It also includes type checking on the monomorphized types that were bypassed in the TAST phase.
Generic Syntax
This RFC proposes a simple generic syntax without introducing the common turbofish syntax, since in most cases the generic parameters can be resolved directly from the function arguments: the values of the generic parameters are obtained by comparing them with the observed arguments.
For example, with the turbofish, we could do something like:
fn create_arr<N>(arr: [Field; N + 3]) -> [Field; N + 3] {...}
This is a rare case where the generic parameter can't be trivially resolved from the observed arguments. To make this work without any advanced inference setup, it would require manually passing the value of N to the function via turbofish syntax, such as:
// a is then of type [Field; 6]
let a = create_arr::<3>(arr);
However, in most cases the values of the generic parameters can be obtained simply by observing the arguments passed to the function. This RFC aims to keep the syntax simple and intuitive. Without the turbofish syntax, the generic syntax can be simplified to the following:
// the value of LEN equals the argument passed in
fn create_arr(const LEN: Field) -> [typ; LEN]

// if the argument is an array, then the value of LEN equals the size of the array
fn last(arr: [typ; LEN]) -> Field
Within a function scope, the compiler needs to determine whether an identifier is a generic parameter or not. We treat identifiers made of at least two letters, all capitalized, as generic parameters.
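This rule is simple enough to capture in a small predicate. A minimal sketch (the helper name is hypothetical):

// treat identifiers of at least 2 characters, all uppercase letters,
// as generic parameters
fn is_generic_parameter(name: &str) -> bool {
    name.len() >= 2 && name.chars().all(|c| c.is_ascii_uppercase())
}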
AST Parser
The parser will need to collect the generic identifiers when parsing FunctionDef constructions. It will add a new TyKind, GenericSizedArray(type, size). The size of a GenericSizedArray is represented by a Symbolic value, which can contain generic parameters or concrete values.
We add a generics field to FunctionDef, which is a set of GenericParameters mapping generic names to their values.
enum Symbolic {
    Concrete(u32),              // literal value
    Generic(GenericParameters), // generic parameters
    Constant(Ident),            // pointing to a constant variable
}

GenericSizedArray {
    ty: TyKind,
    size: Symbolic,
}
Update FunctionDef:
pub struct FunctionDef {
    ...
    // to add
    pub generics: HashSet<GenericIdentifier>,
}
Example for a function with a generic parameter:
fn create_arr(const LEN: Field) -> [Field; LEN] {...}
The parser should create the function definition like the pseudocode below:
FunctionDef {
    FnSig = {
        ...
        generics = {"LEN": null},
        FnArg = {
            name: 'LEN',
        }
        // Add / Mul / Generic / Concrete are variants of the Symbolic enum
        return_type: GenericSizedArray(Field, Generic("LEN"))
    }
}
The TAST phase uses this generic-parameter metadata to type check the consistency of the generic identifiers. In the MAST phase, it is used to resolve the generic values from the observed arguments.
TAST
The generic values are resolved from the observed arguments. If generic parameters are declared, they should actually be used in the function. We need to check that the declared generic parameters make sense.
Type check generic parameters for functions
// shouldn't be allowed, because LEN should be defined in the function arguments
fn foo(n: Field) -> [Field; LEN] {...}

// not allowed if NN is not used in the body
fn foo(const NN: Field) {...}
fn foo(arr: [Field; NN]) {...}
Restrictions on generic functions in for-loops

Mutable variables as generic arguments: it is prohibited to use mutable variables as generic arguments in generic function calls inside loops. The language doesn't support loop unrolling, so using loop indices or mutable counters as generic parameters is invalid.
Invalid example:
fn fn_call(const LEN: Field) -> [Field; LEN] {...}

...

for ii in 0..NN {
    fn_call(ii); // Error: 'ii' is mutable
}

...

let mut jj = 0;
for ii in 0..NN {
    fn_call(jj); // Error: 'jj' is mutable
    jj = jj + 1;
}
Allowed Usage with Constants: You can use constant values or immutable variables as generic arguments within loops.
Valid example:
let kk = 0;
for ii in 0..NN {
    fn_call(kk); // Allowed: 'kk' is immutable
}
Exception for Arrays: Mutable array variables can be used as generic arguments because their sizes are fixed at declaration, even if their contents change.
For example:
fn fn_call_arr(arr: [Field; LEN]) -> [Field; LEN] {...}

...

let mut arr = [0; 3];
for ii in 0..NN {
    fn_call_arr(arr); // Allowed: array size is fixed
}
Forbid operations on symbolic values of arguments
// disallow arithmetic operations on the symbolic values of function arguments,
// such as NN * 2 in this case,
// because it is challenging to resolve the value of NN
fn last(arr: [Field; NN * 2]) -> Field {
    return arr[NN - 1];
}
Defer type checks

Anything that involves generic parameters should be deferred to the MAST phase. In particular, we need to defer the type checks for arrays with generic sizes.

In the MAST phase, the values of the generic parameters can be resolved, so the symbolic values can be evaluated. Thus, all types with generic parameters can be type checked, as the array sizes become concrete values.
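As a sketch, evaluating a Symbolic size once the generic values are resolved could look like this (the Symbolic variants are simplified to plain strings here; the function and map names are illustrative):

use std::collections::HashMap;

// simplified stand-in for the Symbolic enum, with identifiers reduced to strings
enum Symbolic {
    Concrete(u32),
    Generic(String),
    Constant(String),
}

// sketch: a symbolic size becomes a concrete value once generics are resolved
fn eval_symbolic(
    sym: &Symbolic,
    generics: &HashMap<String, u32>,
    constants: &HashMap<String, u32>,
) -> u32 {
    match sym {
        Symbolic::Concrete(n) => *n,
        Symbolic::Generic(name) => generics[name],
        Symbolic::Constant(name) => constants[name],
    }
}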
MAST
After the TAST phase, the MAST phase can resolve the generic values from the observed arguments by propagating the constant values through the main function's AST.
Resolving algorithm
The algorithm will need to handle the following two categories:
- Resolving from a constant argument
- Resolving from an array argument

Example of resolving from a constant argument:
// constant generic
// - the value of LEN can be resolved from an observed constant value propagated
// - store the value in the function body scope
fn gen_arr(const LEN: Field) -> [Field; LEN] {
    let arr = [0; LEN];
    return arr;
}
Example of resolving from array arguments:
// First, here is a simple case:
// - LEN can be resolved from the array size of argument `arr`
// - store the value of LEN in the context
fn last(arr: [Field; LEN]) -> Field {
    return arr[LEN - 1];
}
// Then, here is a more challenging case that would require techniques like SMT solving
// to find unique values for the generic parameters.
// Even though LEN * 2 looks easy to solve, solving it in general may need something
// like a computer algebra system (CAS) in Rust.
// It is easy to evaluate a Symbolic value using an already-solved generic value,
// but going the other way around is difficult.
fn last(arr: [Field; LEN * 2]) -> Field {
    return arr[LEN - 1];
}
In this RFC, we only want the syntax to be sufficient for the simple cases; that is, we disallow arithmetic operations on generic parameters in function argument types.

To recap, here is pseudocode for the resolving algorithm for a function call (a sketch in Rust follows the list):
- Collect the observed arguments
- For each argument, resolve the generic values from the observed arguments
- For each statement, compute the type with the resolved generic values
- Type check the resolved return type against the expected return type
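Here is a minimal Rust sketch of the resolving step, under simplified assumptions (ParamTy and ObservedArg are illustrative stand-ins for the real AST types):

use std::collections::HashMap;

// illustrative stand-ins for the real AST types
enum ParamTy {
    ConstGeneric(String),  // e.g. `const LEN: Field`
    GenericArray(String),  // e.g. `arr: [Field; LEN]`
    Concrete,              // any non-generic parameter
}

enum ObservedArg {
    Constant(u32),      // a constant value propagated to the call site
    Array { len: u32 }, // an argument whose array size is known
    Other,
}

// sketch: resolve generic values by comparing parameters with observed arguments
fn resolve_generics(params: &[ParamTy], args: &[ObservedArg]) -> HashMap<String, u32> {
    let mut resolved = HashMap::new();
    for (param, arg) in params.iter().zip(args) {
        match (param, arg) {
            // `const LEN: Field` takes the constant value of the argument
            (ParamTy::ConstGeneric(name), ObservedArg::Constant(val)) => {
                resolved.insert(name.clone(), *val);
            }
            // `arr: [Field; LEN]` takes the size of the observed array
            (ParamTy::GenericArray(name), ObservedArg::Array { len }) => {
                resolved.insert(name.clone(), *len);
            }
            // concrete parameters resolve nothing
            _ => {}
        }
    }
    resolved
}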
Function Call Instantiation
Functions are defined as FunctionDef, an AST containing the signature and the body of the function. The body is a vector of statements, each of which is a tree of expression nodes. It is fine to have different function calls pointing to a function's original AST, as long as the content of the function doesn't change, and neither do its expression nodes.
However, when a function takes generic arguments, the actual arguments can result in different expression nodes. For example:
fn last(arr: [Field; LEN]) -> Field {
    return arr[LEN - 1];
}

fn main() {
    let arr1 = [1, 2, 3, 4, 5];
    let arr2 = [6, 7, 8, 9];

    let last1 = last(arr1); // with LEN = 5
    let last2 = last(arr2); // with LEN = 4
}
The monomorphized body of the function call last(arr1) is return arr[5 - 1], while the one for last(arr2) is return arr[4 - 1]. Therefore, we can't have a single expression node representing both arr[5 - 1] and arr[4 - 1]. These functions should be instantiated with new ASTs, monomorphized from the original ones: they are regenerated with the generic parameters resolved to concrete values. The two calls to last should then point to two different monomorphized function instances.
To ensure there are no conflicts between the node IDs regenerated for these instantiated functions, the AST of the main function, the entry point, is regenerated as well. The monomorphized AST preserves the span information pointing to the noname source code, so the existing debugging features keep working.
As before, these instantiated functions can be pointed to by ExprKind::FnCall expression nodes. With the support of generic parameters, we need to change the way the function AST is loaded, as the current fully qualified name pattern doesn't contain enough information to differentiate instantiations with different generic values.

Thus we can generate a monomorphized function name, and use it instead of the original function name to store the monomorphized function AST. The new string pattern for storing the monomorphized function AST can be:
fn_full_qualified_name#generic1=value1#generic2=value2
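A sketch of how such a key could be generated (the helper name is hypothetical; a BTreeMap keeps the generic parameters sorted, so the generated key is deterministic):

use std::collections::BTreeMap;

// sketch: build the storage key for a monomorphized function instance
fn monomorphized_name(fqn: &str, generics: &BTreeMap<String, u32>) -> String {
    let mut name = fqn.to_string();
    for (param, value) in generics {
        name.push_str(&format!("#{param}={value}"));
    }
    name
}

For example, the two calls to last from the earlier example would then be stored under keys like last#LEN=5 and last#LEN=4.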
Type checking

The instantiation of a generic function resolves the generic types to concrete types. Similar to the TAST phase, during the monomorphization of a function body, the computed concrete type can be propagated and compared with the return type of the function signature.

The type checks in this phase always operate on concrete types; any unresolved generic type will fail the type check.
Monomorphization Process
Here is an overview of the monomorphization process:
- Propagate types in the same way the type checker does, but also propagate constant values, which are used to resolve the generic parameters.
- Along the way, also propagate the AST nodes. When a node is not part of a generic function, it remains the same; otherwise, it is regenerated.
- Whenever a generic function call is encountered, instantiate the function based on the arguments, store it as a new function with a monomorphized name, then walk through the instantiated function's AST. The function call AST node is modified with the monomorphized name while retaining the same node id and span.
- In the function instantiation process, all the AST nodes are regenerated. The new AST is stored under the monomorphized function name.
- After monomorphizing a function, add the name of the original function to a list that records which function ASTs to delete at the end. We can't delete the original function AST immediately, because it might be called from different places.
- In each function block scope, type check the return types by comparing the propagated return type with the declared return type. All these types should be in concrete form, without generic parameters involved.
- At the end, override the main function AST with the monomorphized version, and delete the generic functions recorded in the list.
All the updates are done to the existing stores in the TAST.
- Instantiated functions are added to HashMap<FullyQualified, FnInfo<B>>.
- Types for monomorphized nodes are stored in HashMap<usize, TyKind>. (Keep in mind that this node-type store can also contain the node types of deleted functions; we may need to figure out a way to safely delete the node types belonging to a deleted function.)
- Generic functions should be deleted from HashMap<FullyQualified, FnInfo<B>>.
Circuit Synthesizer
The circuit synthesizer will rely on the monomorphized AST to compile the circuit. For the synthesizer, the workflow is the same as before, just with the monomorphized AST as input; it doesn't need to be aware of the newly added generics support. The added MAST phase simplifies what the circuit synthesizer needs to do to support the generic features, compared to the alternative approach described in the next section.
Alternative approach
One alternative to the monomorphization described above is to propagate the generic values directly in the circuit writer, without adding the MAST phase.
The circuit writer walks through the original AST via the compute_expr function. This function propagates the values from the main function's arguments and constants, and computes a VarOrRef as a result. The VarOrRef doesn't retain the structure of the types being computed.

In the process, when it needs to determine the structure of the type behind an expression node, it relies on the size_of function to determine the number of vars representing the type. size_of relies on the node id of an expression to look up the type. This is not a problem when the types are concrete.

When the type behind an expression node is generic, however, looking up the size of a type via size_of is not applicable anymore, since the expression node can be of a generic type.
To solve this problem, there should be a new way to determine the size of the type behind an expression node without relying on the node id. One way, using a ComputedExpr, is to retain the structure of the type throughout the propagation in compute_expr. Instead of passing around a VarOrRef, compute_expr returns a ComputedExpr, which contains both the structure of the type and the underlying VarOrRef variables.
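As a rough illustration (the actual definition may differ), ComputedExpr could simply pair the resolved type structure with the variables:

// rough sketch: ComputedExpr keeps the type structure next to the vars,
// so the size of a type no longer has to be looked up by node id
struct ComputedExpr {
    ty: TyKind,     // structure of the computed type (e.g. an array and its size)
    vars: VarOrRef, // the underlying circuit variables
}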
For example, when computing an ExprKind::ArrayAccess, it can use the ComputedExpr of the array expression node to determine the size of the array, so as to bound check the access index.
This approach would require a significant refactor of the circuit writer's compilation process. It would change the assumptions from using VarOrRef to the structured ComputedExpr. It would also need to rely on ComputedExpr for some additional checks instead of just relying on types, which would require quite a number of additional assumptions between the ComputedExpr, the actual types, and the generic parameters.
Therefore, we think the monomorphization approach is more straightforward and easier to maintain in the long run, considering the pipeline of the compiler.