1/1/2026

Inside Ferret: Compiler Architecture and Design

A deep dive into Ferret's multi-phase compiler architecture, from source code to native binary.

Tags: architecture, compiler, technical

Ferret’s compiler is built from the ground up with a focus on correctness, performance, and maintainability. This post explores the architecture that powers Ferret’s compilation from .fer source files to native executables.

Design Principles

The Ferret compiler follows several key principles:

Per-Module Phase Tracking: Each source file progresses through compilation phases independently
Parallel Processing: Modules are parsed in parallel with unbounded goroutines
Thread-Safe Diagnostics: Error collection is thread-safe with proper mutex synchronization
Topological Ordering: Phases respect module dependencies for correct compilation
Multi-IR Architecture: Three intermediate representations (AST → HIR → MIR) for progressive lowering
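
To make the topological ordering principle concrete, here is a minimal sketch of a dependency-ordered walk over modules. The topoOrder function and the map-based graph shape are illustrative assumptions, not Ferret's actual data structures.

```go
package main

import (
	"fmt"
	"sort"
)

// topoOrder returns module paths so that every module appears after the
// modules it imports. deps maps a module path to the paths it imports
// (a hypothetical shape chosen for this sketch). Cycles produce an error.
func topoOrder(deps map[string][]string) ([]string, error) {
	const (
		unvisited = iota
		inProgress
		done
	)
	state := make(map[string]int)
	var order []string

	var visit func(m string) error
	visit = func(m string) error {
		switch state[m] {
		case done:
			return nil
		case inProgress:
			return fmt.Errorf("import cycle through %q", m)
		}
		state[m] = inProgress
		for _, d := range deps[m] {
			if err := visit(d); err != nil {
				return err
			}
		}
		state[m] = done
		order = append(order, m) // emitted only after all dependencies
		return nil
	}

	// Visit roots in sorted order so the result is deterministic.
	names := make([]string, 0, len(deps))
	for m := range deps {
		names = append(names, m)
	}
	sort.Strings(names)
	for _, m := range names {
		if err := visit(m); err != nil {
			return nil, err
		}
	}
	return order, nil
}

func main() {
	deps := map[string][]string{
		"app/main": {"app/util", "std/io"},
		"app/util": {"std/io"},
		"std/io":   nil,
	}
	order, err := topoOrder(deps)
	fmt.Println(order, err)
}
```

A depth-first post-order walk is one common way to get this ordering; it also detects import cycles for free, since a module that is revisited while still in progress must be part of a cycle.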

Architecture Overview

Here’s the complete architecture diagram showing all compilation phases and data flow:

Phase-by-Phase Breakdown

Phase 1: Lexing & Parsing (Parallel)

The compilation journey begins with parallel parsing:

Regex-based Lexer: Tokenizes source with error continuation (doesn’t panic on bad input)
Recursive Descent Parser: Builds AST with minimal error recovery
Import Extraction: Discovers dependencies recursively
Parallel Execution: Each module spawns its own goroutine
Deduplication: Modules are parsed once per import path using sync.Map

Key Innovation: Unbounded goroutines work well for typical project sizes (up to a few hundred modules) and avoid the complexity of worker pools.
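
The parse-once behavior can be sketched with sync.Map and one goroutine per module. Loader, Module, and ParseAll are hypothetical names, and the source map stands in for real lexing and parsing; this is a rough illustration of the pattern rather than Ferret's code.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// Module is a stand-in for a parsed source file.
type Module struct {
	Path    string
	Imports []string
}

// Loader parses each import path at most once, one goroutine per module.
type Loader struct {
	parses int64    // number of real parse calls, updated atomically
	seen   sync.Map // import path -> *Module
	wg     sync.WaitGroup
	source map[string][]string // path -> imports; replaces real parsing
}

// Load schedules path for parsing unless another caller already claimed it.
func (l *Loader) Load(path string) {
	m := &Module{Path: path}
	// LoadOrStore is atomic: only the first caller for a path proceeds.
	if _, loaded := l.seen.LoadOrStore(path, m); loaded {
		return
	}
	l.wg.Add(1)
	go func() {
		defer l.wg.Done()
		atomic.AddInt64(&l.parses, 1)
		m.Imports = l.source[path] // stand-in for lexing and parsing
		for _, imp := range m.Imports {
			l.Load(imp) // discover dependencies recursively
		}
	}()
}

// ParseAll loads root and everything it imports, then reports how many
// modules were actually parsed.
func ParseAll(source map[string][]string, root string) int64 {
	l := &Loader{source: source}
	l.Load(root)
	l.wg.Wait()
	return atomic.LoadInt64(&l.parses)
}

func main() {
	source := map[string][]string{
		"main": {"a", "b"},
		"a":    {"c"},
		"b":    {"c"},
		"c":    nil,
	}
	fmt.Println("modules parsed:", ParseAll(source, "main"))
}
```

Module "c" is imported twice but parsed once. The same check keeps mutually importing modules from looping forever, because the second visit returns immediately.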

Phase 2: Symbol Collection

Building the symbol table hierarchy:

Lexical Scoping: Module → Function → Block scope chain
Name Declaration: Records all symbols (variables, functions, types)
Import Validation: Prevents redeclaration conflicts with import aliases
Method Validation: Ensures methods are defined in the same module as their types

Phase 3: Resolution

Binding names to their declarations:

Scope Walking: Parent chain traversal for symbol lookup
Qualified Access: Resolves module::symbol syntax
Cross-Module: Links symbols across file boundaries
Built-in Lookup: Resolves references to universe scope (built-in types)
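
Symbol collection and resolution both rest on a scope chain. The sketch below shows one plausible shape for it: Declare rejects redeclaration within a scope, and Lookup walks parent links until it reaches the universe scope. The type and method names are assumptions made for illustration.

```go
package main

import "fmt"

// Symbol is a declared name; Kind is "var", "func", "type", and so on.
type Symbol struct {
	Name string
	Kind string
}

// Scope holds the symbols of one lexical level plus a link to its parent.
type Scope struct {
	parent  *Scope
	symbols map[string]*Symbol
}

func NewScope(parent *Scope) *Scope {
	return &Scope{parent: parent, symbols: make(map[string]*Symbol)}
}

// Declare records a symbol in this scope and rejects redeclaration.
func (s *Scope) Declare(sym *Symbol) error {
	if _, exists := s.symbols[sym.Name]; exists {
		return fmt.Errorf("%q redeclared in this scope", sym.Name)
	}
	s.symbols[sym.Name] = sym
	return nil
}

// Lookup walks the parent chain, innermost scope first.
func (s *Scope) Lookup(name string) (*Symbol, bool) {
	for cur := s; cur != nil; cur = cur.parent {
		if sym, ok := cur.symbols[name]; ok {
			return sym, true
		}
	}
	return nil, false
}

func main() {
	universe := NewScope(nil) // built-in types live here
	universe.Declare(&Symbol{Name: "i32", Kind: "type"})

	module := NewScope(universe)
	module.Declare(&Symbol{Name: "main", Kind: "func"})

	fn := NewScope(module)
	fn.Declare(&Symbol{Name: "x", Kind: "var"})

	block := NewScope(fn)
	for _, name := range []string{"x", "main", "i32", "missing"} {
		sym, ok := block.Lookup(name)
		fmt.Println(name, ok, sym)
	}
}
```

Shadowing falls out of the walk order: an inner declaration is found before an outer one with the same name.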

Phase 4: Type Checking

The most complex phase with multiple subsystems:

Type Inference

Context-based inference from expressions
Array type adoption from surrounding context
Untyped literal resolution

Constant Evaluation

Arbitrary Precision: Uses big.Int and big.Float
Assignment Tracking: Follows constant values through local variables
Conservative: Only tracks primitives, not struct fields or complex expressions
Sound: Never gives wrong results, may miss some optimizations
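
As a rough illustration of arbitrary-precision constant folding, the sketch below evaluates integer operations with math/big and declines to fold anything it cannot compute safely. The function names, the supported operator set, and the truncated division semantics are assumptions for the example.

```go
package main

import (
	"fmt"
	"math/big"
)

// foldInt evaluates a binary operation on two integer constants.
// ok is false when the expression should not be folded.
func foldInt(op string, a, b *big.Int) (result *big.Int, ok bool) {
	r := new(big.Int)
	switch op {
	case "+":
		return r.Add(a, b), true
	case "-":
		return r.Sub(a, b), true
	case "*":
		return r.Mul(a, b), true
	case "/":
		if b.Sign() == 0 {
			return nil, false // conservative: leave division by zero unfolded
		}
		return r.Quo(a, b), true // truncated division (an assumption here)
	}
	return nil, false // unknown operator: do not guess
}

// evalLiteralOp parses two decimal literals, folds them, and returns the
// result as decimal text.
func evalLiteralOp(op, lhs, rhs string) (string, bool) {
	a, okA := new(big.Int).SetString(lhs, 10)
	b, okB := new(big.Int).SetString(rhs, 10)
	if !okA || !okB {
		return "", false
	}
	r, ok := foldInt(op, a, b)
	if !ok {
		return "", false
	}
	return r.String(), true
}

// fitsSigned reports whether v is representable as a signed integer of
// the given width.
func fitsSigned(v *big.Int, bits uint) bool {
	limit := new(big.Int).Lsh(big.NewInt(1), bits-1) // 2^(bits-1)
	min := new(big.Int).Neg(limit)
	return v.Cmp(min) >= 0 && v.Cmp(limit) < 0
}

func main() {
	// 2^62 * 4 overflows a 64-bit integer but not a big.Int.
	text, ok := evalLiteralOp("*", "4611686018427387904", "4")
	fmt.Println(text, ok)

	v, _ := new(big.Int).SetString(text, 10)
	fmt.Println("fits in 64 bits:", fitsSigned(v, 64))
}
```

Folding in arbitrary precision first and range-checking afterwards means an overflowing constant expression can be reported as an error instead of silently wrapping.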

Array Bounds Checking

Compile-time validation for fixed-size arrays with constant indices
Supports negative indexing (-1 to -N)
Dynamic arrays ([]T) auto-grow and are exempt from compile-time bounds checking
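
The fixed-size rule is small enough to sketch directly. checkConstIndex is a hypothetical helper that accepts indices from -N to N-1 for an array of size N and normalizes negative ones to their zero-based position.

```go
package main

import "fmt"

// checkConstIndex validates a constant index against a fixed array size
// and returns the equivalent zero-based index.
func checkConstIndex(index, size int64) (int64, error) {
	if index < -size || index >= size {
		return 0, fmt.Errorf("index %d out of bounds for array of size %d", index, size)
	}
	if index < 0 {
		return index + size, nil // -1 is the last element, -size the first
	}
	return index, nil
}

func main() {
	for _, idx := range []int64{0, 9, -1, -10, 15, -11} {
		pos, err := checkConstIndex(idx, 10)
		fmt.Println(idx, pos, err)
	}
}
```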

Dead Code Detection

Detects unreachable code from constant conditions
Warns about obvious infinite loops (no break/return)
Works with constant expressions

Phase 5: HIR Generation

Lowering typed AST to High-Level IR:

Source-Shaped: Preserves original structure for error reporting
Type Annotations: All expressions have resolved types
Expression Lowering: Simplifies complex expressions
Function/Method Preservation: Maintains declaration structure

Phase 6: Control Flow Analysis

Building and analyzing control flow graphs:

BasicBlock Construction: Breaks functions into basic blocks
Successors/Predecessors: Tracks control flow edges
Return Path Validation: Ensures all paths return a value
Unreachable Detection: Finds dead code blocks
Loop Analysis: Tracks break/continue for loop contexts
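
Two of these checks, return path validation and unreachable detection, reduce to a single graph traversal. The sketch below assumes a simplified BasicBlock with only successor edges and a Returns flag; the real structure presumably carries much more.

```go
package main

import "fmt"

// BasicBlock is a simplified CFG node for this sketch.
type BasicBlock struct {
	ID         int
	Successors []*BasicBlock
	Returns    bool // true when the block ends in a return
}

// analyze walks the CFG from entry. It reports the IDs of blocks that are
// never reached, and whether some reachable path falls off the end of the
// function without returning.
func analyze(entry *BasicBlock, all []*BasicBlock) (unreachable []int, missingReturn bool) {
	seen := make(map[int]bool)
	stack := []*BasicBlock{entry}
	for len(stack) > 0 {
		b := stack[len(stack)-1]
		stack = stack[:len(stack)-1]
		if seen[b.ID] {
			continue
		}
		seen[b.ID] = true
		// A reachable block with no successors must end in a return.
		if len(b.Successors) == 0 && !b.Returns {
			missingReturn = true
		}
		stack = append(stack, b.Successors...)
	}
	for _, b := range all {
		if !seen[b.ID] {
			unreachable = append(unreachable, b.ID)
		}
	}
	return unreachable, missingReturn
}

func main() {
	b1 := &BasicBlock{ID: 1, Returns: true}
	b2 := &BasicBlock{ID: 2}                // falls off the end
	b3 := &BasicBlock{ID: 3, Returns: true} // never linked from entry
	b0 := &BasicBlock{ID: 0, Successors: []*BasicBlock{b1, b2}}

	dead, missing := analyze(b0, []*BasicBlock{b0, b1, b2, b3})
	fmt.Println("unreachable blocks:", dead)
	fmt.Println("missing return:", missing)
}
```

In this example the branch through block 2 never returns, and block 3 has no incoming edge, so both diagnostics fire.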

Phase 7: HIR Lowering

Canonical transformation to prepare for MIR:

Optional/Result Explicit: Makes implicit operations explicit
Temp Variables: Splits complex expressions
Desugaring: Simplifies language constructs
Ready for MIR: Canonical form suitable for SSA
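
Splitting complex expressions into temporaries can be illustrated with a tiny expression tree. The node types and the t0, t1, ... naming are invented for this sketch; it only shows the general idea of flattening nested operations into one operation per statement.

```go
package main

import (
	"fmt"
	"strconv"
)

// Expr is a minimal expression tree for this sketch.
type Expr interface{}

type Var struct{ Name string }
type Lit struct{ Value int }
type Bin struct {
	Op   string
	L, R Expr
}

// Lowerer flattens nested expressions into single-operation statements.
type Lowerer struct {
	next int      // counter for fresh temporary names
	Out  []string // emitted statements, in evaluation order
}

// Lower returns the name or literal that holds the value of e, emitting
// a temporary for every binary operation along the way.
func (l *Lowerer) Lower(e Expr) string {
	switch e := e.(type) {
	case Var:
		return e.Name
	case Lit:
		return strconv.Itoa(e.Value)
	case Bin:
		left := l.Lower(e.L) // operands are lowered left to right
		right := l.Lower(e.R)
		tmp := fmt.Sprintf("t%d", l.next)
		l.next++
		l.Out = append(l.Out, fmt.Sprintf("%s = %s %s %s", tmp, left, e.Op, right))
		return tmp
	}
	panic("unknown expression node")
}

func main() {
	// a + b * (c - 1)
	expr := Bin{"+", Var{"a"}, Bin{"*", Var{"b"}, Bin{"-", Var{"c"}, Lit{1}}}}
	l := &Lowerer{}
	result := l.Lower(expr)
	for _, stmt := range l.Out {
		fmt.Println(stmt)
	}
	fmt.Println("result in", result)
}
```

Each temporary is assigned exactly once, which is what makes this form a convenient stepping stone toward SSA.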

Phase 8: MIR Generation

Lowering to Mid-Level IR (SSA-like):

SSA Form: Static Single Assignment with value/block IDs
Basic Blocks: Structured control flow
VTables: Interface dispatch tables
Type IDs: Runtime type information
Closure Lowering: Environment capture with upvalues

Phase 9: QBE Code Generation

Native code generation via QBE:

QBE IR: SSA form with type annotations
Register Allocation: Handled by QBE
Optimization: QBE performs backend optimizations
Multi-Architecture: x86_64, aarch64, riscv64 support
Assembly Output: Platform-specific .s files
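
For readers who have not seen QBE's input format, here is a small sketch that emits the QBE IL for a two-argument integer add. The emitAdd helper is hypothetical and far simpler than a real code generator, but the text it produces follows QBE's documented syntax: $ for globals, % for temporaries, @ for block labels, and w for 32-bit words.

```go
package main

import (
	"fmt"
	"strings"
)

// emitAdd returns QBE IL for an exported function that adds two 32-bit
// integers and returns the sum.
func emitAdd(name string) string {
	var b strings.Builder
	fmt.Fprintf(&b, "export function w $%s(w %%a, w %%b) {\n", name)
	b.WriteString("@start\n")
	b.WriteString("\t%sum =w add %a, %b\n")
	b.WriteString("\tret %sum\n")
	b.WriteString("}\n")
	return b.String()
}

func main() {
	fmt.Print(emitAdd("add"))
}
```

Saved to a file such as add.ssa, this text can be turned into assembly with the qbe command-line tool, for example `qbe -o add.s add.ssa`.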

Assembly & Linking

Final binary generation:

Platform Assembler: Uses system as or $AS / $FERRET_AS
Object Files: ELF/Mach-O/COFF format
Runtime Library: libferret_runtime.a with memory management, I/O, panic handlers
CRT Objects: C runtime initialization
Static Linking: Single native binary output

Concurrency Model

Thread-safe compilation with proper lock hierarchy:

CompilerContext.mu (RWMutex): Protects module registry and dependency graph
Module.Mu (Mutex): Protects individual module field updates
DiagnosticBag.mu (Mutex): Thread-safe error collection
SourceCache.mu (RWMutex): Cached source for diagnostics

Lock Ordering: Always acquire locks in the order listed above to prevent deadlocks.
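
A stripped-down sketch of this lock hierarchy follows. The type names mirror the ones listed above, but their fields, the Advance method, and the E0000 code are invented for illustration. What matters is that every path takes the context lock before a module lock or the diagnostics lock, never the reverse.

```go
package main

import (
	"fmt"
	"sync"
)

type Diagnostic struct{ Code, Message string }

// DiagnosticBag collects diagnostics from many goroutines.
type DiagnosticBag struct {
	mu    sync.Mutex
	items []Diagnostic
}

func (b *DiagnosticBag) Add(d Diagnostic) {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.items = append(b.items, d)
}

func (b *DiagnosticBag) Len() int {
	b.mu.Lock()
	defer b.mu.Unlock()
	return len(b.items)
}

type Module struct {
	Mu    sync.Mutex
	Phase int
}

type CompilerContext struct {
	mu      sync.RWMutex
	modules map[string]*Module
}

// Advance bumps a module's phase. Locks are taken in hierarchy order:
// the context (read lock) first, then the module or the diagnostics bag.
func (c *CompilerContext) Advance(path string, bag *DiagnosticBag) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	mod, ok := c.modules[path]
	if !ok {
		// E0000 is a made-up code for this sketch.
		bag.Add(Diagnostic{Code: "E0000", Message: "unknown module " + path})
		return
	}
	mod.Mu.Lock()
	mod.Phase++
	mod.Mu.Unlock()
}

// runWorkers starts 2*n goroutines: n advance a known module, n hit a
// missing one. It returns the final phase and the diagnostic count.
func runWorkers(n int) (phase, diagnostics int) {
	ctx := &CompilerContext{modules: map[string]*Module{"main": &Module{}}}
	bag := &DiagnosticBag{}
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(2)
		go func() { defer wg.Done(); ctx.Advance("main", bag) }()
		go func() { defer wg.Done(); ctx.Advance("missing", bag) }()
	}
	wg.Wait()
	return ctx.modules["main"].Phase, bag.Len()
}

func main() {
	phase, diags := runWorkers(50)
	fmt.Println("phase:", phase, "diagnostics:", diags)
}
```

Because the read lock on the context is shared, many goroutines can look up modules at once; only the per-module update and the diagnostics append are serialized.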

Error Diagnostics

Rust-style error output with:

Primary/Secondary Labels: Multi-location error context
Color-Coded Severity: Error, Warning, Info, Hint
Source Context: Shows relevant source lines
Error Codes: T-prefix for type errors, W-prefix for warnings
Help Messages: Actionable suggestions for fixes

Example:

error[T0009]: array index out of bounds
  --> test.fer:5:9
   |
 5 |     arr[15]
   |         ~~ index 15 out of bounds for array of size 10
   |
   = note: array was defined with size 10
   = help: valid indices are 0 to 9, or -1 to -10 for reverse indexing

Design Decisions

Why Three IRs?

AST: Preserves source structure for error reporting
HIR: High-level IR maintains readability for optimization passes
MIR: Mid-level SSA form suitable for code generation

Why QBE?

Mature Backend: Production-ready register allocation and optimization
Multi-Architecture: Single IR for x86_64, aarch64, riscv64
Simple Integration: Straightforward textual IR format and a small, self-contained C codebase
Fast Compilation: No LLVM overhead

Why Go for the Compiler?

Concurrency: Built-in goroutines and channels for parallel compilation
Memory Safety: GC eliminates memory bugs in compiler code
Fast Compilation: Go compiles quickly, keeping development iteration fast
Standard Library: Excellent regex, parsing, and file I/O support

Why Per-Module Phases?

Incremental Compilation: Only recompile changed modules (future)
Parallel Processing: Modules can progress independently
Correct Dependencies: Topological ordering ensures correctness
Cache Friendly: Each module’s artifacts can be cached

Performance Characteristics

Parallel Parsing: Scales with available CPU cores
Concurrent Reads: RWMutex read locks let multiple goroutines query phase state at once
Minimal Allocations: Reuses symbol tables and scopes where possible
Fast Type Checking: Constant evaluation uses big integers but is still fast
Zero-Copy: Import paths used as map keys without string copying

Future Enhancements

Incremental Compilation: Cache module artifacts between builds
Language Server: IDE support with incremental reparsing
Better Optimizations: More sophisticated MIR transformations
LLVM Backend: Alternative to QBE for maximum optimization
Build System Integration: First-class build tool support

Conclusion

Ferret’s architecture balances simplicity with sophistication. The multi-phase pipeline with proper concurrency control enables fast, correct compilation while maintaining clear separation of concerns. The three-IR design allows progressive lowering from high-level semantics to low-level machine code while preserving enough information for high-quality error messages.

The result is a compiler that’s fast, correct, and maintainable - just like the language it compiles.

Want to contribute to the compiler? Check out the GitHub repository and read the CONTRIBUTING.md guide.