Architecture Diagrams
Visual Reference for Implementation
Created: December 8, 2025
System Architecture Overview
┌─────────────────────────────────────────────────────────────┐
│                  HRM-ACTV1 Enhanced Model                   │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Input Tokens                                               │
│         │                                                   │
│         ▼                                                   │
│  ┌─────────────┐                                            │
│  │ Embeddings  │                                            │
│  └──────┬──────┘                                            │
│         │                                                   │
│         ▼                                                   │
│  ┌───────────────────────────────────────────────────────┐  │
│  │                        Layer 1                        │  │
│  │  ┌──────────────┐                ┌─────────────────┐  │  │
│  │  │  Attention   │◄───────────────┤  TraceManager   │  │  │
│  │  │  + RoPE      │  bias inject   │ - Sparse Memory │  │  │
│  │  └──────┬───────┘                │ - Salience      │  │  │
│  │         │                        │ - Decay/Update  │  │  │
│  │         │                        └────────▲────────┘  │  │
│  │         │                                 │           │  │
│  │         ▼                                 │ capture   │  │
│  │  ┌──────────────┐                         │           │  │
│  │  │  MoE Router  ├────────────────────────►│           │  │
│  │  └──────┬───────┘   log paths             │           │  │
│  │         │                                 │           │  │
│  │         ▼                                 │           │  │
│  │  ┌──────────────┐                ┌────────┴────────┐  │  │
│  │  │   Experts    │                │ RoutingPathTree │  │  │
│  │  │  [E1..E8]    │                │ - Suffix Tree   │  │  │
│  │  └──────┬───────┘                │ - Motif Detect  │  │  │
│  │         │                        │ - Crystallize   │  │  │
│  │         │                        └─────────────────┘  │  │
│  └─────────┼─────────────────────────────────────────────┘  │
│            │                                                │
│    ... (Layers 2-31)                                        │
│            │                                                │
│            ▼                                                │
│     ┌─────────────┐                                         │
│     │ Output Head │                                         │
│     └─────────────┘                                         │
│                                                             │
└─────────────────────────────────────────────────────────────┘
Data Flow: Attention Trace Lifecycle
┌─────────────────────────────────────────────────────────────┐
│                 Forward Pass (with tracing)                 │
└─────────────────────────────────────────────────────────────┘
                              │
                              ▼
                     ┌─────────────────┐
                     │ Compute QKV     │
                     └────────┬────────┘
                              │
                              ▼
                     ┌─────────────────┐
                     │ Load trace bias │◄──── M^(l,h) (sparse)
                     │ from memory     │
                     └────────┬────────┘
                              │
                              ▼
                     ┌─────────────────┐
                     │ QK^T / √d_k     │
                     │ + α·M           │ ← Biased attention
                     └────────┬────────┘
                              │
                              ▼
                     ┌─────────────────┐
                     │ Softmax         │
                     └────────┬────────┘
                              │
                              ▼
                     ┌─────────────────┐
                     │ Attention·V     │
                     └────────┬────────┘
                              │
                              ▼
                     ┌─────────────────┐
                     │ [DETACH]        │
                     │ Store attn      │ ← Temporary buffer
                     │ weights         │
                     └─────────────────┘
┌─────────────────────────────────────────────────────────────┐
│              Backward Pass (gradient capture)               │
└─────────────────────────────────────────────────────────────┘
                              │
                              ▼
                     ┌─────────────────┐
                     │ loss.backward() │
                     └────────┬────────┘
                              │
                              ▼
                     ┌─────────────────┐
                     │ Extract ∂L/∂A   │ ← Attention gradients
                     └────────┬────────┘
                              │
                              ▼
                     ┌─────────────────┐
                     │ Compute salience│
                     │ S = A·|∂L/∂A|   │
                     └────────┬────────┘
                              │
                              ▼
                     ┌─────────────────┐
                     │ Top-k selection │ ← Only high salience
                     │ (k ≈ 0.1% edges)│
                     └────────┬────────┘
                              │
                              ▼
                     ┌─────────────────┐
                     │ Update M^(l,h)  │
                     │ with EMA/decay  │
                     └─────────────────┘
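The forward-pass steps above (scaled dot product, additive trace bias α·M, softmax, weighted sum) and the top-k edge selection from the backward pass can be sketched in plain Python for a single query row. This is a minimal illustration, not the model's implementation: the function names, the dictionary-based sparse bias, and the default values of `alpha` and `fraction` are assumptions made for the example.

```python
import heapq
import math


def biased_attention_row(q, keys, values, trace_bias, alpha=0.1):
    """Attention for a single query with an additive trace bias.

    trace_bias maps key index -> stored trace value M; missing keys count as 0.
    alpha (bias strength) is a hypothetical default, not a value from the design.
    """
    d_k = len(q)
    logits = []
    for j, k in enumerate(keys):
        score = sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
        logits.append(score + alpha * trace_bias.get(j, 0.0))
    # Numerically stable softmax over the biased logits.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum of the value vectors (Attention·V).
    out = [sum(w * v[c] for w, v in zip(weights, values))
           for c in range(len(values[0]))]
    return weights, out


def top_k_edges(salience, fraction=0.001):
    """Keep the highest-salience edges; always keeps at least one."""
    k = max(1, int(len(salience) * fraction))
    return heapq.nlargest(k, salience.items(), key=lambda item: item[1])
```

Because the bias is added before the softmax, a stored trace raises the attention weight of its key relative to the unbiased case without changing the normalization.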
🌳 Routing Path Tree Structure
                                      Root
                                        │
                 ┌──────────────────────┼──────────────────────┐
                 │                      │                      │
                E?                     E?                     E?      ← Layer 1
            count: 523             count: 892             count: 341
            reward: 412            reward: 705            reward: 268
                 │                      │                      │
        ┌────────┼────────┐       ┌─────┼─────┐            ┌───┴───┐
        │        │        │       │     │     │            │       │
       E?       E?       E?      E?    E?    E?           E?      E?  ← Layer 2
     cnt:201  cnt:198    ...
                 │
             ┌───┼───┐
             │   │   │
            E?  E?  E?                                                ← Layer 3
                ...
───────────────────────────────────────────────────────
High-Utility Path Example:
  π = [E? → E? → E? → E?]
  ─────────────────────
  Frequency: 198   (> threshold)
  Utility:   +0.07 (> threshold)
  Entropy:   0.8   (< threshold)
  ↓ CRYSTALLIZE ↓
  Creates new expert: E_motif42
  Future routing can select E_motif42 directly,
  bypassing individual expert routing.
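A prefix tree keyed by expert id is one way to hold the per-node counts and rewards shown above. The sketch below is an assumption-laden illustration rather than the planned `routing_tree.py`: it treats entropy as the Shannon entropy of a node's child-visit distribution and reports mean reward per path, because the document does not define how utility is derived from `count` and `reward`.

```python
import math
from dataclasses import dataclass, field


@dataclass
class RoutingNode:
    expert_id: int
    count: int = 0
    total_reward: float = 0.0
    children: dict = field(default_factory=dict)

    def mean_reward(self):
        return self.total_reward / self.count if self.count else 0.0

    def entropy(self):
        """Shannon entropy (nats) of visits across children (assumed definition)."""
        total = sum(c.count for c in self.children.values())
        if total == 0:
            return 0.0
        return -sum((c.count / total) * math.log(c.count / total)
                    for c in self.children.values() if c.count)


class RoutingPathTree:
    def __init__(self):
        self.root = RoutingNode(expert_id=-1)  # sentinel root, not a real expert

    def record_path(self, path, reward):
        """Add one observed expert sequence and its reward to every prefix node."""
        node = self.root
        node.count += 1
        for expert_id in path:
            node = node.children.setdefault(expert_id, RoutingNode(expert_id))
            node.count += 1
            node.total_reward += reward

    def find_candidates(self, min_count, min_depth=2):
        """Return (path, count, mean_reward) for paths seen at least min_count times."""
        found = []

        def walk(node, path):
            if len(path) >= min_depth and node.count >= min_count:
                found.append((tuple(path), node.count, node.mean_reward()))
            for child in node.children.values():
                walk(child, path + [child.expert_id])

        walk(self.root, [])
        return found
```

Recording a path updates every prefix, so the count stored at a node is the frequency of the path from the root to that node.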
💾 Memory Layout: Trace Storage
┌─────────────────────────────────────────────────────────┐
│              TraceMemory (Per Layer/Head)               │
├─────────────────────────────────────────────────────────┤
│                                                         │
│  Sparse COO Format:                                     │
│                                                         │
│  traces: List[AttentionTrace]                           │
│  ┌───────────────────────────────────────────────────┐  │
│  │ AttentionTrace #1:                                │  │
│  │   layer_id:   5      (1 byte)                     │  │
│  │   head_id:    12     (1 byte)                     │  │
│  │   query_idx:  1024   (2 bytes)                    │  │
│  │   key_idx:    512    (2 bytes)                    │  │
│  │   salience:   0.85   (4 bytes float32)            │  │
│  │   age:        3      (2 bytes)                    │  │
│  │   Total: 12 bytes                                 │  │
│  ├───────────────────────────────────────────────────┤  │
│  │ AttentionTrace #2:                                │  │
│  │   ...                                             │  │
│  └───────────────────────────────────────────────────┘  │
│                                                         │
│  max_traces: 2048 per head                              │
│  Total: 2048 × 12 bytes = 24 KB per head                │
│                                                         │
│  For 32 layers × 32 heads:                              │
│  Total: 32 × 32 × 24 KB = 24 MB                         │
│                                                         │
└─────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────┐
│                Fast Lookup via Hash Map                 │
├─────────────────────────────────────────────────────────┤
│                                                         │
│  trace_index: HashMap[(layer, head, i, j) → salience]  │
│                                                         │
│  Example:                                               │
│    (5, 12, 1024, 512) → 0.85                            │
│    (5, 12, 2048, 256) → 0.72                            │
│    ...                                                  │
│                                                         │
│  O(1) lookup for bias injection                         │
│  O(log n) insert for updates (maintain sorted)          │
│                                                         │
└─────────────────────────────────────────────────────────┘
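The 12-byte record and the capacity arithmetic can be checked with Python's `struct` module. This is a sketch under the assumption of a little-endian, unpadded layout; the names `TRACE_FORMAT`, `pack_trace`, and `trace_index` are illustrative and not taken from the codebase.

```python
import struct

# "<" selects little-endian with no alignment padding:
# uint8 layer_id, uint8 head_id, uint16 query_idx, uint16 key_idx,
# float32 salience, uint16 age
TRACE_FORMAT = "<BBHHfH"
TRACE_SIZE = struct.calcsize(TRACE_FORMAT)

MAX_TRACES_PER_HEAD = 2048
NUM_LAYERS = 32
NUM_HEADS = 32

per_head_bytes = MAX_TRACES_PER_HEAD * TRACE_SIZE      # bytes per (layer, head)
total_bytes = NUM_LAYERS * NUM_HEADS * per_head_bytes  # bytes for the whole model


def pack_trace(layer_id, head_id, query_idx, key_idx, salience, age):
    """Serialize one trace into its fixed-width binary record."""
    return struct.pack(TRACE_FORMAT, layer_id, head_id,
                       query_idx, key_idx, salience, age)


# Hash-map index for constant-time bias lookup, keyed by (layer, head, i, j).
trace_index = {}
trace_index[(5, 12, 1024, 512)] = 0.85
trace_index[(5, 12, 2048, 256)] = 0.72

record = pack_trace(5, 12, 1024, 512, 0.85, 3)
```

With these sizes, `per_head_bytes` is 24,576 bytes (24 × 1024) and `total_bytes` is 25,165,824 bytes (24 × 1024 × 1024). The `uint16` index fields limit query and key positions to 65,535.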
Training Loop Integration
┌───────────────────────────────────────────────────────────────┐
│                    Standard Training Loop                     │
└───────────────────────────────────────────────────────────────┘
                                │
                                ▼
┌───────────────────────────────────────────────────────────────┐
│  for batch in dataloader:                                     │
│    ┌─────────────────────────────────────────────┐            │
│    │ 1. Forward Pass                             │            │
│    │    - Attention with trace bias              │            │
│    │    - MoE routing logged                     │            │
│    └─────────┬───────────────────────────────────┘            │
│              │                                                │
│    ┌─────────▼───────────────────────────────────┐            │
│    │ 2. Compute Loss                             │            │
│    │    L_total = L_task + β₁·L_balance          │            │
│    │              + β₂·L_trace + β₃·L_crystal    │            │
│    └─────────┬───────────────────────────────────┘            │
│              │                                                │
│    ┌─────────▼───────────────────────────────────┐            │
│    │ 3. Backward Pass                            │            │
│    │    - Compute gradients                      │            │
│    │    - Extract ∂L/∂A for salience             │            │
│    └─────────┬───────────────────────────────────┘            │
│              │                                                │
│    ┌─────────▼───────────────────────────────────┐            │
│    │ 4. Optimizer Step                           │            │
│    │    - Update model weights                   │            │
│    └─────────┬───────────────────────────────────┘            │
│              │                                                │
│    ┌─────────▼───────────────────────────────────┐            │
│    │ 5. Periodic Trace Update                    │            │
│    │    if step % UPDATE_INTERVAL == 0:          │            │
│    │      - Compute salience scores              │            │
│    │      - Update trace memory (EMA)            │            │
│    │      - Apply decay to unused traces         │            │
│    │      - Evict lowest salience if quota full  │            │
│    └─────────┬───────────────────────────────────┘            │
│              │                                                │
│    ┌─────────▼───────────────────────────────────┐            │
│    │ 6. Periodic Motif Update                    │            │
│    │    if step % CRYSTALLIZE_INTERVAL == 0:     │            │
│    │      - Traverse routing tree                │            │
│    │      - Compute utilities                    │            │
│    │      - Detect crystallization candidates    │            │
│    │      - Freeze high-utility motifs           │            │
│    │      - Prune low-utility motifs             │            │
│    └─────────────────────────────────────────────┘            │
└───────────────────────────────────────────────────────────────┘
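The loss combination in step 2 and the two periodic triggers in steps 5 and 6 reduce to a few lines. In this sketch the β weights and both interval values are placeholders chosen for illustration; the document does not specify them.

```python
UPDATE_INTERVAL = 10          # hypothetical: trace update every 10 steps
CRYSTALLIZE_INTERVAL = 100    # hypothetical: motif update every 100 steps


def total_loss(l_task, l_balance, l_trace, l_crystal,
               beta1=0.01, beta2=0.01, beta3=0.01):
    """L_total = L_task + β1·L_balance + β2·L_trace + β3·L_crystal."""
    return l_task + beta1 * l_balance + beta2 * l_trace + beta3 * l_crystal


def periodic_actions(step):
    """Return which periodic updates fire at this optimizer step."""
    actions = []
    if step % UPDATE_INTERVAL == 0:
        actions.append("trace_update")
    if step % CRYSTALLIZE_INTERVAL == 0:
        actions.append("motif_update")
    return actions


def run_schedule(num_steps):
    """Count how often each periodic update fires over a run."""
    counts = {"trace_update": 0, "motif_update": 0}
    for step in range(1, num_steps + 1):
        for action in periodic_actions(step):
            counts[action] += 1
    return counts
```

Keeping the crystallization interval a multiple of the trace interval means motif updates always see freshly updated traces on the same step.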
Salience Computation Pipeline
Attention Weights (A)  Gradients (∂L/∂A)     Recurrence (p)
        │                      │                    │
        │ [B, H, T, T]         │ [B, H, T, T]       │ [T, T]
        │                      │                    │
        ▼                      ▼                    ▼
┌───────────────┐     ┌────────────────┐     ┌─────────────┐
│    A_{i,j}    │     │    |∂L/∂A|     │     │ log(1 + p)  │
└───────┬───────┘     └────────┬───────┘     └──────┬──────┘
        │                      │                    │
        └──────────┬───────────┘                    │
                   │                                │
                   ▼                                │
           ┌──────────────┐                         │
           │ A · |∂L/∂A|  │                         │
           └───────┬──────┘                         │
                   │                                │
                   └───────────────┬────────────────┘
                                   │
                                   ▼
                      ┌─────────────────────────┐
                      │ Salience Score          │
                      │ S = A·|∂L/∂A|·log(1+p)  │
                      └────────────┬────────────┘
                                   │
                                   ▼
                          ┌─────────────────┐
                          │ Threshold       │
                          │ S > θ?          │
                          └────────┬────────┘
                                   │
                    ┌──────────────┴──────────────┐
                    │                             │
                   YES                           NO
                    │                             │
                    ▼                             ▼
          ┌──────────────────┐          ┌──────────────────┐
          │ Add to traces    │          │ Apply decay      │
          │ (EMA update)     │          │ M ← γ·M          │
          └──────────────────┘          └──────────────────┘
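The pipeline above can be written as a per-edge update. The sketch assumes the salience score is S = A·|∂L/∂A|·log(1 + p) and that the "EMA update" blends the new score into the stored value; the threshold `theta`, the EMA weight `ema`, and the decay `gamma` are illustrative defaults, and the exact EMA form is not specified in the document.

```python
import math


def salience(attn, grad, recurrence):
    """S = A · |∂L/∂A| · log(1 + p) for one attention edge."""
    return attn * abs(grad) * math.log1p(recurrence)


def update_trace_memory(memory, observations, theta=0.05, ema=0.9, gamma=0.98):
    """Apply the threshold / EMA / decay branch to a sparse trace memory.

    memory:       dict mapping edge -> stored trace value M
    observations: dict mapping edge -> (attn, grad, recurrence)
    """
    reinforced = set()
    for edge, (attn, grad, recurrence) in observations.items():
        score = salience(attn, grad, recurrence)
        if score > theta:
            # YES branch: blend the new score into the stored value.
            memory[edge] = ema * memory.get(edge, 0.0) + (1.0 - ema) * score
            reinforced.add(edge)
    for edge in memory:
        if edge not in reinforced:
            # NO branch: M <- gamma * M for edges that were not reinforced.
            memory[edge] *= gamma
    return memory
```

Note that with the log(1 + p) factor an edge with zero recurrence scores zero, so it can only decay under this formulation.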
🎯 Crystallization Decision Tree
            Routing Path π
                   │
                   ▼
          ┌──────────────────┐
          │ Frequency f(π)   │
          │ > f_min?         │
          └────┬────────┬────┘
              YES       NO
               │        │
               │        └──► Reject (insufficient data)
               │
               ▼
          ┌──────────────────┐
          │ Utility U(π)     │
          │ > U_min?         │
          └────┬────────┬────┘
              YES       NO
               │        │
               │        └──► Reject (not beneficial)
               │
               ▼
          ┌──────────────────┐
          │ Entropy H(π)     │
          │ < H_max?         │
          └────┬────────┬────┘
              YES       NO
               │        │
               │        └──► Reject (unstable routing)
               │
               ▼
          ┌──────────────────┐
          │ Age age(π)       │
          │ > τ?             │
          └────┬────────┬────┘
              YES       NO
               │        │
               │        └──► Reject (temporal instability)
               │
               ▼
          ┌──────────────────┐
          │ Motif quota      │
          │ available?       │
          └────┬────────┬────┘
              YES       NO
               │        │
               │        └──► Evict lowest-utility motif
               │             (if U(π) > U_min(existing))
               │
               ▼
    ┌─────────────────────┐
    │  CRYSTALLIZE MOTIF  │
    │  - Create new expert│
    │  - Freeze pathway   │
    │  - Register in MoE  │
    └─────────────────────┘
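The decision tree maps directly onto a chain of guard clauses. The sketch below uses hypothetical threshold values and names (`PathStats`, `Thresholds`, `crystallization_decision`). The document does not say what happens when the quota is full and the candidate does not beat the weakest existing motif, so the final "reject: quota full" outcome is an assumption.

```python
from dataclasses import dataclass


@dataclass
class PathStats:
    frequency: int
    utility: float
    entropy: float
    age: int


@dataclass
class Thresholds:
    # Placeholder values for illustration only.
    f_min: int = 100
    u_min: float = 0.05
    h_max: float = 1.0
    tau: int = 1000


def crystallization_decision(stats, th, active_utilities, max_motifs):
    """Walk the checks in diagram order and return the outcome as a string."""
    if stats.frequency <= th.f_min:
        return "reject: insufficient data"
    if stats.utility <= th.u_min:
        return "reject: not beneficial"
    if stats.entropy >= th.h_max:
        return "reject: unstable routing"
    if stats.age <= th.tau:
        return "reject: temporal instability"
    if len(active_utilities) < max_motifs:
        return "crystallize"
    if active_utilities and stats.utility > min(active_utilities):
        return "evict lowest-utility motif"
    return "reject: quota full"  # assumed fallback, not shown in the diagram
```

Ordering the cheap frequency check first means most candidate paths are discarded before utility or entropy need to be evaluated.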
🔬 Experimental Monitoring Dashboard
┌────────────────────────────────────────────────────────────────┐
│                        Training Metrics                        │
├────────────────────────────────────────────────────────────────┤
│                                                                │
│  Loss Components:                                              │
│  ┌──────────────────────────────────────────────────────────┐  │
│  │ L_task:     2.34 ↓  (main task loss)                     │  │
│  │ L_balance:  0.05 ~  (MoE load balance)                   │  │
│  │ L_trace:    0.12 ↓  (trace utilization)                  │  │
│  │ L_crystal:  0.08 ↓  (crystallization entropy)            │  │
│  └──────────────────────────────────────────────────────────┘  │
│                                                                │
│  Trace Statistics:                                             │
│  ┌──────────────────────────────────────────────────────────┐  │
│  │ Total traces:    1.2M / 2.1M (57% capacity)              │  │
│  │ Avg salience:    0.42 (healthy)                          │  │
│  │ Coverage:        34% (attention ops using traces)        │  │
│  │ Decay rate:      0.98 (auto-tuned)                       │  │
│  │ Memory usage:    18.3 MB / 24 MB                         │  │
│  └──────────────────────────────────────────────────────────┘  │
│                                                                │
│  Crystallization Statistics:                                   │
│  ┌──────────────────────────────────────────────────────────┐  │
│  │ Active motifs:   87 / 512 (17% capacity)                 │  │
│  │ Avg utility:     +0.09 (9% improvement)                  │  │
│  │ Avg entropy:     0.7 (stable)                            │  │
│  │ FLOP reduction:  23% (via motif reuse)                   │  │
│  │ Tree nodes:      12.4K (manageable)                      │  │
│  └──────────────────────────────────────────────────────────┘  │
│                                                                │
│  Emergent Language Properties:                                 │
│  ┌──────────────────────────────────────────────────────────┐  │
│  │ Max hierarchy:     3 levels deep                         │  │
│  │ Composition rate:  42% (motifs calling motifs)           │  │
│  │ Task clustering:   Silhouette = 0.61 (good)              │  │
│  │ Symbol efficiency: 8.2× compression vs tokens            │  │
│  └──────────────────────────────────────────────────────────┘  │
│                                                                │
└────────────────────────────────────────────────────────────────┘
🛠️ File Structure
src/aios/core/hrm_models/cognitive/
│
├── __init__.py                  # Module exports
│
├── trace_manager.py             # Persistent attention traces
│   ├── class TraceManager
│   │   ├── __init__(config)
│   │   ├── capture_attention(layer, head, attn_weights)
│   │   ├── compute_salience(attn, grads, recurrence)
│   │   ├── update_traces(salience_scores)
│   │   ├── apply_decay()
│   │   ├── get_bias_for_layer_head(layer, head)
│   │   └── to_sparse_tensor()
│   │
│   └── class AttentionTrace (dataclass)
│       ├── layer_id: uint8
│       ├── head_id: uint8
│       ├── query_idx: uint16
│       ├── key_idx: uint16
│       ├── salience: float32
│       └── age: uint16
│
├── routing_tree.py              # MoE path tracking
│   ├── class RoutingNode
│   │   ├── expert_id: int
│   │   ├── layer: int
│   │   ├── count: int
│   │   ├── total_reward: float
│   │   ├── children: Dict[int, RoutingNode]
│   │   ├── utility() → float
│   │   └── entropy() → float
│   │
│   └── class RoutingPathTree
│       ├── root: RoutingNode
│       ├── motif_registry: Dict[str, CrystallizedMotif]
│       ├── record_path(path, reward)
│       ├── find_candidates()
│       └── prune_low_utility()
│
├── crystallization.py           # Motif freezing logic
│   ├── class CrystallizedMotif
│   │   ├── path: List[int]
│   │   ├── frozen_experts: nn.Module
│   │   ├── utility: float
│   │   ├── frequency: int
│   │   └── forward(x)
│   │
│   └── class CrystallizationManager
│       ├── detect_motifs(tree)
│       ├── freeze_motif(path)
│       ├── unfreeze_motif(motif_id)
│       └── evaluate_utility(motif)
│
├── losses.py                    # Auxiliary loss functions
│   ├── trace_utilization_loss(trace_manager)
│   ├── crystallization_entropy_loss(routing_tree)
│   └── elastic_weight_consolidation(model, fisher_info)
│
├── config.py                    # Configuration schemas
│   ├── class TraceConfig (TypedDict)
│   └── class CrystallizationConfig (TypedDict)
│
└── visualization.py             # Analysis tools
    ├── plot_trace_heatmap()
    ├── plot_routing_sankey()
    ├── plot_motif_hierarchy()
    └── export_motif_graph()
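As one possible shape for the two `TypedDict` schemas listed under `config.py`, the sketch below gathers the tunable quantities that appear elsewhere in this reference. Every key name is hypothetical. The values 2048, 0.001, 0.98, and 512 echo figures shown in the diagrams (traces per head, top-k fraction, decay rate, motif capacity); the remaining values are placeholders.

```python
from typing import TypedDict


class TraceConfig(TypedDict):
    max_traces_per_head: int
    top_k_fraction: float
    decay_rate: float
    update_interval: int


class CrystallizationConfig(TypedDict):
    min_frequency: int
    min_utility: float
    max_entropy: float
    min_age: int
    max_motifs: int
    crystallize_interval: int


def default_trace_config() -> TraceConfig:
    # update_interval is a placeholder; the other values echo the diagrams.
    return TraceConfig(max_traces_per_head=2048, top_k_fraction=0.001,
                       decay_rate=0.98, update_interval=100)


def default_crystallization_config() -> CrystallizationConfig:
    # Only max_motifs echoes the dashboard; thresholds here are placeholders.
    return CrystallizationConfig(min_frequency=100, min_utility=0.05,
                                 max_entropy=1.0, min_age=1000,
                                 max_motifs=512, crystallize_interval=1000)
```

A `TypedDict` is a plain `dict` at runtime, so these configurations serialize to JSON or YAML without extra conversion.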
🎨 Color Coding for Visualizations
Trace Salience:
- 🟦 Low salience (0.0-0.3): Recently captured, not yet consolidated
- 🟩 Medium salience (0.3-0.7): Moderately reinforced
- 🟨 High salience (0.7-0.9): Strongly reinforced
- 🟥 Critical salience (0.9-1.0): Core reasoning pathways
Motif Utility:
- ⬜ Neutral (U ≈ 0): No benefit
- 🟦 Low benefit (U = 0.01-0.05): Minor improvement
- 🟩 Moderate benefit (U = 0.05-0.10): Worth crystallizing
- 🟨 High benefit (U = 0.10-0.20): Very valuable
- 🟥 Critical (U > 0.20): Essential pattern
Routing Entropy:
- 🟩 Low entropy (H < 0.5): Deterministic, stable
- 🟨 Medium entropy (H = 0.5-1.0): Somewhat stable
- 🟥 High entropy (H > 1.0): Unstable, don't crystallize
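A plotting helper might map raw values onto these bands before choosing a color. The functions below are an illustrative sketch; because the listed ranges share their endpoints (0.3 appears in both the low and medium salience bands, for example), the sketch assumes each shared boundary value belongs to the higher band.

```python
def salience_band(s):
    """Classify a trace salience in [0, 1] into the bands listed above."""
    if not 0.0 <= s <= 1.0:
        raise ValueError("salience must lie in [0, 1]")
    if s < 0.3:
        return "low"
    if s < 0.7:
        return "medium"
    if s < 0.9:
        return "high"
    return "critical"


def utility_band(u):
    """Classify a motif utility; values under 0.01 are treated as neutral."""
    if u < 0.01:
        return "neutral"
    if u < 0.05:
        return "low"
    if u < 0.10:
        return "moderate"
    if u <= 0.20:
        return "high"
    return "critical"


def entropy_band(h):
    """Classify routing entropy; only 'high' blocks crystallization."""
    if h < 0.5:
        return "low"
    if h <= 1.0:
        return "medium"
    return "high"
```

For the example path shown earlier (utility +0.07, entropy 0.8), these helpers return "moderate" and "medium".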
Status: Visual reference complete
Usage: Print this for implementation reference
Next: Begin coding Phase 0 infrastructure