Preset Section
Quick architecture presets with pre-configured parameters.
Select Custom for full control over all parameters.
1M Preset
~1M parameters: Tiny model (hidden=256, 2+2 layers)
Fast training, minimal VRAM (~0.5 GB)
5M Preset
~5M parameters: Small model (hidden=512, 2+2 layers)
Good for testing and quick experiments (~1.5 GB)
10M Preset
~10M parameters: Medium model (hidden=768, 2+2 layers)
Balanced size/performance (~2.5 GB)
20M Preset
~20M parameters: Large model (hidden=1024, 2+2 layers)
Good quality, moderate VRAM (~4 GB)
50M Preset
~50M parameters: Very large (hidden=1536, 2+2 layers)
High quality, needs more VRAM (~7 GB)
Custom Preset
Custom architecture: Configure all parameters manually.
Reveals advanced options for hidden size, layers, heads, etc.
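The presets above can be summarized as a small lookup table. This is a minimal sketch, not the application's actual code. The names Preset, PRESETS and resolve_preset are hypothetical. The hidden sizes and VRAM estimates come from the tooltips, and "2+2 layers" is assumed to mean 2 H layers plus 2 L layers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Preset:
    hidden_size: int
    h_layers: int
    l_layers: int
    approx_vram_gb: float  # rough estimate quoted in the tooltip


# Values copied from the preset tooltips. Reading "2+2 layers" as
# 2 H layers + 2 L layers is an assumption.
PRESETS = {
    "1M": Preset(256, 2, 2, 0.5),
    "5M": Preset(512, 2, 2, 1.5),
    "10M": Preset(768, 2, 2, 2.5),
    "20M": Preset(1024, 2, 2, 4.0),
    "50M": Preset(1536, 2, 2, 7.0),
}


def resolve_preset(name: str) -> Optional[Preset]:
    """Return the preset for `name`, or None for "Custom" (manual setup)."""
    if name == "Custom":
        return None
    return PRESETS[name]
```

Returning None for Custom mirrors the dialog, where that choice reveals the manual fields instead of filling them in.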
Brain Name Field
Unique name for this brain/model.
Will be saved to: artifacts/brains/actv1/{name}/
Use descriptive names like: large_context_v1, fast_inference, etc.
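As an illustration of how the name maps to a save location, the sketch below builds the target directory from the path in the tooltip. The helper brain_dir and its character rule are hypothetical. The tooltip does not state which characters the application accepts.

```python
from pathlib import Path

BRAINS_ROOT = Path("artifacts/brains/actv1")  # location quoted in the tooltip


def brain_dir(name: str) -> Path:
    """Return the directory a brain named `name` would be saved to."""
    # Hypothetical rule: allow only letters, digits, "_" and "-" so the
    # name is safe to use as a single directory component.
    if not name or not all(c.isalnum() or c in "_-" for c in name):
        raise ValueError(f"invalid brain name: {name!r}")
    return BRAINS_ROOT / name
```

With this rule, a name such as large_context_v1 is accepted, while a name containing a path separator is rejected.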
Custom Architecture Fields
Hidden Size
Model width / embedding dimension.
Larger = more expressive but more VRAM.
Must be divisible by num_heads.
Examples: 256, 512, 768, 1024, 1536, 2048
H Layers
Number of Hierarchical reasoning layers.
Higher-level abstract processing.
More layers = deeper reasoning but slower.
Typical: 2-8 layers
L Layers
Number of Local processing layers.
Lower-level detail processing.
More layers = better detail but slower.
Typical: 2-8 layers
Num Heads
Number of attention heads per layer.
More heads = more parallel attention patterns.
Must evenly divide hidden_size.
Examples: 4, 8, 12, 16, 24, 32
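The divisibility rule shared by Hidden Size and Num Heads can be checked before a model is created. The sketch below is illustrative only. The function names are hypothetical, and the candidate list is the set of examples from the tooltip.

```python
def head_dim(hidden_size: int, num_heads: int) -> int:
    """Return the per-head dimension, or raise if the sizes are incompatible."""
    if hidden_size <= 0 or num_heads <= 0:
        raise ValueError("hidden_size and num_heads must be positive")
    if hidden_size % num_heads != 0:
        raise ValueError(
            f"hidden_size={hidden_size} is not divisible by num_heads={num_heads}"
        )
    return hidden_size // num_heads


def valid_head_counts(hidden_size: int, candidates=(4, 8, 12, 16, 24, 32)):
    """Keep only the candidate head counts that evenly divide hidden_size."""
    return [n for n in candidates if hidden_size % n == 0]
```

For example, head_dim(512, 8) gives 64, while 12 or 24 heads would be rejected for a hidden size of 512.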
Expansion
Feed-forward network expansion factor.
FFN size = hidden_size × expansion.
Higher = more capacity but more VRAM.
Typical: 2.0-4.0
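To make the VRAM effect of the expansion factor concrete, the sketch below applies the formula from the tooltip and estimates the FFN weight count. Both function names are hypothetical. The weight estimate assumes a plain two-projection FFN (up and down) without biases. A gated variant would use more weights, and the tooltip does not say which is used.

```python
def ffn_size(hidden_size: int, expansion: float) -> int:
    """Intermediate feed-forward width: hidden_size x expansion."""
    # Rounding to the nearest integer is an assumption; an implementation
    # may instead round up to a hardware-friendly multiple.
    return int(round(hidden_size * expansion))


def ffn_weight_count(hidden_size: int, expansion: float) -> int:
    """Weights in one plain FFN: an up projection plus a down projection."""
    inner = ffn_size(hidden_size, expansion)
    return 2 * hidden_size * inner
```

With hidden_size=512, doubling the expansion from 2.0 to 4.0 doubles the FFN width from 1024 to 2048, and under the stated assumption doubles the FFN weights per layer as well.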
H Cycles
Number of processing cycles per H layer.
More cycles = more refinement per layer.
Typical: 1-3 cycles
L Cycles
Number of processing cycles per L layer.
More cycles = more refinement per layer.
Typical: 1-3 cycles
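The cost of extra cycles can be estimated by counting module invocations. The sketch below assumes a nested schedule in which each H cycle runs the L module l_cycles times and then the H module once. This schedule is an assumption based on how hierarchical reasoning models are commonly described, not something the tooltips confirm, and count_module_calls is a hypothetical helper.

```python
def count_module_calls(h_cycles: int, l_cycles: int) -> dict:
    """Count how often each module runs in one forward pass.

    Assumed schedule (not confirmed by the tooltips): every H cycle runs
    the L module `l_cycles` times, then runs the H module once.
    """
    calls = {"L": 0, "H": 0}
    for _ in range(h_cycles):
        for _ in range(l_cycles):
            calls["L"] += 1  # L module refines the low-level state
        calls["H"] += 1  # H module updates the high-level state
    return calls
```

Under this assumption, the dialog defaults of 2 H cycles and 2 L cycles give 4 L-module passes and 2 H-module passes, so compute grows with the product of the two settings.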
Position Encoding
Position encoding method:
• rope (Rotary): Best for long contexts,
relative positions, no learned params.
RECOMMENDED for most use cases.
• learned: Absolute positions,
trained embeddings, fixed max length.
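The "relative positions, no learned params" property of rope can be illustrated with a small pure-Python sketch of rotary encoding. This is a generic textbook version, not the application's implementation. The function names and the base of 10000 (a common default) are assumptions.

```python
import math


def rope(vec, position, base=10000.0):
    """Rotate consecutive pairs of `vec` by position-dependent angles."""
    dim = len(vec)
    if dim % 2 != 0:
        raise ValueError("vector length must be even")
    out = []
    for i in range(0, dim, 2):
        # Each pair gets its own fixed frequency; nothing here is trained.
        theta = position * base ** (-i / dim)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out += [x * c - y * s, x * s + y * c]
    return out


def dot(a, b):
    """Plain dot product, standing in for an attention score."""
    return sum(x * y for x, y in zip(a, b))


q = [1.0, 2.0, 3.0, 4.0]
k = [0.5, -1.0, 2.0, 0.25]
score_near = dot(rope(q, 3), rope(k, 1))   # positions 3 and 1
score_far = dot(rope(q, 10), rope(k, 8))   # positions 10 and 8
```

The two scores match up to floating-point error because both pairs are two positions apart. The score depends on the offset between positions, not on their absolute values.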
Visual Map
┌──────────────────────────────────────────────────────┐
│ Create New HRM Student │
├──────────────────────────────────────────────────────┤
│ │
│ Choose architecture preset: ← "Quick presets..." │
│ ○ 1M ← "~1M params: Tiny model..." │
│ ○ 5M ← "~5M params: Small model..." │
│ ○ 10M ← "~10M params: Medium model..." │
│ ○ 20M ← "~20M params: Large model..." │
│ ○ 50M ← "~50M params: Very large..." │
│ ● Custom ← "Custom architecture: Configure..." │
│ │
│ Brain name: [new_brain] ← "Unique name..." │
│ │
│ ┌─ Custom Architecture ─────────────────────────┐ │
│ │ │ │
│ │ Hidden size: [512] ← "Model width..." │ │
│ │ │ │
│ │ H layers: [2] ← "Hierarchical..." │ │
│ │ │ │
│ │ L layers: [2] ← "Local processing..." │ │
│ │ │ │
│ │ Num heads: [8] ← "Attention heads..." │ │
│ │ │ │
│ │ Expansion: [2.0] ← "FFN expansion..." │ │
│ │ │ │
│ │ H cycles: [2] ← "Processing cycles..." │ │
│ │ │ │
│ │ L cycles: [2] ← "Processing cycles..." │ │
│ │ │ │
│ │ Pos encoding: [rope▼] ← "rope/learned/sincos"│ │
│ │ │ │
│ │ Note: DeepSpeed ZeRO can be selected in │ │
│ │ the main training panel │ │
│ └────────────────────────────────────────────────┘ │
│ │
│ [Create] [Cancel] │
└──────────────────────────────────────────────────────┘
Hover Behavior
- Tooltips appear after a 0.5-second hover delay
- Tooltips stay visible while hovering
- Tooltips disappear when the mouse moves away
- Multi-line tooltips are properly formatted
- All interactive elements have tooltips