Tools and Integrations

Generated: December 12, 2025
Purpose: Built-in tools and OS/HF integrations with runnable commands, inputs/outputs, and platform notes
Status: Implemented (some features are gated/experimental and noted below)

Web tools (search, crawl, datasets)

  • Source: src/aios/tools/web.py, src/aios/tools/crawler.py
  • What you get:
    • DuckDuckGo HTML-only search with ad/redirect filtering (ddg_search)
    • Robust HTML parsing and link extraction
    • Polite crawler with robots.txt, BFS, per-origin throttling, optional Playwright rendering and Trafilatura extraction
    • Turn searches into datasets via CLI builders (images, text, videos, websites)
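
To make "per-origin throttling" concrete, here is a minimal sketch of how a crawler can space out requests to the same origin. It is not the code in src/aios/tools/crawler.py; the class name OriginThrottle and its methods are hypothetical, and the clock and sleep functions are injectable only so the behavior can be checked without waiting.

```python
import time
from urllib.parse import urlsplit


class OriginThrottle:
    """Hypothetical helper: spaces out requests that hit the same origin."""

    def __init__(self, rps, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / rps  # e.g. rps=2 -> 0.5 s between requests
        self._clock = clock
        self._sleep = sleep
        self._next_ok = {}  # origin -> earliest time the next request may start

    @staticmethod
    def origin(url):
        parts = urlsplit(url)
        return f"{parts.scheme}://{parts.netloc}"

    def wait(self, url):
        """Block until this origin's slot is free; return the seconds slept."""
        key = self.origin(url)
        now = self._clock()
        delay = max(0.0, self._next_ok.get(key, now) - now)
        if delay:
            self._sleep(delay)
        self._next_ok[key] = now + delay + self.min_interval
        return delay
```

With rps=2, two back-to-back calls for the same origin are separated by half a second, while a request to a different origin is not delayed.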

Configuration knobs

  • Env vars (affect search/crawl):
    • AIOS_DDG_KL, AIOS_DDG_KAD → locale/region for DDG (defaults from config/default.yaml:web.ddg_params)
    • AIOS_WEB_UA / AIOS_WEB_UA_SUFFIX → user agent override/suffix
  • YAML: config/default.yaml:web mirrors the above defaults and can be edited.
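
The precedence described above (environment variables over YAML defaults) can be sketched as follows. This is an illustration, not the project's loader: the function name resolve_ddg_params is hypothetical, and the mapping of AIOS_DDG_KL and AIOS_DDG_KAD to the DuckDuckGo query parameters kl and kad is an assumption based on the variable names.

```python
import os


def resolve_ddg_params(yaml_defaults, env=None):
    """Return DDG params, letting env vars override YAML defaults.

    Assumption: AIOS_DDG_KL -> "kl" and AIOS_DDG_KAD -> "kad".
    """
    env = os.environ if env is None else env
    params = dict(yaml_defaults)  # copy so the defaults are not mutated
    for key, var in (("kl", "AIOS_DDG_KL"), ("kad", "AIOS_DDG_KAD")):
        value = env.get(var)
        if value:  # unset or empty variables leave the default in place
            params[key] = value
    return params


# Illustrative values only; the real defaults live in config/default.yaml.
print(resolve_ddg_params({"kl": "us-en", "kad": "en_US"}, {"AIOS_DDG_KL": "de-de"}))
```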

Crawl a URL → JSON summary or dataset

  • Command (Windows PowerShell):
    • One page fetch/parse
      • aios crawl https://example.com --ttl-sec 3600 --progress
    • Recursive BFS within the same domain, rate-limited
      • aios crawl https://example.com --recursive --max-pages 25 --max-depth 2 --rps 2 --progress
    • Store pages as a JSONL text dataset in the datasets pool
      • aios crawl https://example.com --recursive --store-dataset web_crawl/example --overwrite --progress
  • Inputs/flags (subset):
    • --ttl-sec: cache TTL in seconds; --render: render pages with Playwright; --trafilatura: extract article text with Trafilatura; --rps or --delay-ms: throttling
    • --same-domain/--any-domain, --max-pages, --max-depth, --progress
    • --store-dataset NAME writes to the datasets pool at NAME/data.jsonl; --overwrite resets an existing dataset
  • Outputs:
    • Progress: JSONL lines on stdout when --progress is set, e.g. {"event":"page","n":1,...}
    • Final JSON: pages summary, count, total_chars, and dataset_path/wrote_bytes when --store-dataset is used
  • Notes:
    • robots.txt is respected by default; pass --no-robots for tests only
    • Playwright requires its browsers to be installed; on Windows run once: playwright install
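
A consumer of the --progress stream might separate progress events from the final summary along these lines. This is a sketch under two assumptions that the text above does not confirm: every JSON object arrives on a single line, and the final summary is the only object without an "event" key. The function name split_crawl_output is hypothetical.

```python
import json


def split_crawl_output(lines):
    """Split crawl stdout into (progress_events, final_summary)."""
    events, summary = [], None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        try:
            obj = json.loads(line)
        except json.JSONDecodeError:
            continue  # not single-line JSON; ignored in this sketch
        if not isinstance(obj, dict):
            continue
        if "event" in obj:
            events.append(obj)  # e.g. {"event": "page", "n": 1, ...}
        else:
            summary = obj  # assumed to be the final JSON
    return events, summary
```

In practice the lines would come from the stdout of a subprocess running aios crawl with --progress.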

Build datasets from the web

  • Images
    • Command:
      • aios datasets-build-images "boats" --store-dataset boats_v1 --max-images 200 --per-site 40 --pages-per-site 8 --search-results 10 --rps 2 --progress
    • Inputs: --allow-ext jpg,png,webp, --near-duplicate-threshold 8, --file-prefix boats
    • Outputs: datasets/images/boats_v1 containing image files + manifest.jsonl (path,label,source_url,page_url,title,alt)
  • Text
    • Command:
      • aios datasets-build-text "boats" --store-dataset boats_text_v1 --max-docs 100 --search-results 10 --min-chars 400 --progress
    • Inputs: --allow-ext txt,pdf,docx (if set, fetches documents by extension/content-type); --file-prefix
    • Outputs: datasets/text/boats_text_v1/*.txt + manifest.jsonl (path,label,url,title,chars,excerpt)
  • Videos
    • Command:
      • aios datasets-build-videos "boats" --store-dataset boats_vid_v1 --max-videos 25 --per-site 5 --min-bytes 50000 --progress
    • Inputs: --allow-ext mp4,webm,mov,m4v, --file-prefix
    • Outputs: datasets/videos/boats_vid_v1/*.mp4|*.webm|… + manifest.jsonl (path,label,source_url,page_url,bytes)
  • Websites (HTML snapshots)
    • Command:
      • aios datasets-build-websites "boats" --store-dataset boats_sites_v1 --max-pages 30 --per-site 10 --search-results 10 --progress
    • Outputs: datasets/websites/boats_sites_v1/pages/*.html + manifest.jsonl (path,url,title,bytes,links)
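
Each builder writes a manifest.jsonl next to its files. As a sketch of how such a manifest can be consumed, the snippet below parses JSONL rows and counts images per source site. It assumes one JSON object per line with the image-manifest fields listed above; the function names parse_manifest and images_per_site are hypothetical.

```python
import json
from collections import Counter
from urllib.parse import urlsplit


def parse_manifest(lines):
    """Parse manifest.jsonl content (an iterable of lines) into dicts."""
    return [json.loads(line) for line in lines if line.strip()]


def images_per_site(rows):
    """Count rows per host, using the page_url field of the image manifest."""
    return Counter(urlsplit(row["page_url"]).netloc for row in rows)
```

A typical call would pass an open file handle, for example open("datasets/images/boats_v1/manifest.jsonl", encoding="utf-8"), to parse_manifest.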

Related docs: see docs/guide/CORE_TRAINING.md for using JSONL datasets; docs/guide/DATASETS.md for dataset pool and storage caps.

Filesystem and OS tools (guarded)

  • Source: src/aios/tools/fs.py, src/aios/tools/os.py
  • What you get:
    • write_text(path, data, cfg, conn) writes a file with WriteGuard and SafetyBudget enforcement
    • get_system_info() returns basic platform info
  • Guard/budget behavior:
    • Guards read allow/deny from config; budgets use DB-backed usage with tier defaults from aios.core.budgets
    • Domains charged: file_writes
  • Example usage via the CLI guard and budget helpers:
    • Show guard paths: aios guards-show
    • Simulate a service change budget decision: aios service-restart ssh --dry-run
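
The guard-then-budget flow can be illustrated with a small sketch. It does not reproduce WriteGuard or SafetyBudget: the function names, the rule that deny entries take precedence over allow entries, and the in-memory usage dict (the real budgets are DB-backed) are all assumptions made for the example.

```python
from pathlib import PurePosixPath


class BudgetExceeded(Exception):
    """Raised when a charge would exceed the limit for a domain."""


def is_write_allowed(path, allow, deny):
    """Allow a write only under an allow root and outside every deny root.

    Assumption: deny wins over allow. Paths are compared as given
    (no symlink or ".." resolution in this sketch).
    """
    target = PurePosixPath(path)

    def under(root):
        root = PurePosixPath(root)
        return target == root or root in target.parents

    if any(under(d) for d in deny):
        return False
    return any(under(a) for a in allow)


def charge(usage, limits, domain, amount=1):
    """Record usage for a domain such as "file_writes", or raise."""
    used = usage.get(domain, 0) + amount
    if used > limits.get(domain, 0):
        raise BudgetExceeded(f"{domain}: {used} > {limits.get(domain, 0)}")
    usage[domain] = used
    return used
```

A guarded write would call is_write_allowed first and charge the file_writes domain only when the guard passes.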

Root-helper and service adapters

  • Source: src/aios/tools/root_helper_client.py, src/aios/tools/service_adapter.py
  • What you get:
    • Optional privileged D-Bus client (Linux only). On Windows/macOS, calls return via: "unavailable" instead of failing.
    • Read-only service diagnostics via local systemctl/journalctl fallback when root-helper is not available.
  • Read status and logs for a unit
    • aios status --recent 1 --unit ssh
    • For targeted triage, prefer Agent CLI operators below
  • CLI operators for triage (store artifacts):
    • aios op-run journal_summary_from_text --unit ssh --lines 200 --label ssh
    • aios op-run journal_trend_from_text --unit ssh --lines 200 --label ssh --buckets 12
    • Artifacts saved in DB (see Core CLI status for recent artifacts).
  • Notes:
    • On Linux, outputs report via: "root-helper" when a root-helper is running; otherwise via: "local".
    • On Windows, these commands return via: "unavailable" (no systemd).
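
The via values in the notes above follow a simple decision rule, sketched here. The function name diagnostics_channel and its parameters are hypothetical; how the real adapter detects a running root-helper is not shown in this document, so that check is passed in as a boolean.

```python
import platform


def diagnostics_channel(root_helper_running, system=""):
    """Return the "via" value expected for service diagnostics."""
    system = system or platform.system()  # "Linux", "Windows", "Darwin", ...
    if system != "Linux":
        return "unavailable"  # no systemd or D-Bus root-helper
    return "root-helper" if root_helper_running else "local"
```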

Journal parser utilities

  • Source: src/aios/tools/journal_parser.py
  • Functions:
    • severity_counts(text) -> Dict[str,int] heuristic severity tallies across emerg…debug
  • How it’s used:
    • The Agent CLI operators compute summaries/trends and persist to DB artifacts. See: aios op-run ... above.
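
For readers who want a feel for what a heuristic severity tally looks like, here is one possible sketch. It is not the implementation in src/aios/tools/journal_parser.py: the name severity_counts_sketch, the alias table, and the rule of counting only the first severity word on each line are assumptions. The eight level names are the standard syslog severities.

```python
import re
from typing import Dict

# Standard syslog severity names, most to least severe.
LEVELS = ("emerg", "alert", "crit", "err", "warning", "notice", "info", "debug")
# Assumed spellings that commonly appear in free-form log text.
ALIASES = {"emergency": "emerg", "critical": "crit", "error": "err", "warn": "warning"}
_TOKEN = re.compile(
    r"\b(" + "|".join(LEVELS + tuple(ALIASES)) + r")\b", re.IGNORECASE
)


def severity_counts_sketch(text: str) -> Dict[str, int]:
    """Tally one severity per line, using the first level word found."""
    counts = {level: 0 for level in LEVELS}
    for line in text.splitlines():
        match = _TOKEN.search(line)
        if match:
            word = match.group(1).lower()
            counts[ALIASES.get(word, word)] += 1
    return counts
```

Lines without any recognizable severity word are simply not counted, which keeps the tally conservative.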

Package/service simulators (budgeted)

  • Source: src/aios/tools/service.py, src/aios/tools/pkg.py, src/aios/tools/privileged.py
  • What you get:
    • restart_service(name, simulate=True) and pkg.install/remove(name, simulate=True) record budget usage for service_changes/pkg_ops
    • run_privileged(fn, ...) wraps a function and charges privileged_calls budget
  • Try it:
    • aios service-restart docker --dry-run
    • aios pkg-install git --dry-run
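
The budgeted-simulator pattern can be sketched as below. The budget domain names (service_changes, privileged_calls) come from this page; everything else is hypothetical, including the UsageLedger class, the function names, the returned dict, and the choice to charge before running the wrapped function.

```python
from collections import Counter


class UsageLedger:
    """Hypothetical in-memory stand-in for the DB-backed budget usage."""

    def __init__(self):
        self.counts = Counter()

    def charge(self, domain, amount=1):
        self.counts[domain] += amount


def restart_service_sketch(name, ledger, simulate=True):
    """Record a service_changes charge and return a simulated result."""
    ledger.charge("service_changes")
    if not simulate:
        raise NotImplementedError("this sketch only covers the simulated path")
    return {"action": "restart", "service": name, "simulated": True}


def run_privileged_sketch(fn, ledger, *args, **kwargs):
    """Charge privileged_calls, then call the wrapped function."""
    ledger.charge("privileged_calls")
    return fn(*args, **kwargs)


ledger = UsageLedger()
result = run_privileged_sketch(restart_service_sketch, ledger, "docker", ledger)
print(result, dict(ledger.counts))
```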

MCP and external tools (GUI)

  • Source: GUI MCP Manager panel under src/aios/gui/components/mcp_manager_panel/*
  • What you get:
    • Visual editor for MCP servers and tool permissions using config/mcp_servers.json and config/tool_permissions.json
    • Enable/disable servers and toggle tool permissions; refresh state from disk
  • Status:
    • GUI available. Programmatic MCP wiring is scoped to the GUI; CLI equivalents are not exposed yet.
    • If config files are missing, the panel initializes with defaults and saves back on change.
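
The "initialize with defaults, save back on change" behavior can be sketched with two small helpers. This is not the panel's code: the function names are hypothetical, and the schema of config/mcp_servers.json is not documented here, so the default passed in by the caller is purely illustrative.

```python
import json
from pathlib import Path


def load_json_or_default(path, default):
    """Return parsed JSON from path, or a copy of default if it is missing."""
    p = Path(path)
    if not p.exists():
        # Round-trip through JSON to hand back an independent deep copy.
        return json.loads(json.dumps(default))
    with p.open(encoding="utf-8") as fh:
        return json.load(fh)


def save_json(path, data):
    """Write data as indented JSON, creating parent directories if needed."""
    p = Path(path)
    p.parent.mkdir(parents=True, exist_ok=True)
    p.write_text(json.dumps(data, indent=2), encoding="utf-8")
```

Nothing is written until save_json is called, which mirrors saving back only when a setting changes.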
    • Panel screenshots in GUI doc: see docs/guide/features/GUI_FEATURES.md (MCP & Tools tab).

Unlimiformer (planned)

  • Source: src/aios/integrations/unlimiformer/__init__.py
  • Status: Phase 1 scaffolding; disabled by default via config
  • Config key (example):
    • config/default.yaml → brains.trainer_overrides.unlimiformer.enabled: false
  • Notes:
    • When enabled in future phases, the model will be augmented for long-context eval using FAISS; Windows defaults to CPU FAISS.

Quick reference (commands)

  • Crawling
    • aios crawl https://example.com --recursive --max-pages 25 --max-depth 2 --rps 2 --progress
  • Dataset builders
    • aios datasets-build-images "topic" --store-dataset name --max-images 200 --progress
    • aios datasets-build-text "topic" --store-dataset name --max-docs 100 --progress
    • aios datasets-build-videos "topic" --store-dataset name --max-videos 50 --progress
    • aios datasets-build-websites "topic" --store-dataset name --max-pages 30 --progress
  • Budgets and guards
    • aios guards-show
    • aios service-restart ssh --dry-run
    • aios pkg-install git --dry-run

Related: Datasets, Advanced Features

Back to Feature Index: COMPLETE_FEATURE_INDEX.md • Back to Guide Index: ../INDEX.MD