Tools and Integrations¶
Generated: December 12, 2025
Purpose: Built-in tools and OS/HF integrations with runnable commands, inputs/outputs, and platform notes
Status: Implemented (some features are gated/experimental and noted below)
Web tools (search, crawl, datasets)¶
- Source: `src/aios/tools/web.py`, `src/aios/tools/crawler.py`
- What you get:
  - DuckDuckGo HTML-only search with ad/redirect filtering (`ddg_search`)
  - Robust HTML parsing and link extraction
  - Polite crawler with robots.txt, BFS, per-origin throttling, optional Playwright rendering and Trafilatura extraction
  - Turn searches into datasets via CLI builders (images, text, videos, websites)
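
To make "ad/redirect filtering" concrete, here is a minimal sketch of how a result link might be unwrapped or dropped. It is not the code in `src/aios/tools/web.py`: the function name `unwrap_result_link` is hypothetical, and the `/l/?uddg=` redirect wrapper and `/y.js` ad endpoint are assumptions about how DuckDuckGo's HTML results are shaped.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def unwrap_result_link(href: str) -> Optional[str]:
    """Return the real target of a search-result link, or None to drop it.

    Hypothetical helper; the DuckDuckGo URL shapes below are assumptions.
    """
    # Protocol-relative links ("//host/path") need a scheme before parsing.
    if href.startswith("//"):
        href = "https:" + href
    parts = urlparse(href)
    if parts.scheme not in ("http", "https"):
        return None  # javascript:, mailto:, and similar are not results
    host = parts.netloc.lower()
    if host == "duckduckgo.com" or host.endswith(".duckduckgo.com"):
        if parts.path.startswith("/y.js"):
            return None  # assumed ad-click endpoint
        if parts.path.startswith("/l/"):
            # Assumed redirect wrapper: the target is in the "uddg" parameter.
            return parse_qs(parts.query).get("uddg", [None])[0]
        return None  # other internal links are not search results
    return href
```

For example, `unwrap_result_link("//duckduckgo.com/l/?uddg=https%3A%2F%2Fexample.com%2Fboats")` yields `https://example.com/boats`, while an ad link yields `None`.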
Configuration knobs¶
- Env vars (affect search/crawl):
  - `AIOS_DDG_KL`, `AIOS_DDG_KAD` → locale/region for DDG (defaults from `config/default.yaml:web.ddg_params`)
  - `AIOS_WEB_UA` / `AIOS_WEB_UA_SUFFIX` → user agent override/suffix
- YAML: `config/default.yaml:web` mirrors the above defaults and can be edited.
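
The precedence described above (environment variable first, YAML default second) can be sketched as follows. This is an illustration only: `resolve_web_settings`, the dictionary keys, and the way the suffix is appended to the user agent are assumptions, not the project's actual loader.

```python
import os
from typing import Dict, Mapping, Optional


def resolve_web_settings(
    defaults: Mapping[str, str],
    env: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Merge env overrides over YAML-style defaults (hypothetical helper)."""
    env = os.environ if env is None else env
    settings = {
        "kl": env.get("AIOS_DDG_KL", defaults["kl"]),
        "kad": env.get("AIOS_DDG_KAD", defaults["kad"]),
    }
    user_agent = env.get("AIOS_WEB_UA", defaults["user_agent"])
    suffix = env.get("AIOS_WEB_UA_SUFFIX", "")
    # Assumption: the suffix is appended after a single space.
    settings["user_agent"] = f"{user_agent} {suffix}" if suffix else user_agent
    return settings


# Placeholder defaults for the sketch; the real values live in config/default.yaml.
EXAMPLE_DEFAULTS = {"kl": "us-en", "kad": "en_US", "user_agent": "sketch-agent/0.1"}
```

Passing `env` explicitly keeps the function easy to test; leaving it out reads from the real process environment.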
Crawl a URL → JSON summary or dataset¶
- Command (Windows PowerShell):
  - One page fetch/parse
    - aios crawl https://example.com --ttl-sec 3600 --progress
  - Recursive BFS within the same domain, rate-limited
    - aios crawl https://example.com --recursive --max-pages 25 --max-depth 2 --rps 2 --progress
  - Store pages as a text dataset JSONL under datasets pool
    - aios crawl https://example.com --recursive --store-dataset web_crawl/example --overwrite --progress
- Inputs/flags (subset):
  - `--ttl-sec` cache TTL seconds; `--render` Playwright render; `--trafilatura` article extraction; `--rps` or `--delay-ms` throttling
  - `--same-domain`/`--any-domain`, `--max-pages`, `--max-depth`, `--progress`
  - `--store-dataset NAME` outputs to datasets pool at NAME/data.jsonl; `--overwrite` to reset
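
How `--max-pages`, `--max-depth`, and `--same-domain` interact is easiest to see in a small breadth-first sketch. The code below is an assumption-laden illustration, not the crawler in `src/aios/tools/crawler.py`: `crawl_bfs` and its injected `fetch` callable are hypothetical, and it assumes the start page counts as depth 0.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable, List
from urllib.parse import urldefrag, urljoin, urlparse


class _LinkParser(HTMLParser):
    """Collect href values from <a> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def crawl_bfs(
    start: str,
    fetch: Callable[[str], str],
    max_pages: int = 25,
    max_depth: int = 2,
    same_domain: bool = True,
) -> List[str]:
    """Visit pages breadth-first and return URLs in visit order."""
    origin = urlparse(start).netloc
    queue = deque([(start, 0)])
    seen = {start}
    visited: List[str] = []
    while queue and len(visited) < max_pages:
        url, depth = queue.popleft()
        html = fetch(url)
        visited.append(url)
        if depth >= max_depth:
            continue  # do not expand links beyond the depth limit
        parser = _LinkParser()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop #fragments before deduplicating.
            link = urldefrag(urljoin(url, href)).url
            parts = urlparse(link)
            if parts.scheme not in ("http", "https"):
                continue
            if same_domain and parts.netloc != origin:
                continue
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return visited
```

Injecting `fetch` keeps the traversal logic separate from networking, caching, and rendering, so it can be exercised against an in-memory dictionary of pages.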
- Outputs:
  - Progress: JSONL lines on stdout when `--progress` is set, e.g. {"event":"page","n":1,...}
  - Final JSON: pages summary, count, total_chars, and dataset_path/wrote_bytes when `--store-dataset` is used
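
A caller that captures stdout may want to separate progress events from the final summary. The sketch below assumes that the final JSON is printed on a single line and carries no `event` key; neither is confirmed by this page, and `split_crawl_output` is a hypothetical helper.

```python
import json
from typing import Any, Dict, Iterable, List, Optional, Tuple


def split_crawl_output(
    lines: Iterable[str],
) -> Tuple[List[Dict[str, Any]], Optional[Dict[str, Any]]]:
    """Split captured stdout lines into (progress events, final summary)."""
    events: List[Dict[str, Any]] = []
    summary: Optional[Dict[str, Any]] = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        try:
            obj = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip anything that is not JSON
        if not isinstance(obj, dict):
            continue
        if "event" in obj:
            events.append(obj)
        else:
            summary = obj  # assumption: the last non-event object is the summary
    return events, summary
```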
- Notes:
  - The crawler respects robots.txt by default; pass `--no-robots` for tests only
  - Playwright requires a browser install; on Windows run once: playwright install
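
The robots.txt check and the `--rps` throttle can be sketched with the standard library alone. This is not the crawler's implementation: `build_robots` and `OriginThrottle` are hypothetical names, and the throttle assumes a fixed spacing of `1 / rps` seconds between requests to the same origin.

```python
from typing import Dict
from urllib.robotparser import RobotFileParser


def build_robots(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


class OriginThrottle:
    """Track, per origin, how long a caller should wait before the next request."""

    def __init__(self, rps: float) -> None:
        if rps <= 0:
            raise ValueError("rps must be positive")
        self.delay = 1.0 / rps
        self._next_ok: Dict[str, float] = {}

    def wait_needed(self, origin: str, now: float) -> float:
        """Return seconds to sleep, then reserve the next slot for this origin."""
        ready = self._next_ok.get(origin, now)
        wait = max(0.0, ready - now)
        self._next_ok[origin] = max(ready, now) + self.delay
        return wait
```

In use, a crawler would call `robots.can_fetch(user_agent, url)` before each request and pass `time.monotonic()` as `now`, sleeping for the returned number of seconds.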
Build datasets from the web¶
- Images
  - Command:
    - aios datasets-build-images "boats" --store-dataset boats_v1 --max-images 200 --per-site 40 --pages-per-site 8 --search-results 10 --rps 2 --progress
  - Inputs: `--allow-ext jpg,png,webp`, `--near-duplicate-threshold 8`, `--file-prefix boats`
  - Outputs: artifacts path: datasets/images/boats_v1 with image files + manifest.jsonl (path,label,source_url,page_url,title,alt)
- Text
  - Command:
    - aios datasets-build-text "boats" --store-dataset boats_text_v1 --max-docs 100 --search-results 10 --min-chars 400 --progress
  - Inputs: `--allow-ext txt,pdf,docx` (if set, fetches documents by extension/content-type); `--file-prefix`
  - Outputs: datasets/text/boats_text_v1/*.txt + manifest.jsonl (path,label,url,title,chars,excerpt)
- Videos
  - Command:
    - aios datasets-build-videos "boats" --store-dataset boats_vid_v1 --max-videos 25 --per-site 5 --min-bytes 50000 --progress
  - Inputs: `--allow-ext mp4,webm,mov,m4v`, `--file-prefix`
  - Outputs: datasets/videos/boats_vid_v1/.mp4|.webm|… + manifest.jsonl (path,label,source_url,page_url,bytes)
- Websites (HTML snapshots)
  - Command:
    - aios datasets-build-websites "boats" --store-dataset boats_sites_v1 --max-pages 30 --per-site 10 --search-results 10 --progress
  - Outputs: datasets/websites/boats_sites_v1/pages/*.html + manifest.jsonl (path,url,title,bytes,links)
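
Each builder writes a manifest.jsonl next to its artifacts, so a quick sanity check after a build is to load the manifest and count rows per field value. The sketch below assumes one JSON object per line with the field names listed above; `load_manifest` and `count_by` are hypothetical helpers.

```python
import json
from collections import Counter
from typing import Any, Dict, Iterable, List


def load_manifest(lines: Iterable[str]) -> List[Dict[str, Any]]:
    """Parse manifest.jsonl content, one JSON object per non-empty line."""
    rows: List[Dict[str, Any]] = []
    for raw in lines:
        line = raw.strip()
        if line:
            rows.append(json.loads(line))
    return rows


def count_by(rows: Iterable[Dict[str, Any]], field: str) -> Counter:
    """Count rows per value of a manifest field, e.g. "label" or "page_url"."""
    return Counter(str(row.get(field, "")) for row in rows)
```

An open file handle is an iterable of lines, so `load_manifest(open(path, encoding="utf-8"))` works for a manifest on disk.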
Related docs: see docs/guide/CORE_TRAINING.md for using JSONL datasets; docs/guide/DATASETS.md for dataset pool and storage caps.
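
This page does not say how the image builder's `--near-duplicate-threshold 8` is applied. One common reading, assumed here purely for illustration, is a Hamming-distance cutoff between 64-bit perceptual hashes: two images whose hashes differ in at most 8 bits are treated as duplicates. The function names below are hypothetical, and the hashes are taken as precomputed integers.

```python
from typing import List


def hamming_distance(a: int, b: int) -> int:
    """Number of differing bits between two integer hashes."""
    return bin(a ^ b).count("1")


def drop_near_duplicates(hashes: List[int], threshold: int = 8) -> List[int]:
    """Return indexes of hashes to keep, in first-seen order.

    Assumption: a hash within `threshold` bits of an already kept hash
    counts as a near duplicate and is dropped.
    """
    kept: List[int] = []
    for i, h in enumerate(hashes):
        if all(hamming_distance(h, hashes[j]) > threshold for j in kept):
            kept.append(i)
    return kept
```

This pairwise comparison is quadratic in the number of kept images, which is acceptable at the scale of a few hundred images per build.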
Filesystem and OS tools (guarded)¶
- Source: `src/aios/tools/fs.py`, `src/aios/tools/os.py`
- What you get:
  - `write_text(path, data, cfg, conn)` writes a file with WriteGuard and SafetyBudget enforcement
  - `get_system_info()` returns basic platform info
- Guard/budget behavior:
  - Guards read allow/deny from config; budgets use DB-backed usage with tier defaults from `aios.core.budgets`
  - Domains charged: `file_writes`
- Example usage via CLI budgets helpers:
  - Show guard paths: aios guards-show
  - Simulate a service change budget decision: aios service-restart ssh --dry-run
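
The allow/deny decision a write guard makes can be sketched as a path-prefix check in which deny rules win over allow rules. This is a simplified illustration and not the WriteGuard implementation: `is_write_allowed` is hypothetical, deny-over-allow precedence is an assumption, and the paths are compared as given without resolving `..` or symlinks.

```python
from pathlib import PurePosixPath
from typing import Iterable


def _is_under(path: PurePosixPath, root: PurePosixPath) -> bool:
    """True when path equals root or lies somewhere beneath it."""
    return path == root or root in path.parents


def is_write_allowed(path: str, allow: Iterable[str], deny: Iterable[str]) -> bool:
    """Deny rules win; otherwise the path must sit under an allowed root."""
    target = PurePosixPath(path)
    if any(_is_under(target, PurePosixPath(root)) for root in deny):
        return False
    return any(_is_under(target, PurePosixPath(root)) for root in allow)
```

A production guard would normalize and resolve the path first, since a purely lexical check can be bypassed with `..` segments.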
Root-helper and service adapters¶
- Source: `src/aios/tools/root_helper_client.py`, `src/aios/tools/service_adapter.py`
- What you get:
  - Optional privileged D-Bus client (Linux only). On Windows/macOS it degrades gracefully and returns via: "unavailable".
  - Read-only service diagnostics via local systemctl/journalctl fallback when root-helper is not available.
- Read status and logs for a unit
  - aios status --recent 1 --unit ssh
  - For targeted triage, prefer the Agent CLI operators below
- CLI operators for triage (store artifacts):
  - aios op-run journal_summary_from_text --unit ssh --lines 200 --label ssh
  - aios op-run journal_trend_from_text --unit ssh --lines 200 --label ssh --buckets 12
  - Artifacts saved in DB (see Core CLI status for recent artifacts).
- Notes:
  - On Linux, a running root-helper yields via: "root-helper" in outputs; otherwise via: "local".
  - On Windows, these commands return via: "unavailable" (no systemd).
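
The three `via` values described in the notes reduce to a small decision function. The sketch below is hypothetical (`pick_service_backend` is not a name from the codebase) and takes the platform as a string in the style of `sys.platform`.

```python
def pick_service_backend(platform_name: str, root_helper_running: bool) -> str:
    """Return the `via` label a diagnostics call would report.

    platform_name follows sys.platform conventions ("linux", "win32", "darwin").
    """
    if not platform_name.startswith("linux"):
        return "unavailable"  # no systemd on Windows/macOS
    return "root-helper" if root_helper_running else "local"
```

Calling `pick_service_backend(sys.platform, helper_ok)` after `import sys` would apply the same rule to the running interpreter.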
Journal parser utilities¶
- Source: `src/aios/tools/journal_parser.py`
- Functions:
  - `severity_counts(text) -> Dict[str,int]`: heuristic severity tallies across emerg…debug
- How it’s used:
  - The Agent CLI operators compute summaries/trends and persist to DB artifacts. See: `aios op-run ...` above.
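
A heuristic severity tally can be as simple as a keyword match per log line. The sketch below is not the code in `journal_parser.py`: it assumes the eight standard syslog level names, a handful of aliases chosen for illustration, and that only the first keyword on each line counts. It is named `severity_counts_sketch` to keep it distinct from the real function.

```python
import re
from typing import Dict

# Assumption: the standard syslog severities, most to least severe.
LEVELS = ("emerg", "alert", "crit", "err", "warning", "notice", "info", "debug")
# Illustrative aliases mapped onto the canonical names.
ALIASES = {"emergency": "emerg", "critical": "crit", "error": "err", "warn": "warning"}
_PATTERN = re.compile(
    r"\b(" + "|".join(LEVELS + tuple(ALIASES)) + r")\b", re.IGNORECASE
)


def severity_counts_sketch(text: str) -> Dict[str, int]:
    """Count lines per severity, using the first severity keyword on each line."""
    counts = {level: 0 for level in LEVELS}
    for line in text.splitlines():
        match = _PATTERN.search(line)
        if match:
            word = match.group(1).lower()
            counts[ALIASES.get(word, word)] += 1
    return counts
```

Keyword matching will miscount lines that merely mention a level name in their message, which is why the result is best treated as a heuristic.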
Package/service simulators (budgeted)¶
- Source: `src/aios/tools/service.py`, `src/aios/tools/pkg.py`, `src/aios/tools/privileged.py`
- What you get:
  - `restart_service(name, simulate=True)` and `pkg.install/remove(name, simulate=True)` record budget usage for service_changes/pkg_ops
  - `run_privileged(fn, ...)` wraps a function and charges privileged_calls budget
- Try it:
  - aios service-restart docker --dry-run
  - aios pkg-install git --dry-run
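
The charge-then-act pattern behind these simulators can be sketched with an in-memory ledger. Everything here is hypothetical: the real budgets are DB-backed, and `BudgetLedger`, `charged`, and `restart_service_sketch` are illustrative stand-ins rather than the functions listed above.

```python
import functools
from typing import Any, Callable, Dict


class BudgetExceeded(RuntimeError):
    """Raised when a charge would push usage past the domain limit."""


class BudgetLedger:
    """In-memory stand-in for a DB-backed usage table."""

    def __init__(self, limits: Dict[str, float]) -> None:
        self.limits = dict(limits)
        self.usage = {domain: 0.0 for domain in limits}

    def charge(self, domain: str, amount: float = 1.0) -> None:
        used = self.usage.get(domain, 0.0) + amount
        if used > self.limits.get(domain, 0.0):
            raise BudgetExceeded(f"{domain}: {used} exceeds limit")
        self.usage[domain] = used


def charged(ledger: BudgetLedger, domain: str) -> Callable:
    """Decorator that charges one unit to `domain` before each call."""

    def decorator(fn: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(fn)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            ledger.charge(domain)
            return fn(*args, **kwargs)

        return wrapper

    return decorator


def restart_service_sketch(ledger: BudgetLedger, name: str, simulate: bool = True) -> dict:
    """Charge the service_changes budget, then report what would happen."""
    ledger.charge("service_changes")
    return {"service": name, "simulated": simulate}
```

Charging before the action means an exhausted budget blocks the call outright, which matches the intent of a `--dry-run` budget decision.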
MCP and external tools (GUI)¶
- Source: GUI MCP Manager panel under `src/aios/gui/components/mcp_manager_panel/*`
- What you get:
  - Visual editor for MCP servers and tool permissions using `config/mcp_servers.json` and `config/tool_permissions.json`
  - Enable/disable servers and toggle tool permissions; refresh state from disk
- Status:
  - GUI available. Programmatic MCP wiring is scoped to UI; CLI equivalents are not exposed yet.
  - If config files are missing, the panel initializes with defaults and saves back on change.
  - Panel screenshots in GUI doc: see `docs/guide/features/GUI_FEATURES.md` (MCP & Tools tab).
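
The "initialize with defaults, save back on change" behavior can be sketched as a pair of small helpers. The function names are hypothetical, and the `{"servers": []}` shape used in the usage note is a placeholder, since the real schema of `config/mcp_servers.json` is not described on this page.

```python
import copy
import json
from pathlib import Path
from typing import Any, Dict


def load_or_default(path: Path, default: Dict[str, Any]) -> Dict[str, Any]:
    """Load JSON from disk, or return a copy of the defaults if the file is missing."""
    try:
        return json.loads(path.read_text(encoding="utf-8"))
    except FileNotFoundError:
        return copy.deepcopy(default)  # nothing is written until a change is saved


def save_config(path: Path, data: Dict[str, Any]) -> None:
    """Write the config back, creating the parent directory if needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(data, indent=2), encoding="utf-8")
```

A panel following this pattern would call `load_or_default(path, {"servers": []})` on refresh and `save_config(path, data)` whenever a toggle changes.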
Unlimiformer (planned)¶
- Source: `src/aios/integrations/unlimiformer/__init__.py`
- Status: Phase 1 scaffolding; disabled by default via config
- Config key (example):
  - config/default.yaml → brains.trainer_overrides.unlimiformer.enabled: false
- Notes:
  - When enabled in future phases, the model will be augmented for long-context eval using FAISS; Windows defaults to CPU FAISS.
Quick reference (commands)¶
- Crawling
  - aios crawl URL --recursive --max-pages 25 --max-depth 2 --rps 2 --progress
- Dataset builders
  - aios datasets-build-images "topic" --store-dataset name --max-images 200 --progress
  - aios datasets-build-text "topic" --store-dataset name --max-docs 100 --progress
  - aios datasets-build-videos "topic" --store-dataset name --max-videos 50 --progress
  - aios datasets-build-websites "topic" --store-dataset name --max-pages 30 --progress
- Budgets and guards
  - aios guards-show
  - aios service-restart ssh --dry-run
  - aios pkg-install git --dry-run
Related: Datasets, Advanced Features