Local inference

OpenHuman Local AI

Optional and off by default. When enabled, local AI drives memory embeddings, summary-tree building, and lightweight tasks. Heavy workloads still route to cloud providers unless you configure otherwise.

What Local AI Can Do

With local AI enabled and a local runtime such as Ollama or LM Studio running, supported workloads run on-device instead of through the bundled cloud subscription.

  • Memory embeddings and summary-tree building run on a local model.
  • Background reasoning loops and reflective tasks stay on-device.
  • Lightweight chat and classification tasks route locally when the workload hint matches.
  • By default, heavy reasoning, agentic tasks, coding, vision, voice TTS, and web search still route to cloud providers.
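The routing split described in the list above can be pictured as a small lookup. This is only an illustrative sketch: the workload names, the `LOCAL_WORKLOADS` set, and the `route` function are hypothetical and are not taken from the OpenHuman codebase, which may decide routing quite differently.

```python
# Hypothetical workload hints that the list above describes as on-device.
LOCAL_WORKLOADS = {
    "embedding",
    "summary_tree",
    "reflection",
    "light_chat",
    "classification",
}


def route(workload: str, local_ai_enabled: bool) -> str:
    """Return "local" or "cloud" for a workload hint.

    Anything not explicitly listed as local (heavy reasoning, agentic tasks,
    coding, vision, voice TTS, web search, or an unknown hint) goes to cloud.
    """
    if local_ai_enabled and workload in LOCAL_WORKLOADS:
        return "local"
    return "cloud"


for hint in ("embedding", "vision"):
    print(hint, "->", route(hint, local_ai_enabled=True))
```

Defaulting unknown hints to cloud mirrors the "off by default" stance: nothing runs locally unless it is both enabled and explicitly matched.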

Setup

Enable local AI in Settings, then choose a preset that matches how much work you want to keep on-device.

  • Path: Settings → AI & Skills → Local AI.
  • Three presets: embeddings only; memory plus reflection; or full local where supported.
  • Override provider and base URL with local_ai.provider and local_ai.base_url in config.toml.
  • The default local model is Gemma 3 1B, which needs a few gigabytes of RAM and disk space.

Hardware and Tradeoffs

Local AI pays off when you ingest a lot of email or chat, need offline summaries, or want privacy-sensitive background reflection to stay on-device. It is probably not worth it on limited hardware or with only a few connected sources.

  • Worth it: heavy mail or chat ingestion, offline summaries, privacy-sensitive reflection, or an existing GPU.
  • Skip it: few connected sources, no GPU, limited RAM, or workflows locked to cloud features like vision and voice.
  • Known issue: on Windows 11 with an existing Ollama server, the UI shows the Ollama URL as read-only. As a workaround, set local_ai.base_url in config.toml directly.
  • LM Studio support: check the current release notes to see which features are confirmed and which are only planned.
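When you set the base URL by hand, it can help to confirm that the address actually reaches a running Ollama server before pointing OpenHuman at it. The sketch below queries Ollama's `/api/tags` endpoint, which lists locally available models. The helper names `tags_url` and `ollama_reachable` are hypothetical, and this check is independent of OpenHuman itself.

```python
import http.client
import urllib.request


def tags_url(base_url: str) -> str:
    """Build the URL of Ollama's model-listing endpoint from a base URL."""
    return base_url.rstrip("/") + "/api/tags"


def ollama_reachable(base_url: str, timeout: float = 2.0) -> bool:
    """Return True if an HTTP 200 comes back from <base_url>/api/tags."""
    try:
        with urllib.request.urlopen(tags_url(base_url), timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, ValueError, http.client.HTTPException):
        # Connection refused, timeout, HTTP error, or malformed URL.
        return False


# Example: ollama_reachable("http://localhost:11434")
```

A False result only means the server did not answer at that address; it does not say whether the configured model is installed.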