What Is the Memory Tree
The Memory Tree is OpenHuman's core data architecture. Rather than a thin wrapper around a vector database, it is a deterministic pipeline that organizes your personal data into a hierarchical structure the AI can reason over efficiently.
How Data Flows In
Data from connected accounts follows a specific pipeline before becoming part of the Memory Tree.
- Fetch: OAuth-authorized connectors pull data from your accounts (email, calendar, repos, etc.) approximately every 20 minutes.
- Canonicalize: raw data is converted to Markdown so that every source shares one format.
- Chunk: content is split into scored segments of up to 3,000 tokens each.
- Summarize: chunks are folded into hierarchical summaries organized by theme, entity, and time.
- Store: the final output is written to a local SQLite database and an Obsidian-compatible Markdown vault.
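The canonicalize and chunk steps above can be sketched in a few lines of Python. This is a minimal illustration, not OpenHuman's implementation: the record fields (title, body), the Chunk type, and the function names are hypothetical, and whitespace-separated words stand in for real model tokens. Scoring and summarization are left out.

```python
from dataclasses import dataclass

MAX_TOKENS = 3000  # upper bound per chunk, as stated in the pipeline description


@dataclass
class Chunk:
    source_id: str
    index: int
    text: str
    token_count: int


def canonicalize(record: dict) -> str:
    """Render a raw record as Markdown (field names are hypothetical)."""
    title = record.get("title", "Untitled")
    body = record.get("body", "")
    return f"# {title}\n\n{body}\n"


def chunk(source_id: str, markdown: str, max_tokens: int = MAX_TOKENS) -> list[Chunk]:
    """Split text into chunks of at most max_tokens whitespace-separated words.

    Assumption: words approximate tokens; a real tokenizer would count differently.
    """
    words = markdown.split()
    chunks = []
    for start in range(0, len(words), max_tokens):
        piece = words[start:start + max_tokens]
        chunks.append(Chunk(source_id, len(chunks), " ".join(piece), len(piece)))
    return chunks
```

Splitting on a fixed word budget is the simplest strategy that respects the size limit; a production chunker would more likely break on headings or paragraphs first.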
The Three Layers
The Memory Tree organizes data across three conceptual layers, each serving a different reasoning purpose.
- Themes: high-level categories like work, family, finance, health. The AI uses themes to understand the broad context of a query.
- Entities: specific people, companies, repositories, projects. The AI uses entities to narrow down relevant context.
- Documents: individual emails, calendar events, notes, transactions. The AI uses documents for precise fact retrieval when themes and entities are insufficient.
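One way to picture the three layers is as a nested structure that a query narrows step by step: theme first, then entity, then individual documents. The sketch below is only a conceptual model under that assumption; the class names and the find_documents helper are hypothetical and do not reflect OpenHuman's actual schema.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    doc_id: str
    text: str


@dataclass
class Entity:
    name: str
    documents: list[Document] = field(default_factory=list)


@dataclass
class Theme:
    name: str
    entities: dict[str, Entity] = field(default_factory=dict)


def find_documents(themes: dict[str, Theme], theme: str, entity: str,
                   keyword: str) -> list[Document]:
    """Narrow by theme, then by entity, then scan documents for a keyword."""
    theme_node = themes.get(theme)
    if theme_node is None:
        return []
    entity_node = theme_node.entities.get(entity)
    if entity_node is None:
        return []
    needle = keyword.lower()
    return [d for d in entity_node.documents if needle in d.text.lower()]
```

Each layer cuts down the search space before the next one is consulted, which is why document-level scanning is the last resort.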
Storage and Format
Memory Tree data lives on your machine in formats you can inspect and edit.
- SQLite database: stores the structured index, entity relationships, and chunk metadata for fast querying.
- Obsidian-compatible vault: stores the actual Markdown content in a folder structure you can open with any Markdown editor.
- The vault path is configurable in config.toml.
- Data never leaves your machine unless you explicitly enable cloud model routing for specific tasks.
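Because the index is an ordinary SQLite file, you can look at it with standard tools. The sketch below lists the tables in a database file using Python's built-in sqlite3 module, opened read-only so the index cannot be modified by accident. The table names and the location of the database file are not documented here, so the function makes no assumptions about either; list_tables is a hypothetical helper name.

```python
import sqlite3
from pathlib import Path


def list_tables(db_path: str) -> list[str]:
    """Return the table names in an SQLite file, opened read-only."""
    # mode=ro prevents accidental writes while inspecting the index.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]
```

Point db_path at the SQLite file in your own installation; it is safest to inspect a copy while a sync is running.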
Inspecting and Editing Memory
Unlike the memory of black-box cloud AI systems, OpenHuman's memory is stored as plain files that you can read and change directly.
- Open the vault folder in Obsidian, VS Code, or any text editor.
- Read individual chunks to see exactly what the AI knows about a topic.
- Delete chunks or entire folders to remove data from the assistant's context.
- Add your own Markdown notes directly to the vault; the AI picks them up on the next ingest cycle.
- Edit existing chunks to correct misinterpretations or update information.
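Since the vault is a folder of Markdown files, checking what the assistant knows about a topic can be as simple as a text search. The following sketch walks a vault directory and reports every line that mentions a keyword. The function name search_vault is hypothetical, and the only assumption about the vault layout is that content files end in .md.

```python
from pathlib import Path


def search_vault(vault_dir: str, keyword: str) -> list[tuple[str, int, str]]:
    """Return (relative path, line number, line) for each line containing keyword."""
    root = Path(vault_dir)
    needle = keyword.lower()
    hits = []
    for md_file in sorted(root.rglob("*.md")):
        # errors="replace" keeps the scan going if a file has odd encoding.
        text = md_file.read_text(encoding="utf-8", errors="replace")
        for line_no, line in enumerate(text.splitlines(), start=1):
            if needle in line.lower():
                hits.append((md_file.relative_to(root).as_posix(), line_no, line.strip()))
    return hits
```

The search is read-only, so it is safe to run against a live vault before deciding which chunks to edit or delete.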
Sync and Update Cycle
The Memory Tree updates itself automatically in the background.
- Active connectors sync approximately every 20 minutes.
- New data is canonicalized, chunked, scored, and folded into the tree automatically.
- Items deleted at the source are removed from the tree on the next sync cycle.
- You can force a manual sync from the OpenHuman settings panel.
- The ingest cycle is CPU-intensive for large data volumes; local AI can help distribute the load.
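The add-and-remove behavior of a sync cycle amounts to comparing two sets of item identifiers: what exists at the source now and what the tree already holds. The sketch below shows that comparison. It is an illustration only; plan_sync is a hypothetical name, and it deliberately ignores items that were modified in place, which a real sync would also need to detect.

```python
def plan_sync(source_ids: set[str], tree_ids: set[str]) -> tuple[set[str], set[str]]:
    """Return (ids to ingest, ids to remove) for one sync cycle."""
    to_ingest = source_ids - tree_ids  # present at the source, not yet in the tree
    to_remove = tree_ids - source_ids  # deleted at the source, still in the tree
    return to_ingest, to_remove
```

For example, if the source holds items a, b, c and the tree holds b, c, d, the plan is to ingest a and remove d.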
Privacy Implications
The Memory Tree design has significant privacy advantages over cloud-only AI assistants.
- Local storage: your emails, calendar events, and documents stay on your machine.
- Inspectability: you can see exactly what the AI knows about you.
- Controllability: you can delete any data at any time without contacting a support team.
- Encryption: OAuth tokens are AES-256 encrypted in the local vault after the initial exchange.
- Caveat: local-first does not mean local-only. Chat, reasoning, and vision tasks may still route to cloud providers unless local AI is fully configured.
Limitations
The Memory Tree has several known constraints.
- Beta behavior: the ingest pipeline occasionally misses edge cases or mis-categorizes content.
- Storage growth: heavy email or chat users may see the vault grow to multiple gigabytes over time.
- Cleanup: there is no automatic archival or cleanup policy yet, so manual pruning may be needed.
- Cross-source reasoning: while the tree structure enables it, complex queries spanning many sources can still be slow without local AI acceleration.
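Until an automatic cleanup policy exists, a small script can help with the storage-growth and pruning points above. The sketch below measures the total size of a vault folder and lists Markdown files that have not been modified for a given number of days. It only reports candidates and never deletes anything. Both function names are hypothetical, and using file modification time as the age signal is an assumption; it may not match how OpenHuman tracks freshness.

```python
import time
from pathlib import Path
from typing import Optional


def vault_size_bytes(vault_dir: str) -> int:
    """Total size of all regular files under the vault directory."""
    return sum(p.stat().st_size for p in Path(vault_dir).rglob("*") if p.is_file())


def stale_files(vault_dir: str, max_age_days: float,
                now: Optional[float] = None) -> list[Path]:
    """Markdown files whose modification time is older than max_age_days."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # seconds per day
    return sorted(
        p for p in Path(vault_dir).rglob("*.md") if p.stat().st_mtime < cutoff
    )
```

Review the returned list by hand before removing anything, since deleting files removes that data from the assistant's context.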