How to Give AI Persistent Memory (Open Source): 3 Approaches Compared

The Problem

Large language models have no built-in memory between sessions. Every conversation starts from scratch. For a personal assistant that is supposed to know your schedule, preferences, and history, this is a dealbreaker. Here are three open-source approaches to solving it.

Vector databases: store conversation chunks as embeddings, retrieve relevant context for each query.
Knowledge graphs: model entities and relationships explicitly, traverse the graph for contextual answers.
Hierarchical memory trees: organize data into layered summaries, retrieve at the appropriate granularity.

Approach 1: Vector Database (RAG)

The most common approach. Text chunks are converted to embeddings and stored in a vector database. At query time, the user's prompt is embedded and the nearest chunks are retrieved as context.

How it works: chunk documents → embed with sentence-transformers → store in Chroma/PGVector/Qdrant → retrieve top-k nearest neighbors → inject into prompt.
Strengths: simple to implement, proven at scale, works with any LLM, mature tooling (LangChain, LlamaIndex).
Weaknesses: loses hierarchical structure, struggles with cross-document reasoning, retrieval quality depends heavily on chunk size and overlap, no explicit relationship modeling.
Best for: document Q&A, FAQ systems, knowledge bases where documents are independent.
Tools: Chroma, Qdrant, Weaviate, PGVector, Milvus.
Implementation complexity: low. A basic RAG pipeline can be built in under 100 lines of Python.
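The chunk, embed, store, retrieve, inject flow described above can be sketched with the standard library alone. This is a toy illustration, not a production pipeline: the bag-of-words `embed` function stands in for a real embedding model such as sentence-transformers, the in-memory `VectorStore` stands in for Chroma or Qdrant, and every name here (`chunk_text`, `VectorStore`, `build_prompt`) is hypothetical.

```python
import math
import re
from collections import Counter


def chunk_text(text: str, size: int = 40, overlap: int = 10) -> list[str]:
    """Split text into windows of `size` words that share `overlap` words."""
    words = text.split()
    step = max(size - overlap, 1)
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reaches the end of the text
    return chunks


def embed(text: str) -> Counter:
    """Toy embedding: word counts. A real system would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class VectorStore:
    """In-memory stand-in for a vector database."""

    def __init__(self) -> None:
        self.items: list[tuple[str, Counter]] = []

    def add(self, chunk: str) -> None:
        self.items.append((chunk, embed(chunk)))

    def top_k(self, query: str, k: int = 2) -> list[str]:
        """Return the k chunks most similar to the query."""
        q = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [chunk for chunk, _ in ranked[:k]]


def build_prompt(query: str, context: list[str]) -> str:
    """Inject the retrieved chunks into the prompt sent to the LLM."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Context:\n{joined}\n\nQuestion: {query}"


store = VectorStore()
for note in [
    "Alice prefers morning meetings before 10am",
    "The Q3 budget document was shared on Monday",
    "Bob is allergic to peanuts",
]:
    store.add(note)

question = "When does Alice like meetings?"
print(build_prompt(question, store.top_k(question, k=1)))
```

The `size` and `overlap` parameters are the knobs the weaknesses above refer to: larger chunks carry more context per hit but dilute similarity scores, while more overlap reduces the chance of splitting a fact across two chunks at the cost of extra storage.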

Approach 2: Knowledge Graph

Instead of storing chunks as vectors, extract entities and relationships and store them in a graph database. At query time, the system traverses the graph to find relevant context and passes it to the LLM.

How it works: extract entities and relations from text → store in Neo4j or RDF store → convert queries to graph traversals → retrieve connected subgraphs → inject into prompt.
Strengths: explicit relationships (Alice works at Company X, which competes with Company Y), excellent for complex multi-hop reasoning, human-inspectable structure.
Weaknesses: entity extraction is error-prone, graph construction is expensive, query formulation requires understanding the graph schema, scaling to millions of entities is hard.
Best for: domains with rich relationships (biomedical, legal, financial), multi-hop reasoning tasks, explainable AI systems.
Tools: Neo4j, RDFlib, NetworkX, GraphRAG (Microsoft).
Implementation complexity: high. Requires domain expertise in graph modeling and entity extraction.
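To make the multi-hop idea concrete, here is a minimal sketch that stores triples in a plain dictionary and collects the subgraph within a fixed number of hops. It assumes the entities and relations have already been extracted (the hard part in practice), and the `KnowledgeGraph` class is a hypothetical stand-in for a real store such as Neo4j.

```python
from collections import defaultdict, deque


class KnowledgeGraph:
    """Tiny in-memory triple store: subject -> [(relation, object), ...]."""

    def __init__(self) -> None:
        self.edges: dict[str, list[tuple[str, str]]] = defaultdict(list)

    def add(self, subject: str, relation: str, obj: str) -> None:
        self.edges[subject].append((relation, obj))

    def neighborhood(self, start: str, max_hops: int = 2) -> list[tuple[str, str, str]]:
        """Breadth-first search returning every triple within max_hops of start."""
        seen = {start}
        queue = deque([(start, 0)])
        triples = []
        while queue:
            node, depth = queue.popleft()
            if depth == max_hops:
                continue  # do not expand nodes at the hop limit
            for relation, obj in self.edges.get(node, []):
                triples.append((node, relation, obj))
                if obj not in seen:
                    seen.add(obj)
                    queue.append((obj, depth + 1))
        return triples


kg = KnowledgeGraph()
kg.add("Alice", "works_at", "Company X")
kg.add("Company X", "competes_with", "Company Y")
kg.add("Company Y", "based_in", "Berlin")

# Two hops from Alice reach her employer and its competitor, but not Berlin.
for subject, relation, obj in kg.neighborhood("Alice", max_hops=2):
    print(f"{subject} --{relation}--> {obj}")
```

The retrieved triples would then be serialized into the prompt. The `max_hops` limit is what keeps retrieval latency bounded as the graph grows.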

Approach 3: Hierarchical Memory Tree

Organize memory into layers of increasing abstraction. Raw documents at the bottom, entities in the middle, themes at the top. The LLM retrieves at the layer appropriate for the query.

How it works: fetch data from sources → canonicalize to Markdown → chunk and score → summarize into per-theme, per-entity summaries → store in hierarchical tree (SQLite + Markdown vault) → retrieve by traversing tree layers.
Strengths: natural abstraction levels, excellent cross-source reasoning ('What did Alice email me about the Q3 doc?'), inspectable and editable (open in Obsidian), efficient retrieval (themes first, then entities, then documents).
Weaknesses: more complex than simple RAG, retrieval is only as good as the summaries, ingest pipeline is CPU-intensive for large data volumes.
Best for: personal assistants with many data sources, systems requiring cross-source reasoning, applications where users need to inspect and edit memory.
Tools: OpenHuman (built-in), custom implementations with SQLite + Markdown.
Implementation complexity: medium-high. The concept is straightforward but the ingest pipeline (fetch, canonicalize, chunk, summarize, store) has many moving parts.
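The top-down retrieval idea (themes first, then entities, then documents) can be illustrated with a small tree and a greedy descent. This is a sketch under simplifying assumptions: the `Node` type, the keyword-overlap `score` function, and the sample data are all hypothetical, and a real system would likely rank nodes with embeddings or an LLM rather than shared words.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Node:
    """One node in the memory tree: a theme, an entity, or a document."""
    title: str
    summary: str
    children: list["Node"] = field(default_factory=list)


def tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def score(query: str, node: Node) -> int:
    """Toy relevance score: number of words shared with the node's title and summary."""
    return len(tokens(query) & tokens(node.title + " " + node.summary))


def retrieve(root: Node, query: str, max_depth: int = 3) -> list[Node]:
    """Greedy descent: at each layer, follow the best-matching child."""
    path = []
    node = root
    for _ in range(max_depth):
        if not node.children:
            break
        best = max(node.children, key=lambda child: score(query, child))
        if score(query, best) == 0:
            break  # nothing relevant below this layer
        path.append(best)
        node = best
    return path


doc = Node("Email: Q3 doc feedback", "Alice asked for changes to the Q3 doc budget table")
alice = Node("Alice", "Alice emails about the Q3 doc and budget", [doc])
bob = Node("Bob", "Bob handles hiring interviews")
work = Node("Work", "Q3 planning, budget and hiring", [alice, bob])
health = Node("Health", "gym schedule and diet")
root = Node("root", "", [work, health])

for layer in retrieve(root, "What did Alice email me about the Q3 doc?"):
    print(layer.title, "->", layer.summary)
```

Stopping early is the useful property here: a broad question can be answered from a theme summary alone, while a specific one descends all the way to a document.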

Comparison: Which Approach Fits Your Project?

Eight dimensions compared across the three approaches.

Implementation ease: Vector DB = easy (1 day). Knowledge Graph = hard (1–2 weeks). Memory Tree = medium-high (3–5 days with existing tools).
Cross-source reasoning: Vector DB = poor (chunks are independent). Knowledge Graph = excellent (explicit relationships). Memory Tree = good (hierarchy enables cross-source queries).
Human inspectability: Vector DB = poor (embeddings are opaque). Knowledge Graph = good (nodes and edges are readable). Memory Tree = excellent (Markdown vault opens in any editor).
Scalability: Vector DB = excellent (billions of vectors). Knowledge Graph = medium (millions of entities, query complexity grows). Memory Tree = good (SQLite handles millions of rows, vault grows linearly).
Editability: Vector DB = hard (re-embed after changes). Knowledge Graph = medium (update nodes and edges). Memory Tree = easy (edit Markdown files, tree updates on next sync).
Retrieval speed: Vector DB = fast (~10ms). Knowledge Graph = medium (~50–200ms depending on hops). Memory Tree = fast (~10–30ms with SQLite index).
Relationship modeling: Vector DB = implicit (semantic similarity only). Knowledge Graph = explicit (defined relations). Memory Tree = semi-explicit (entity layer captures some relationships).
Best for: Vector DB = document Q&A. Knowledge Graph = complex domain reasoning. Memory Tree = personal assistants with diverse data sources.

How OpenHuman Implements the Memory Tree

OpenHuman's Memory Tree is a production example of Approach 3. Here is how it works in practice.

Fetch: OAuth-authorized connectors pull data every ~20 minutes.
Canonicalize: raw data converts to Markdown for consistency.
Chunk: content splits into scored segments of up to 3,000 tokens.
Summarize: chunks fold into hierarchical summaries organized by theme, entity, and time.
Store: output writes to local SQLite (structured index) and an Obsidian-compatible Markdown vault (human-readable content).
Retrieve: queries traverse the tree from themes → entities → documents, stopping when sufficient context is gathered.
Update: users can edit or delete any chunk in the vault. Changes are picked up on the next ingest cycle.
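The chunk and store steps can be approximated as follows. This is not OpenHuman's actual code or schema: the `chunks` table, the file naming, and the use of whitespace-separated words as a stand-in for tokens are all assumptions made for illustration.

```python
import sqlite3
from pathlib import Path


def split_chunks(markdown: str, max_tokens: int = 3000) -> list[str]:
    """Pack paragraphs greedily into chunks of at most max_tokens words.

    Tokens are approximated by whitespace-separated words. A single paragraph
    longer than max_tokens is kept whole, so this sketch can exceed the cap.
    """
    chunks, current, count = [], [], 0
    for para in markdown.split("\n\n"):
        n = len(para.split())
        if n == 0:
            continue
        if current and count + n > max_tokens:
            chunks.append("\n\n".join(current))
            current, count = [], 0
        current.append(para)
        count += n
    if current:
        chunks.append("\n\n".join(current))
    return chunks


def ingest(conn: sqlite3.Connection, vault: Path, source: str,
           markdown: str, max_tokens: int = 3000) -> int:
    """Write each chunk to the Markdown vault and index it in SQLite."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS chunks ("
        "id INTEGER PRIMARY KEY, source TEXT, path TEXT, tokens INTEGER)"
    )
    vault.mkdir(parents=True, exist_ok=True)
    written = 0
    for i, chunk in enumerate(split_chunks(markdown, max_tokens)):
        path = vault / f"{source}-{i:04d}.md"
        path.write_text(chunk, encoding="utf-8")  # human-readable copy
        conn.execute(
            "INSERT INTO chunks (source, path, tokens) VALUES (?, ?, ?)",
            (source, str(path), len(chunk.split())),
        )
        written += 1
    conn.commit()
    return written
```

Keeping the content in plain Markdown files and only the index in SQLite is what makes the editing workflow possible: a user changes a file in the vault, and a later ingest pass can re-index it.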

Getting Started

OpenHuman Memory Tree

Deep dive into OpenHuman's three-layer hierarchical memory architecture.

Install OpenHuman

Set up OpenHuman and see the Memory Tree in action.

Local AI Setup

Configure Ollama for local memory embeddings and inference.

Best Local AI Assistant

Compare tools with different memory approaches.

Can I combine approaches?

Yes. Many production systems use hybrid architectures: vector DB for fast semantic search, knowledge graph for relationship queries, and hierarchical summaries for high-level context. OpenHuman uses a hierarchical tree with SQLite indexing, which combines the structure of a tree with the query speed of a database.

Which approach is fastest to implement?

Vector database RAG is the fastest — a basic pipeline can be built in under 100 lines of Python using LangChain and Chroma. Hierarchical memory trees take 3–5 days with existing tools like OpenHuman's built-in system. Knowledge graphs require the most setup (1–2 weeks) due to entity extraction and schema design.

Do I need a GPU for any of these approaches?

For embeddings: no. CPU embedding models (all-MiniLM, BGE-small) are fast enough for most use cases. For retrieval: no — vector search and graph traversal are CPU-bound. For generation: only if you run the LLM locally. Cloud LLMs handle generation without local GPU.

How much data can each approach handle?

Vector databases scale to billions of vectors. Knowledge graphs typically handle millions of entities before query complexity becomes problematic. Hierarchical memory trees scale well to millions of documents — OpenHuman's SQLite backend handles this comfortably, though the vault can grow to multiple gigabytes.

Is the Memory Tree approach patented?

No. OpenHuman's Memory Tree is open-source under GPL-3.0. The concept of hierarchical summarization is well-established in information retrieval research. You can implement a similar system independently or use OpenHuman's built-in implementation.