Engram

Optimizing Memory Architecture for Every Task

Evolve task-specific memory systems for LLM agents as executable Python code. No hand-tuning. No architecture search. Just evolution.

01 — Core Insight

Search space is code, not modules

Prior work picks among predefined architectures or tunes natural-language rules. Engram searches over executable Python, an open space that can express any data structure, SQL schema, or retrieval logic.

02 — Core Insight

Every task has its own optimal memory

Same system, same seeds → structurally different programs. Conversational QA evolves a multi-table SQL index. Embodied tasks evolve a deterministic action cache. No universal design wins.
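To make the contrast concrete, here is a purely illustrative sketch of what a deterministic action cache could look like. It is not the program Engram evolved. The class name ActionCache, its method signatures, and the key normalization are all assumptions made for this example.

```python
class ActionCache:
    """Hypothetical deterministic cache: task text -> action sequence that succeeded."""

    def __init__(self):
        self._plans = {}

    @staticmethod
    def _key(task):
        # Normalize case and whitespace so equivalent task strings collide.
        return " ".join(task.lower().split())

    def write(self, task, actions, success):
        # Only successful episodes are cached (an assumption of this sketch).
        if success:
            self._plans[self._key(task)] = list(actions)

    def read(self, task):
        # Exact lookup, no embeddings and no LLM call: same input, same output.
        plan = self._plans.get(self._key(task))
        if plan is None:
            return ""
        steps = "\n".join(f"{i}. {a}" for i, a in enumerate(plan, 1))
        return "Previously successful plan:\n" + steps


cache = ActionCache()
cache.write("Heat the mug", ["go to microwave", "heat mug"], success=True)
print(cache.read("heat the mug"))
```

A conversational QA memory would need fuzzy, multi-table retrieval instead, which is why a single fixed design is unlikely to serve both kinds of task.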


Method
How Engram Works

A memory program is a Python module — dataclass schemas, write/read logic, instruction strings. The task agent is fixed. Only the memory changes.

[Figure: evolution loop, program pool, and program interface]
Evolution loop: (1) Sample Parent: softmax on fitness scores. (2) Evaluate: write obs → read queries → score. (3) Reflect & Mutate: LLM diagnoses → code patch. (4) Add to Pool: unconditional admission.
Program pool: iter_18 (best) 0.486, iter_14 0.467, seed_1 0.251.
Program interface: class KnowledgeBase with write(self, item, raw_text) and read(self, query) → str. Toolkit: SQLite, ChromaDB, LLM (50).
System architecture — population-based evolutionary search over memory programs
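The four steps of the loop can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not Engram's implementation: the Program dataclass, the temperature parameter, and the mutate and evaluate callables are hypothetical stand-ins (in the real system, mutation is an LLM producing a code patch and evaluation runs the write → read → score pipeline).

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Program:
    name: str
    source: str
    score: float


def softmax_weights(scores, temperature=0.1):
    """Turn fitness scores into sampling probabilities (temperature is an assumed knob)."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp((s - peak) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sample_parent(pool, rng, temperature=0.1):
    """Step 1: pick a parent with probability given by softmax over fitness."""
    weights = softmax_weights([p.score for p in pool], temperature)
    return rng.choices(pool, weights=weights, k=1)[0]


def evolve(pool, mutate, evaluate, iterations, rng):
    """Run the loop; mutate(parent) -> source and evaluate(source) -> score are stubs."""
    for i in range(iterations):
        parent = sample_parent(pool, rng)            # 1. sample parent
        child_source = mutate(parent)                # 3. reflect and mutate
        child_score = evaluate(child_source)         # 2. evaluate the child
        pool.append(Program(f"iter_{i + 1}", child_source, child_score))  # 4. unconditional admission
    return max(pool, key=lambda p: p.score)


# Sampling weights for the three pool scores shown in the figure.
print(softmax_weights([0.251, 0.467, 0.486]))
```

Because admission is unconditional, a child that scores worse than its parent still enters the pool; the softmax only makes it less likely to be chosen as a parent later.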

Interface
What Is a Memory Program?

A complete Python module with three evolvable dimensions: instruction constants, dataclass schemas, and storage/retrieval logic.

memory_program.py
from dataclasses import dataclass

# Instruction constants — injected into agent prompts
INSTRUCTION_KNOWLEDGE_ITEM = "Extract key facts..."
INSTRUCTION_QUERY = "Formulate a search query..."
INSTRUCTION_RESPONSE = "Answer using retrieved context..."
ALWAYS_ON_KNOWLEDGE = ""

@dataclass
class KnowledgeItem:
    summary: str
    entities: list[str]
    timestamp: str

@dataclass
class Query:
    query_text: str
    entity_filter: str

class KnowledgeBase:
    def __init__(self, toolkit):
        self.db = toolkit.db
        self.chroma = toolkit.chroma

    def write(self, item, raw_text=""):
        # Store structured knowledge
        ...

    def read(self, query) -> str:
        # Retrieve relevant context
        ...
Dimension 1 — Instructions

Prompt Engineering

Four string constants steer how the task agent parses observations, formulates queries, and generates answers.
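How the constants reach the agent is not shown on this page. One plausible arrangement, offered only as a sketch, is that the harness prepends them to the prompt it builds for each step. The function build_response_prompt and the prompt layout are hypothetical; the two constants are copied from memory_program.py above.

```python
INSTRUCTION_RESPONSE = "Answer using retrieved context..."
ALWAYS_ON_KNOWLEDGE = ""


def build_response_prompt(question, retrieved,
                          instruction=INSTRUCTION_RESPONSE,
                          always_on=ALWAYS_ON_KNOWLEDGE):
    """Hypothetical prompt assembly for the answering step."""
    parts = [instruction]
    if always_on:
        # Knowledge that is shown on every call, regardless of retrieval.
        parts.append(always_on)
    parts.append("Context:\n" + retrieved)
    parts.append("Question: " + question)
    return "\n\n".join(parts)


print(build_response_prompt("Where does Alice live?", "Alice moved to Paris"))
```

Under this assumption, mutating a constant changes the agent's behavior without touching the agent's own code, which is consistent with the task agent staying fixed.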

Dimension 2 — Schema

Data Structures

KnowledgeItem and Query dataclass fields define what gets stored and how it's queried. Evolution reshapes fields freely.

Dimension 3 — Logic

Storage & Retrieval

write() and read() methods use SQLite, ChromaDB, and a budget-limited LLM. Evolution rewrites the entire algorithm.
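As a concrete starting point, the skeleton above could be filled in along the following lines. This is a minimal sketch that uses only SQLite and leaves out ChromaDB and the LLM. It assumes toolkit.db behaves like a standard sqlite3.Connection, which this page does not confirm; the table layout and the substring-match retrieval are likewise choices made for this example.

```python
import sqlite3
from dataclasses import dataclass, field
from types import SimpleNamespace


@dataclass
class KnowledgeItem:
    summary: str
    entities: list[str] = field(default_factory=list)
    timestamp: str = ""


@dataclass
class Query:
    query_text: str
    entity_filter: str = ""


class KnowledgeBase:
    def __init__(self, toolkit):
        # Assumption: toolkit.db is a sqlite3.Connection.
        self.db = toolkit.db
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS items ("
            "id INTEGER PRIMARY KEY, summary TEXT, entities TEXT, "
            "timestamp TEXT, raw_text TEXT)"
        )

    def write(self, item, raw_text=""):
        # Store one structured item; entities are flattened to a comma-separated string.
        self.db.execute(
            "INSERT INTO items (summary, entities, timestamp, raw_text) "
            "VALUES (?, ?, ?, ?)",
            (item.summary, ",".join(item.entities), item.timestamp, raw_text),
        )
        self.db.commit()

    def read(self, query) -> str:
        # Substring match on the summary, optionally narrowed by entity.
        sql = "SELECT summary, timestamp FROM items WHERE summary LIKE ?"
        params = [f"%{query.query_text}%"]
        if query.entity_filter:
            sql += " AND entities LIKE ?"
            params.append(f"%{query.entity_filter}%")
        sql += " ORDER BY timestamp"
        rows = self.db.execute(sql, params).fetchall()
        return "\n".join(f"[{ts}] {summary}" for summary, ts in rows)


# Stand-in toolkit with an in-memory database.
kb = KnowledgeBase(SimpleNamespace(db=sqlite3.connect(":memory:")))
kb.write(KnowledgeItem("Alice moved to Paris", ["Alice", "Paris"], "2024-01-02"))
kb.write(KnowledgeItem("Bob adopted a cat", ["Bob"], "2024-01-03"))
print(kb.read(Query("moved", entity_filter="Alice")))  # [2024-01-02] Alice moved to Paris
```

Every part of this sketch, from the table layout to the retrieval rule, is the kind of detail the evolutionary search is free to rewrite.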


Interactive
Explore the Evolution

Click any node to view its source code. The tree shows how programs evolve through parent-child mutations. Higher nodes are earlier iterations and lower nodes are later ones; color encodes score.